Solving Perl Weekly Challenge 096 -- Reverse Words and Edit distance.

作者:   發佈於:   #raku #janet #rust

This week, Perl Weekly Challenge 096 gives us two tasks about string manipulation. Besides Raku, let's try janet and rust.

The algorithm for computing edit distance is of pratcial uses, such as in git: git/levenshtein, for recomending commands when user runs an unknown command:

# git lag
git: 'lag' is not a git command. See 'git --help'.

The most similar commands are
	log
	tag

Same as in perlbrew: perlbrew/editdist. Meanwhile in rakudo, the compiler of Raku language: rakudo/levenshtein, for recommending subroutines when seeing unknown subroutine name in the code. Like this:

# raku -e 'pay "hello"'
===SORRY!=== Error while compiling -e
Undeclared routine:
    pay used at line 1. Did you mean 'say'?

There is a difference though. In git, the algorithm is 'Damerau-Levenshtein distance'. While in perlbrew and raku, the algorithm is ' levenshtein distance'. The former sees character-swaping as one operation, the latter does not.

TASK #1 › Reverse Words

Submitted by: Mohammad S Anwar

You are given a string $S.

Write a script to reverse the order of words in the given string. The string may contain leading/trailing spaces. The string may have more than one space between words in the string. Print the result without leading/trailing spaces and there should be only one space between words.

Example 1:

Input: $S = "The Weekly Challenge"
Output: "Challenge Weekly The"

Example 2:

Input: $S = "    Perl and   Raku are  part of the same family  "
Output: "family same the of part are Raku and Perl"

Solution #1 > Revrse Words

For this task, we first split the string by whitespacse to get a collection words, and re-join them in reverse order.

Raku:

sub reverse-words (Str $S --> Str) {
    $S.words.reverse.join(" ");
}

Janet:

(defn reverse-words
  "Reverse the string word by word"
  [S]
  (string/join (reverse (filter (fn [x] (not (= x "")))
                        (string/split " " S)))
               " "))

Rust:

fn reverse_words(x: String) -> String {
    return x.split_whitespace().rev()
            .collect::<Vec<&str>>()
            .join(" ");
}

Turns out it looks more or less the same in these 3 languages. The string/split in janet producess one empty string when encountering two consecutive whitespaces, therefore an extra filter is required to remove those extra empty strings.

TASK #2 › Edit Distance

Submitted by: Mohammad S Anwar

You are given two strings $S1 and $S2.

Write a script to find out the minimum operations required to convert $S1 into $S2. The operations can be insert, remove or replace a character. Please check out Wikipedia page for more information.

Example 1:

Input: $S1 = "kitten"; $S2 = "sitting"
Output: 3

Operation 1: replace 'k' with 's'
Operation 2: replace 'e' with 'i'
Operation 3: insert 'g' at the end

Example 2:

Input: $S1 = "sunday"; $S2 = "monday"
Output: 2

Operation 1: replace 's' with 'm'
Operation 2: replace 'u' with 'o'

Solution #2 > Edit Distance

In the following, a navie recursive implementation of Levenshtein distance is provided, basically follow the definition from Wikipedia: Levenshtein-distance#Definition.

It is pretty straight forward this way. However, the implementations from the abovementioned projects are all done with Dynamic Programming. When the inputs (S1 or S2) are lengthy, the about of computation should be greatly reduced.

Raku:

sub edit-distance (Str $S1, Str $S2 --> Int) {
    my sub lev ($a, $b) {
        return $a.chars if $b.chars == 0;
        return $b.chars if $a.chars == 0;
        return lev( $a.substr(1), $b.substr(1) ) if $a.substr(0,1) eq $b.substr(0,1);
        return 1 + (
            lev($a, $b.substr(1)),
            lev($a.substr(1), $b),
            lev($a.substr(1), $b.substr(1)),
        ).min;
    };

    return lev($S1, $S2);
}

Janet:

(defn lev
  "Compute the Levenshtein distance between string a and b"
  [a b]
    (cond (= 0 (length a)) (length b)
          (= 0 (length b)) (length a)
          (let [
                a_head (string/slice a 0 1)
                a_tail (string/slice a 1)
                b_head (string/slice b 0 1)
                b_tail (string/slice b 1)
                levtail (lev a_tail b_tail)
                ]
            (if (= a_head b_head) levtail
                (+ 1 (min
                      (lev a b_tail)
                      (lev a_tail b)
                      levtail))))))

Rust:

fn edit_distance(a: &str, b: &str) -> u32 {
    let chars_a = a.chars().collect();
    let chars_b = b.chars().collect();
    return lev( &chars_a, 0, &chars_b, 0);
}

fn lev (a: &Vec<char>, offset_a: usize, b: &Vec<char>, offset_b: usize) -> u32 {
    if offset_a == a.len() {
        return (b.len() - offset_b) as u32;
    }
    if offset_b == b.len() {
        return (a.len() - offset_a) as u32;
    }

    let n3 = lev( a, offset_a+1, b, offset_b+1 );
    if a[offset_a] == b[offset_b] {
        return n3;
    } else {
        let n1 = lev( a, offset_a+1, b, offset_b );
        let n2 = lev( a, offset_a,   b, offset_b+1 );
        return [n1, n2, n3].iter().min().unwrap() + 1_u32;
    }
}

本文為《解 Perl Weekly Challenge 096 -- 單字次序倒轉與兩字串編輯距離》之英文版。