解 Perl Weekly Challenge 096 -- 單字次序倒轉與兩字串編輯距離

作者:   發佈於:   #raku #janet #rust

Perl Weekly Challenge 096 這次兩題字串處理題目。這次除了用 Raku 來解,順便也試試用 janetrust

字串編輯距離演算法是相當實用的。在 git 中有使用:git/levenshtein,用於在使用者打錯指令時提供建議:

# git lag
git: 'lag' is not a git command. See 'git --help'.

The most similar commands are
	log
	tag

perlbrew 中也有利用編輯距離演算法來完成同樣目的: perlbrew/editdist。並且,Raku 語編譯器 rakudo 中也有: rakudo/levenshtein,用於在編譯過程中發現未定義的函式名稱時提供其他已知函式名稱作為建議。如下所示:

# raku -e 'pay "hello"'
===SORRY!=== Error while compiling -e
Undeclared routine:
    pay used at line 1. Did you mean 'say'?

稍有不同的是,git 中所利用的是 Damerau–Levenshtein distance 演算法,而 perlbrew 與 rakudo 中則是 Levenshtein distance。前者將更換兩字符位置視為一步,後者則無視此操作。

TASK #1 › Reverse Words

Submitted by: Mohammad S Anwar

You are given a string $S.

Write a script to reverse the order of words in the given string. The string may contain leading/trailing spaces. The string may have more than one space between words in the string. Print the result without leading/trailing spaces and there should be only one space between words.

Example 1:

Input: $S = "The Weekly Challenge"
Output: "Challenge Weekly The"

Example 2:

Input: $S = "    Perl and   Raku are  part of the same family  "
Output: "family same the of part are Raku and Perl"

解 #1 > Revrse Words

題意是要將輸入字串 $S 以單字為單位倒排,空白字符則可以完全忽略不計。

這題只需要將字串先以空白切開,再反序組合回來就可以。

Raku:

sub reverse-words (Str $S --> Str) {
    $S.words.reverse.join(" ");
}

Janet:

(defn reverse-words
  "Reverse the string word by word"
  [S]
  (string/join (reverse (filter (fn [x] (not (= x "")))
                        (string/split " " S)))
               " "))

Rust:

fn reverse_words(x: String) -> String {
    return x.split_whitespace().rev()
            .collect::<Vec<&str>>()
            .join(" ");
}

原則上大同小異。janet 語中的 string/split 函式在碰到連續兩個空白時會切出一個空字串。所以得多加一步 filter ,把空字串给移除掉。

TASK #2 › Edit Distance

Submitted by: Mohammad S Anwar

You are given two strings $S1 and $S2.

Write a script to find out the minimum operations required to convert $S1 into $S2. The operations can be insert, remove or replace a character. Please check out Wikipedia page for more information.

Example 1:

Input: $S1 = "kitten"; $S2 = "sitting"
Output: 3

Operation 1: replace 'k' with 's'
Operation 2: replace 'e' with 'i'
Operation 3: insert 'g' at the end

Example 2:

Input: $S1 = "sunday"; $S2 = "monday"
Output: 2

Operation 1: replace 's' with 'm'
Operation 2: replace 'u' with 'o'

解 #2 > Edit Distance

以下所提供的是 Levenshtein distance 的遞迴解,基本上是依照 Wikipedia 上的定義 Levenshtein-distance#Definition

讀寫起來比較直觀。不過前述幾個專案中的實作都是採用動態規劃 (Dynamic Programming) 法,在參數(S1 或 S2)很長時,能大幅減少計算量。

Raku:

sub edit-distance (Str $S1, Str $S2 --> Int) {
    my sub lev ($a, $b) {
        return $a.chars if $b.chars == 0;
        return $b.chars if $a.chars == 0;
        return lev( $a.substr(1), $b.substr(1) ) if $a.substr(0,1) eq $b.substr(0,1);
        return 1 + (
            lev($a, $b.substr(1)),
            lev($a.substr(1), $b),
            lev($a.substr(1), $b.substr(1)),
        ).min;
    };

    return lev($S1, $S2);
}

Janet:

(defn lev
  "Compute the Levenshtein distance between string a and b"
  [a b]
    (cond (= 0 (length a)) (length b)
          (= 0 (length b)) (length a)
          (let [
                a_head (string/slice a 0 1)
                a_tail (string/slice a 1)
                b_head (string/slice b 0 1)
                b_tail (string/slice b 1)
                levtail (lev a_tail b_tail)
                ]
            (if (= a_head b_head) levtail
                (+ 1 (min
                      (lev a b_tail)
                      (lev a_tail b)
                      levtail))))))

Rust:

fn edit_distance(a: &str, b: &str) -> u32 {
    let chars_a = a.chars().collect();
    let chars_b = b.chars().collect();
    return lev( &chars_a, 0, &chars_b, 0);
}

fn lev (a: &Vec<char>, offset_a: usize, b: &Vec<char>, offset_b: usize) -> u32 {
    if offset_a == a.len() {
        return (b.len() - offset_b) as u32;
    }
    if offset_b == b.len() {
        return (a.len() - offset_a) as u32;
    }

    let n3 = lev( a, offset_a+1, b, offset_b+1 );
    if a[offset_a] == b[offset_b] {
        return n3;
    } else {
        let n1 = lev( a, offset_a+1, b, offset_b );
        let n2 = lev( a, offset_a,   b, offset_b+1 );
        return [n1, n2, n3].iter().min().unwrap() + 1_u32;
    }
}