[Perl] A nice update in 5.37.4: compile-time syntax error messages

Author:   Published:   #perl

Yves contributed a nice update to perl-5.37.4 (a development release):

Quoted from: https://metacpan.org/release/ETHER/perl-5.37.4/changes

Syntax errors will no longer produce "phantom error messages".

Generally perl will continue parsing the source code even after encountering a compile error. In many cases this is helpful, for instance with misspelled variable names it is helpful to show as many examples of the error as possible. But in the case of syntax errors continuing often produces bizarre error messages, and may even cause segmentation faults during the compile process. In this release the compiler will halt at the first syntax error encountered. This means that any code expecting to see the specific error messages we used to produce will be broken. The error that is emitted will be one of the diagnostics that used to be produced, but in some cases some messages that used to be produced will no longer be displayed.

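In other words, from now on compilation stops as soon as a syntax error is found. The practical consequence is that syntax error output becomes much shorter.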

The related announcement and PR are at:

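I wrote a simple example and compared the perl5.18 and perl5.30 that ship with macOS against a perl5.37.4 installed with perlbrew. When syntax errors occur "in succession", perl5.37.4 reports only the first one and then stops. The example and its output follow.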

# bat --style numbers err.pl
   1 use strict;
   2 use warnings;
   3
   4 sub foobar {
   5     my $bar = 41
   6     my $bas = 42
   7     my $bat = 43;
   8
   9     my $bau = 44
  10     my $bav = 45;
  11
  12     say $bar;
  13 }

# /usr/bin/perl5.18 -c err.pl
syntax error at err.pl line 6, near "my "
Global symbol "$bas" requires explicit package name at err.pl line 6.
syntax error at err.pl line 10, near "my "
Global symbol "$bav" requires explicit package name at err.pl line 10.
Global symbol "$bar" requires explicit package name at err.pl line 12.
err.pl had compilation errors.

# /usr/bin/perl5.30 -c err.pl
syntax error at err.pl line 6, near "my "
Global symbol "$bas" requires explicit package name (did you forget to declare "my $bas"?) at err.pl line 6.
Can't redeclare "my" in "my" at err.pl line 9, near "my"
syntax error at err.pl line 10, near "my "
Global symbol "$bav" requires explicit package name (did you forget to declare "my $bav"?) at err.pl line 10.
Global symbol "$bar" requires explicit package name (did you forget to declare "my $bar"?) at err.pl line 12.
err.pl had compilation errors.

# perl5.37.4 -c err.pl
syntax error at err.pl line 6, near "my "
err.pl had compilation errors.

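This behavior is actually quite reasonable. After all, changing a single character is enough to turn a piece of code into non-code. (Note 1)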

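Until now, perl kept compiling a little further after hitting a syntax error and tried to list several places in the code that might also be syntax errors. That is sometimes helpful, but in practice too many error messages mislead the reader and make it hard to tell which message points at the real problem.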
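The same comparison can be made from inside a Perl program instead of through `perl -c`. The following is a minimal sketch, assuming a string `eval` is an acceptable way to compile a snippet; `compile_diagnostics` and `foobar_broken` are hypothetical names introduced only for this illustration. It prints however many diagnostic lines the running perl produces, so the count should differ between older releases and those that include this change.

```perl
use strict;
use warnings;

# Hypothetical helper: compile $source with a string eval and return
# the diagnostics as a list of lines (an empty list if it compiled).
sub compile_diagnostics {
    my ($source) = @_;
    local $@;
    my $ok = eval "$source\n; 1";
    return () if $ok;
    my @lines = split /\n/, $@;
    return @lines;
}

# The same kind of mistake as in err.pl: missing semicolons.
my $broken = <<'END_SOURCE';
use strict;
use warnings;
sub foobar_broken {
    my $bar = 41
    my $bas = 42
    my $bat = 43;
    return $bar;
}
END_SOURCE

my @diagnostics = compile_diagnostics($broken);
printf "%d diagnostic line(s) on perl %vd\n", scalar @diagnostics, $^V;
print "  $_\n" for @diagnostics;
```

Only pass trusted, literal source to a string `eval` like this. Code that counts or pattern-matches these lines is exactly the kind of code the quoted change note warns may break.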

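Every version of perl reports the first error at line 6, although attributing it to line 5 would arguably be more reasonable: in this simple example, at least, the place to fix is the end of line 5. Still, a compiler cannot decide on its own where the "correct" fix belongs. Only the programmer can.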

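If the compiler could determine the "correct" fix and apply it automatically, that would mean the code was in fact compilable and free of syntax errors all along.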
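For reference, here is one plausible repair of err.pl, assuming the intent was five separate declarations. The semicolons go at the ends of lines 5, 6 and 9 of the original listing. The `use feature 'say';` line is my own addition and is not in the original file; without it, `say` is not enabled as a built-in.

```perl
use strict;
use warnings;
use feature 'say';   # assumption: added so that say is available

sub foobar {
    my $bar = 41;    # semicolon added
    my $bas = 42;    # semicolon added
    my $bat = 43;

    my $bau = 44;    # semicolon added
    my $bav = 45;

    say $bar;
}
```

Nothing in the broken file rules out other repairs, such as deleting a line entirely, which is why the choice has to be left to the programmer.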


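Note 1: This is an interesting question of everyday wording. Every programming language today has a well-defined grammar (whether a written rulebook exists is another matter). If the contents of a text file compile successfully with cc, we normally say that the file contains C code. But suppose we change a single character, say by removing the closing } of some function in the middle, so that cc can no longer compile it. We might then call the contents "C code with a syntax error". A more accurate description, however, would be that the contents are "not C code". After all, if we drew the syntax tree for that file by hand, we would find that a branch which ought to be there is missing. It is like someone attempting a seven-character quatrain, which should have twenty-eight characters, and producing only twenty-seven: we would not call the result "a seven-character quatrain with a syntax error". Of course, we can still see that such content is "almost correct". Perhaps this error-tolerant text parser built into the human brain will, in a few years, be picked up by machine learning and embedded in some neural network.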