In the extlink_preprocessor_text_parameterized rule in /lib/wt2html/pegTokenizer.pegjs, we have:
    r:(
        $[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+
      / !inline_breaks s:( directive / no_punctuation_char / [&|{\-] ) { return s; }
      / $([.:,] !(space / eolf))
      / $(['] ![']) // single quotes are ok, double quotes are bad
    )+ { return tu.flattenString(r); }
In the first alternative, $[^'...="'...]+, the single quote appears twice in the negated character class; one of them is redundant and can be removed.
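The alternative $([.:,] !(space / eolf)) is unreachable: '.', ':' and ',' are not excluded by the first character class, so the first alternative always consumes them before this alternative is tried. It can be removed.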
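Both observations can be checked mechanically. The following is a minimal brute-force sketch in JavaScript (not Parsoid code), assuming that JavaScript regex character classes match the same characters as the PEG.js classes copied here. It compares the class with and without the duplicate quote over every BMP code unit, and lists which of '.', ':' and ',' the first class rejects, which is the only situation in which the $([.:,] ...) alternative could match.

```javascript
// Brute-force sketch (not Parsoid code): classes copied from the first
// alternative, assumed to behave the same as JS regex character classes.
const withDup = /[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]/;
const withoutDup = /[^<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]/;

// Every BMP code unit on which the two classes disagree.
const differences = [];
for (let cp = 0; cp <= 0xffff; cp++) {
  const ch = String.fromCharCode(cp);
  if (withDup.test(ch) !== withoutDup.test(ch)) differences.push(cp);
}

// $([.:,] ...) can only match when the first class rejects the next
// character, so list which of '.', ':' and ',' the first class rejects.
const reachablePunct = [".", ":", ","].filter((ch) => !withDup.test(ch));

console.log(differences.length, reachablePunct.length); // 0 0
```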
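inline_breaks can be thought of as &[=|!{}:;\r\n[\]\-] &{return magic_fn()}, where magic_fn() stands for its semantic predicate, so !inline_breaks is the negation !(&[=|!{}:;\r\n[\]\-] &{return magic_fn()}). This alternative is only tried when the first character class rejects the next character, and ':' and ';' are not excluded by that class, so they never get here. We can therefore think of it as !(&[=|!{}\r\n[\]\-] &{return magic_fn()}).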
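A small JavaScript model of this argument, as a sketch only: magic_fn is represented by a few hypothetical predicates of the next character alone (the real predicate also depends on input position and stop context), and the PEG lookaheads by plain booleans. It collects the characters that can reach the second alternative, confirms ':' and ';' are not among them, and shows that !inline_breaks, being a negation, still succeeds on a character such as '~' that is outside the set.

```javascript
// Sketch only: magic_fn is a hypothetical stand-in, modeled as a function
// of the next character (a simplification of the real predicate).
const FIRST = /[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]/;
const BREAK_SET = /[=|!{}:;\r\n[\]\-]/;
const BREAK_SET_NO_COLON = /[=|!{}\r\n[\]\-]/;

// inline_breaks = &[set] &{magic_fn()}: both lookaheads must succeed.
const inlineBreaks = (set, ch, magicFn) => set.test(ch) && magicFn(ch);
// !inline_breaks negates that, so it also succeeds when ch is outside the set.
const notInlineBreaks = (set, ch, magicFn) => !inlineBreaks(set, ch, magicFn);

// The second alternative only runs when FIRST rejects the next character.
const reachingSecondAlt = [];
for (let cp = 0; cp <= 0xffff; cp++) {
  const ch = String.fromCharCode(cp);
  if (!FIRST.test(ch)) reachingSecondAlt.push(ch);
}

// Dropping ':' and ';' from the set changes nothing for these sample predicates.
const samplePredicates = [() => true, () => false, (ch) => ch === "-"];
const sameResult = reachingSecondAlt.every((ch) =>
  samplePredicates.every((fn) =>
    notInlineBreaks(BREAK_SET, ch, fn) === notInlineBreaks(BREAK_SET_NO_COLON, ch, fn)));

console.log(reachingSecondAlt.includes(":"), reachingSecondAlt.includes(";"), sameResult);
// false false true

// '~' reaches the second alternative; !inline_breaks lets it through even
// though '~' is not in the set, whereas a positive &[set] lookahead would not.
console.log(notInlineBreaks(BREAK_SET, "~", () => true), BREAK_SET.test("~")); // true false
```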
So the rule can be simplified to:

    r:(
        $[^<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+
      / !(&[=|!{}\r\n[\]\-] &{return magic_fn()})
        s:( directive / no_punctuation_char / [&|{\-] ) { return s; }
      / $(['] ![']) // single quotes are ok, double quotes are bad
    )+ { return tu.flattenString(r); }
Refer to https://docs.google.com/spreadsheets/d/185Fr3AFtmPTmYQ8-WRGUKA-whryTZUgrX0E7kL0BHTw/edit?usp=sharing
(See note #2)
The alternation no_punctuation_char / [&|{\-] matches exactly the same characters as the single class [^"'<>,.%:\[\]\x00-\x20\x7F\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000].
So the rule can be simplified to:

    r:(
        $[^|[\]{!\n\r\-}~\t<&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+
      / !(&[=|!{}\r\n[\]\-] &{return magic_fn()})
        s:(
            directive
          / [^"'<>,.%:\[\]\x00-\x20\x7F\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]
        ) { return s; }
      / $(['] ![']) // single quotes are ok, double quotes are bad
    )+ { return tu.flattenString(r); }
(See note #3)
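The third character class, [^"'<>,.%:\[\]\x00-\x20\x7F\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000], can be simplified further to [-{}|!=~&]. This alternative is only tried when the first character class rejects the next character, and of the characters that class rejects, the third class accepts only - { } | ! = ~ and &.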
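The simplification can be verified by brute force. In this JavaScript sketch, the classes are copied from the simplified rule and assumed to behave identically as JavaScript regex classes; every BMP code unit rejected by the first class is tested against the old third class and against [-{}|!=~&].

```javascript
// Brute-force sketch (not Parsoid code) of the third-class simplification.
const FIRST = /[^|[\]{!\n\r\-}~\t<&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]/;
const THIRD = /[^"'<>,.%:\[\]\x00-\x20\x7F\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]/;
const SIMPLIFIED = /[-{}|!=~&]/;

let survivors = ""; // characters that reach the class and are accepted by THIRD
let mismatches = 0; // characters where THIRD and SIMPLIFIED disagree
for (let cp = 0; cp <= 0xffff; cp++) {
  const ch = String.fromCharCode(cp);
  if (FIRST.test(ch)) continue; // consumed by the first alternative instead
  const accepted = THIRD.test(ch);
  if (accepted) survivors += ch;
  if (accepted !== SIMPLIFIED.test(ch)) mismatches++;
}

console.log(survivors, mismatches); // !&-={|}~ 0
```

The survivors are printed in code-point order; they are exactly the eight characters of [-{}|!=~&].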
So the rule can be simplified to:

    r:(
        $[^|[\]{!\n\r\-}~\t<&="'\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]+ // ⇔ $(!unispace [^|[\]{!\n\r\-}~\t<&="'\u180E])+
      / !(&[=|!{}:;\r\n[\]\-] &{return magic_fn()})
        s:( directive / [-{}|!=~&] ) { return s; }
      / $(['] ![']) // single quotes are ok, double quotes are bad
    )+ { return tu.flattenString(r); }
or, keeping the inline_breaks rule itself:

    r:(
        $[^|[\]{!\n\r\-}~\t<&="'\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]+ // ⇔ $(!unispace [^|[\]{!\n\r\-}~\t<&="'\u180E])+
      / !inline_breaks
        s:( directive / [-{}|!=~&] ) { return s; }
      / $(['] ![']) // single quotes are ok, double quotes are bad
    )+ { return tu.flattenString(r); }
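On the ⇔ comment: a negated class that also excludes the unicode spaces corresponds to !unispace followed by the narrower class; the other reading, [^...] / unispace, would accept the spaces instead of rejecting them. The JavaScript sketch below checks both readings, assuming unispace is [ \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000] (verify against the unispace rule in pegTokenizer.pegjs).

```javascript
// Sketch assuming unispace = [ \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000].
const UNISPACE = /[ \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]/;
const BASE = /[^|[\]{!\n\r\-}~\t<&="'\u180E]/;
const FULL = /[^|[\]{!\n\r\-}~\t<&="'\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]/;

// One-character PEG semantics of each candidate rewrite.
const orUnispace = (ch) => BASE.test(ch) || UNISPACE.test(ch);   // [^...] / unispace
const notUnispace = (ch) => !UNISPACE.test(ch) && BASE.test(ch); // !unispace [^...]

let orMismatches = 0;
let notMismatches = 0;
for (let cp = 0; cp <= 0xffff; cp++) {
  const ch = String.fromCharCode(cp);
  if (orUnispace(ch) !== FULL.test(ch)) orMismatches++;
  if (notUnispace(ch) !== FULL.test(ch)) notMismatches++;
}

console.log(orMismatches, notMismatches); // 17 0
```

Under that definition the !unispace form agrees with the full class everywhere, while the / unispace form disagrees on exactly the 17 unicode-space characters.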