In the **extlink_preprocessor_text_parameterized** rule in `/lib/wt2html/pegTokenizer.pegjs`, we have:
```
r:(
$[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+
/ !inline_breaks s:( directive / no_punctuation_char / [&|{\-] ) { return s; }
/ $([.:,] !(space / eolf))
/ $(['] ![']) // single quotes are ok, double quotes are bad
)+ { return tu.flattenString(r); }
```
Line 2, `$[^'..."'...]`, lists the single quote twice; one of them should be removed.
Line 4, `$([.:,] ...)`, is unreachable: `.`, `:` and `,` are not excluded by the first test, so the first alternative always consumes them before line 4 is tried.
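Both observations can be checked mechanically. The sketch below copies the line 2 class into a `RegExp` (JavaScript's character-class syntax happens to accept it unchanged), counts the quotes in it, and tests the three punctuation characters. The variable names are illustrative only.

```javascript
// Sketch: check the two observations about line 2 and line 4 of the original rule.
// The class body is copied verbatim from line 2 of the grammar.
const line2Class = String.raw`[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]`;

// Observation 1: the single quote appears twice inside the class.
const quoteCount = [...line2Class].filter((c) => c === "'").length;

// Observation 2: '.', ':' and ',' are accepted by the first alternative, so PEG
// ordered choice never falls through to the `$([.:,] ...)` alternative for them.
const firstAlt = new RegExp("^" + line2Class + "$");
const dotColonComma = [".", ":", ","].map((c) => firstAlt.test(c));

console.log(quoteCount);    // 2
console.log(dotColonComma); // [ true, true, true ]
```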
`!inline_breaks` can be thought of as `&[^=|!{}:;\r\n[\]\-] &{return magic_fn()}`, where `magic_fn()` stands for the semantic predicate inside `inline_breaks`. Since `[;:]` aren't excluded by the first test, they never reach this alternative, so only `[=|!{}\r\n[\]\-]` matters here.
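A similar sketch lists which of the `inline_breaks` trigger characters can get past the first alternative at all. It uses the character set from the `&[...]` expansion quoted above. `magic_fn()` is not modelled; only the character lookahead is.

```javascript
// Sketch: which inline_breaks trigger characters can reach the second alternative?
// Only those that the first alternative (line 2) rejects.
const firstAlt = new RegExp(String.raw`^[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]$`);

// Trigger characters from the &[=|!{}:;\r\n[\]\-] part of inline_breaks.
const triggers = [..."=|!{}:;\r\n[]-"];

const reachable = triggers.filter((c) => !firstAlt.test(c)); // rejected by line 2
const swallowed = triggers.filter((c) => firstAlt.test(c));  // consumed by line 2 first

console.log(JSON.stringify(reachable.join(""))); // "=|!{}\r\n[]-"
console.log(swallowed.join(""));                 // :;
```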
So the code can be simplified to
```
r:(
$[^<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+
/ &[^=|!{}:;\r\n[\]\-] &{return magic_fn()}
s:(
directive
/ no_punctuation_char
/ [&|{\-]
) { return s; }
/ $(['] ![']) // single quotes are ok, double quotes are bad
)+ { return tu.flattenString(r); }
```
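Removing the duplicated quote should not change which characters the first alternative accepts. A brute-force comparison over all UTF-16 code units is a cheap sanity check. It only compares the two single-character classes and says nothing about the other alternatives.

```javascript
// Sketch: does dropping the duplicated quote change the first alternative?
// Both class bodies are copied from the grammar; RegExp accepts them as written.
const oldFirst = new RegExp(String.raw`^[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]$`);
const newFirst = new RegExp(String.raw`^[^<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]$`);

// Returns the first UTF-16 code unit the two matchers disagree on, or -1 if none.
function firstDifference(a, b) {
  for (let cu = 0; cu <= 0xffff; cu++) {
    const ch = String.fromCharCode(cu);
    if (a.test(ch) !== b.test(ch)) return cu;
  }
  return -1;
}

console.log(firstDifference(oldFirst, newFirst)); // -1
```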
---
Refer to https://docs.google.com/spreadsheets/d/185Fr3AFtmPTmYQ8-WRGUKA-whryTZUgrX0E7kL0BHTw/edit?usp=sharing
(see note #1)
In `$[^<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+ / &[^=|!{}:;\r\n[\]\-] XXXX`, the second part is equivalent to `&["'<~\t&=\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000] XXXX`.
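The characters for which `XXXX` is ever attempted can be enumerated the same way. These are the characters rejected by the first class and admitted by `&[^=|!{}:;\r\n[\]\-]`. This sketch works on single UTF-16 code units and does not model `XXXX` itself. In this enumeration `=` does not show up, because the lookahead already rejects it.

```javascript
// Sketch: characters that reach XXXX, i.e. rejected by the first class
// and admitted by the &[^=|!{}:;\r\n[\]\-] lookahead.
const firstAlt = new RegExp(String.raw`^[^<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]$`);
const lookahead = new RegExp(String.raw`^[^=|!{}:;\r\n[\]\-]$`);

const reaching = [];
for (let cu = 0; cu <= 0xffff; cu++) {
  const ch = String.fromCharCode(cu);
  if (!firstAlt.test(ch) && lookahead.test(ch)) reaching.push(ch);
}

// ASCII part of the set; the rest are the non-ASCII space-like characters.
const ascii = reaching.filter((c) => c < "\x7f").join("");
console.log(JSON.stringify(ascii));  // "\t \"&'<~"
console.log(reaching.length);        // 24
console.log(reaching.includes("=")); // false
```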
(See note #2)
`no_punctuation_char / [&|{\-]` is really just `[^"'<>,.%:\[\]\x00-\x20\x7F\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]`.
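This equivalence depends on the definition of `no_punctuation_char`, which is not quoted here. The sketch below uses a definition recalled from `pegTokenizer.pegjs`. Treat that body as an assumption and paste the real rule in before relying on the result. The sketch then compares the ordered choice against the single class, one character at a time.

```javascript
// Sketch: compare `no_punctuation_char / [&|{\-]` with the single class above.
// ASSUMPTION: this no_punctuation_char body is recalled from pegTokenizer.pegjs,
// not copied from this task; replace it with the real definition before trusting it.
const noPunctuationChar = new RegExp(String.raw`^[^ :\]\[\r\n"'<>\x00-\x20\x7f,.&%\u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000{]$`);
const extra = /^[&|{\-]$/;
const claimed = new RegExp(String.raw`^[^"'<>,.%:\[\]\x00-\x20\x7F\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]$`);

const mismatches = [];
for (let cu = 0; cu <= 0xffff; cu++) {
  const ch = String.fromCharCode(cu);
  // An ordered choice of two single-character matchers accepts ch if either does.
  const choice = noPunctuationChar.test(ch) || extra.test(ch);
  if (choice !== claimed.test(ch)) mismatches.push(cu);
}

console.log(mismatches.length); // 0 under the assumed definition
```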
So the code can be simplified to
```
r:(
$[^|[\]{!\n\r\-}~\t<&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+
/ &["'<~\t&=\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000] &{return magic_fn()}
s:(
directive
/ [^"'<>,.%:\[\]\x00-\x20\x7F\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]
) { return s; }
/ $(['] ![']) // single quotes are ok, double quotes are bad
)+ { return tu.flattenString(r); }
```
---
(See note #3)
Line 6 (the third `[...]`) can be simplified further to `[&~]`.
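The same approach supports simplifying line 6. That class is only reached by characters that fail the first class and pass the lookahead, so only its intersection with that set matters. The sketch uses the earlier `&[^=|!{}:;\r\n[\]\-]` reading of `!inline_breaks` rather than the line 3 lookahead. It ignores `magic_fn()` and `directive`.

```javascript
// Sketch: which characters can line 6 actually match inside this rule?
// Lookahead is the &[^=|!{}:;\r\n[\]\-] reading of !inline_breaks used earlier.
const firstAlt = new RegExp(String.raw`^[^|[\]{!\n\r\-}~\t<&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]$`);
const lookahead = new RegExp(String.raw`^[^=|!{}:;\r\n[\]\-]$`);
const line6 = new RegExp(String.raw`^[^"'<>,.%:\[\]\x00-\x20\x7F\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]$`);

let effective = "";
for (let cu = 0; cu <= 0xffff; cu++) {
  const ch = String.fromCharCode(cu);
  // Reached only if the first alternative fails and the lookahead passes.
  if (!firstAlt.test(ch) && lookahead.test(ch) && line6.test(ch)) effective += ch;
}

console.log(effective); // &~
```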
So the code can be simplified to
```
r:(
$[^|[\]{!\n\r\-}~\t<&="'\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]+ // ⇔ $(!unispace [^|[\]{!\n\r\-}~\t<&="'\u180E])+
/ &["'<~\t&=\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000] // ⇔ &([~\t<&="'\u180E] / unispace)
s:( directive / [&~] ) { return s; }
/ $(['] ![']) // single quotes are ok, double quotes are bad
)+ { return tu.flattenString(r); }
```
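The `⇔` comment on the first line rewrites the class in terms of `unispace`. Because the class *rejects* the whitespace characters, the matching PEG form is a negative lookahead, `!unispace [^...]`, and not an ordered choice `[^...] / unispace`. The sketch below compares both readings with the class. It assumes `unispace` is `[ \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]`, which is inferred from the class itself, so check it against the real rule.

```javascript
// Sketch: which PEG reading of the comment matches the negated class?
// ASSUMPTION: unispace = [ \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000].
const cls = new RegExp(String.raw`^[^|[\]{!\n\r\-}~\t<&="'\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]$`);
const rest = new RegExp(String.raw`^[^|[\]{!\n\r\-}~\t<&="'\u180E]$`);
const unispace = /^[ \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]$/;

// `[^...] / unispace`: ordered choice, succeeds if either branch does.
const choiceReading = (ch) => rest.test(ch) || unispace.test(ch);
// `!unispace [^...]`: negative lookahead, then the class.
const lookaheadReading = (ch) => !unispace.test(ch) && rest.test(ch);

// Code units on which a reading disagrees with the original class.
function disagreements(reading) {
  const out = [];
  for (let cu = 0; cu <= 0xffff; cu++) {
    const ch = String.fromCharCode(cu);
    if (reading(ch) !== cls.test(ch)) out.push(cu);
  }
  return out;
}

console.log(disagreements(choiceReading).length);    // 17 (every unispace character)
console.log(disagreements(lookaheadReading).length); // 0
```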
---
Now consider the definition of `directive`:
```
directive
= comment // <!--something-->
/ extension_tag // <something>
/ tplarg_or_template // {{something}}
/ & "-{" v:lang_variant_or_tpl // -{{something}}-
/ & "&" e:htmlentity // &something;
/ include_limits // <something>
```
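The directive list shows why only a few characters are worth looking ahead for. The sketch below makes two assumptions. First, it takes the first character of each branch from the comments above rather than deriving it from the actual rules. Second, it assumes the fallback class after `directive` is `[&~]`, as on the `&[<&~]` line further down. It then keeps only the characters admitted by `&([~\t<&="'\u180E] / unispace)`.

```javascript
// Sketch: which characters can pass the lookahead AND lead to a match?
// ASSUMPTION: first characters come from the comments (<!--, <, {{, -{, &, <).
const directiveFirst = new Set(["<", "{", "-", "&"]);
const fallbackClass = new Set(["&", "~"]); // the [&~] tried after `directive`

// Characters admitted by &([~\t<&="'\u180E] / unispace).
const lookahead = new Set([..."~\t<&=\"'\u180E", ..." \u00A0\u1680\u202F\u205F\u3000"]);
for (let cu = 0x2000; cu <= 0x200a; cu++) lookahead.add(String.fromCharCode(cu));

const useful = [...new Set([...directiveFirst, ...fallbackClass])]
  .filter((c) => lookahead.has(c))
  .sort();

console.log(useful.join("")); // &<~
```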
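`&([~\t<&="'\u180E] / unispace)` doesn't quite make sense. According to the comments above, every `directive` starts with `<`, `{`, `-` or `&`, and the only other option is the `[&~]` class, so `<`, `&` and `~` are the only characters worth looking ahead for. It should just become `&[<&~]`.
So we can simplify further: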
```
r:(
$[^|[\]{!\n\r\-}~\t<&="'\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]+ // ⇔ $(!unispace [^|[\]{!\n\r\-}~\t<&="'\u180E])+
/ &[<&~] s:( directive / [&~] ) { return s; }
/ $(['] ![']) // single quotes are ok, double quotes are bad
)+ { return tu.flattenString(r); }
```
---
Now, why do we exclude `~` first and then accept `~` later? We should just have
```
r:(
$[^<&\-[\]{}!|\n\r\t="'\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]+ // ⇔ $(!unispace [^<&\-[\]{}!|\n\r\t="'\u180E])+
/ &[<&] s:directive { return s }
/ $("&")
/ $(['] ![']) // single quotes are ok, double quotes are bad
)+ { return tu.flattenString(r); }
```
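A final check of the last step: the new first class should differ from the previous one only in `~`. That way `~` is consumed by the first alternative instead of by the fallback class. This is a sketch over single UTF-16 code units.

```javascript
// Sketch: compare the first class of the previous block with the final one.
const before = new RegExp(String.raw`^[^|[\]{!\n\r\-}~\t<&="'\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]$`);
const after = new RegExp(String.raw`^[^<&\-[\]{}!|\n\r\t="'\u180E \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]$`);

const changed = [];
for (let cu = 0; cu <= 0xffff; cu++) {
  const ch = String.fromCharCode(cu);
  if (before.test(ch) !== after.test(ch)) changed.push(ch);
}

console.log(changed);         // [ '~' ]
console.log(after.test("~")); // true: `~` is now consumed by the first alternative
```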