In pegTokenizer.pegjs, we have
extlink_preprocessor_text_parameterized
  = r:( $[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+
  / !inline_breaks s:( directive / no_punctuation_char / [&|{\-] ) { return s; }
  // !inline_breaks no_punctuation_char
  / $([.:,] !(space / eolf))
  / $(['] ![']) // single quotes are ok, double quotes are bad
  )+ { return tu.flattenString(r); }
and
directive
  = comment
  / extension_tag
  / tplarg_or_template
  / & "-{" v:lang_variant_or_tpl { return v; }
  / & "&" e:htmlentity { return e; }
  / include_limits
However, language-variant markup and extension tags cannot appear inside external links, so the rule should be changed to
extlink_preprocessor_text_parameterized
  = r:( $[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+
  / !inline_breaks s:( directive_in_extlink / no_punctuation_char / [&|{\-] ) { return s; }
  // !inline_breaks no_punctuation_char
  / $([.:,] !(space / eolf))
  / $(['] ![']) // single quotes are ok, double quotes are bad
  )+ { return tu.flattenString(r); }
and add
directive_in_extlink
  = !extension_tag comment
  / tplarg_or_template
  / & "&" e:htmlentity { return e; }
  / include_limits
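To make the intent of the new rule concrete, here is a minimal sketch in plain JavaScript (not Parsoid code; the recognizers are simplified stand-ins for the real grammar rules): directive_in_extlink still matches comments, templates, and HTML entities, but no longer matches extension tags or -{ ... }- language-variant markup, which are therefore left to the surrounding extlink text rules.

```javascript
// Hypothetical sketch, not Parsoid code: simplified recognizers standing in
// for the real grammar rules, to show what directive_in_extlink accepts and
// rejects compared to directive.
const isComment = (s, i) => s.startsWith('<!--', i);
const isExtensionTag = (s, i) => /^<(nowiki|pre|ref)\b/.test(s.slice(i));
const isLangVariant = (s, i) => s.startsWith('-{', i);          // lang_variant_or_tpl
const isTemplate = (s, i) => s.startsWith('{{', i);             // tplarg_or_template
const isEntity = (s, i) => /^&[a-zA-Z#0-9]+;/.test(s.slice(i)); // htmlentity

// The unrestricted rule: matches every directive alternative.
function directive(s, i = 0) {
  return isComment(s, i) || isExtensionTag(s, i) || isLangVariant(s, i) ||
         isTemplate(s, i) || isEntity(s, i);
}

// The restricted rule: extension tags and language variants are excluded,
// so inside an external link they fall through to plain-text handling.
function directiveInExtlink(s, i = 0) {
  if (isExtensionTag(s, i) || isLangVariant(s, i)) return false;
  return isComment(s, i) || isTemplate(s, i) || isEntity(s, i);
}
```

For example, `directive('-{zh-cn:中国}-')` matches but `directiveInExtlink('-{zh-cn:中国}-')` does not; the test commands below exercise exactly these cases against the real grammar.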
Test commands:
echo '[http://www.google.com/search?q=-{zh-cn:中国;zh-tw:中華民國}- google search]' | node bin/parse.js --normalize=parsoid --prefix=zhwiki
echo '[http://www.google.com/search?q=<nowiki>a</nowiki>]' | node bin/parse.js --normalize=parsoid
echo '[http://www.google.com/search?q=<pre>a</pre>]' | node bin/parse.js --normalize=parsoid