In pegTokenizer.pegjs, we currently have:

extlink_preprocessor_text_parameterized
    = r:( $[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+
        / !inline_breaks s:( directive / no_punctuation_char / [&|{\-] ) { return s; }
        // !inline_breaks no_punctuation_char
        / $([.:,] !(space / eolf))
        / $(['] ![']) // single quotes are ok, double quotes are bad
    )+ {
        return tu.flattenString(r);
    }

and

directive
    = comment
    / extension_tag
    / tplarg_or_template
    / & "-{" v:lang_variant_or_tpl { return v; }
    / & "&" e:htmlentity { return e; }
    / include_limits

However, we cannot have language variants or extension tags in external links, so the code should be changed to:

extlink_preprocessor_text_parameterized
    = r:( $[^'<~[{\n\r|!\]}\-\t&="' \u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]+
        / !inline_breaks s:( directive_in_extlink / no_punctuation_char / [&|{\-] ) { return s; }
        // !inline_breaks no_punctuation_char
        / $([.:,] !(space / eolf))
        / $(['] ![']) // single quotes are ok, double quotes are bad
    )+ {
        return tu.flattenString(r);
    }

and add:

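// Same as directive, minus extension_tag and lang_variant_or_tpl. The
// lookahead precedes a parenthesized choice so that it guards every
// alternative rather than only comment.
directive_in_extlink
    = !extension_tag d:(
          comment
        / tplarg_or_template
        / & "&" e:htmlentity { return e; }
        / include_limits
      ) { return d; }

Test commands:
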
echo '[http://www.google.com/search?q=-{zh-cn:中国;zh-tw:中華民國}- google search]' | node bin/parse.js --normalize=parsoid --prefix=zhwiki
echo '[http://www.google.com/search?q=<nowiki>a</nowiki>]' | node bin/parse.js --normalize=parsoid
echo '[http://www.google.com/search?q=<pre>a</pre>]' | node bin/parse.js --normalize=parsoid
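A detail worth checking in directive_in_extlink is where the !extension_tag lookahead sits. In PEG.js a sequence binds tighter than the ordered choice /, so a rule written as !extension_tag comment / tplarg_or_template / ... parses as (!extension_tag comment) / tplarg_or_template / ..., and the lookahead guards only the comment alternative. To guard every alternative, the lookahead has to precede a parenthesized choice. The minimal sketch below shows the difference with toy combinators; lit, seq, choice and not are hypothetical helpers written for this example, not pegjs or Parsoid APIs. The toy guard deliberately overlaps the second alternative so the effect is visible; the sketch does not check whether the real extension_tag and tplarg_or_template rules can overlap.

```javascript
// Toy PEG combinators (hypothetical helpers for illustration; not the pegjs
// or Parsoid API). A parser takes (input, pos) and returns the position after
// a successful match, or -1 on failure.
const lit = (s) => (input, pos) => (input.startsWith(s, pos) ? pos + s.length : -1);

// Sequence: every part must match, one after another.
const seq = (...parts) => (input, pos) => {
  for (const p of parts) {
    pos = p(input, pos);
    if (pos < 0) return -1;
  }
  return pos;
};

// Ordered choice (PEG "/"): the first alternative that matches wins.
const choice = (...alts) => (input, pos) => {
  for (const p of alts) {
    const end = p(input, pos);
    if (end >= 0) return end;
  }
  return -1;
};

// Negative lookahead (PEG "!"): succeeds, consuming nothing, only if p fails.
const not = (p) => (input, pos) => (p(input, pos) >= 0 ? -1 : pos);

// Stand-ins: "guard" plays the role of extension_tag and deliberately
// overlaps with "second" so that the two groupings behave differently.
const guard = lit("{{x");
const first = lit("<!--");
const second = lit("{{");

// "!guard first / second" means "(!guard first) / second" ...
const ungrouped = choice(seq(not(guard), first), second);
// ... so the guard is hoisted over a parenthesized choice to cover both.
const grouped = seq(not(guard), choice(first, second));

console.log(ungrouped("{{x}}", 0)); // 2: the second alternative matches despite the guard
console.log(grouped("{{x}}", 0)); // -1: the guard rejects the input
```

On input the guard does not match, such as "<!-- c -->", both forms return the same position; they differ only where the guard and a later alternative could both match.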
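To rerun the three test commands after editing the grammar, they can be wrapped in a small Node.js script. This is a sketch that assumes it is run from the root of a Parsoid checkout where bin/parse.js exists; it feeds each sample on stdin with the same flags as the echo commands. The --run switch and the exported names are this script's own conventions, not Parsoid options.

```javascript
// Runs the extlink test cases through bin/parse.js, mirroring
// `echo '...' | node bin/parse.js <flags>`. Assumes the working directory is
// a Parsoid checkout; "--run" is this script's own switch, not a Parsoid flag.
const { spawnSync } = require("child_process");

const cases = [
  {
    wikitext: "[http://www.google.com/search?q=-{zh-cn:中国;zh-tw:中華民國}- google search]",
    args: ["--normalize=parsoid", "--prefix=zhwiki"],
  },
  {
    wikitext: "[http://www.google.com/search?q=<nowiki>a</nowiki>]",
    args: ["--normalize=parsoid"],
  },
  {
    wikitext: "[http://www.google.com/search?q=<pre>a</pre>]",
    args: ["--normalize=parsoid"],
  },
];

// Spawns the current node binary on bin/parse.js and returns its result.
function runCase({ wikitext, args }) {
  const result = spawnSync(process.execPath, ["bin/parse.js", ...args], {
    input: wikitext + "\n", // echo appends a trailing newline
    encoding: "utf8",
  });
  return { status: result.status, stdout: result.stdout, stderr: result.stderr };
}

if (process.argv.includes("--run")) {
  for (const c of cases) {
    const { status, stdout, stderr } = runCase(c);
    console.log(`=== ${c.wikitext}`);
    console.log(stdout);
    if (status !== 0) console.error(stderr);
  }
}

module.exports = { cases, runCase };
```

Saved under any name (for example the hypothetical run-extlink-cases.js), it would be invoked as node run-extlink-cases.js --run; without --run it only defines the cases, which keeps it importable from other scripts.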