Page MenuHomePhabricator

HTML5 ids seems to change how wikilink fragments are parsed (when LanguageConverter is enabled)
Closed, ResolvedPublic

Description

With $wgFragmentMode=['legacy']:

$ echo '<sup id="cite_ref-&#123;&#124;a&#125;&#125;_1-0" class="reference">[[#cite_note-&#123;&#123;echo&#124;a&#125;&#125;-1|&#91;1&#93;]]</sup>' | php maintenance/parse.php 
<div class="mw-parser-output"><p><sup id="cite_ref-.7B.7Becho.7Ca.7D.7D_1-0" class="reference"><a href="#cite_note-.7B.7Becho.7Ca.7D.7D-1">&#91;1&#93;</a></sup>
</p></div>

But with $wgFragmentMode=['html5','legacy']:

$ echo '<sup id="cite_ref-&#123;&#124;a&#125;&#125;_1-0" class="reference">[[#cite_note-&#123;&#123;echo&#124;a&#125;&#125;-1|&#91;1&#93;]]</sup>' | php maintenance/parse.php 
<div class="mw-parser-output"><p><sup id="cite_ref-&#123;&#123;echo&#124;a&#125;&#125;_1-0" class="reference"><a href="#cite_notea}1">&#91;1&#93;</a></sup>

Note that the href attribute in the second example is double-decoded: the HTML entities are expanded to {{echo|a}} and then that appears to be processed again in some way to yield a} (and a single dash is removed?).

Event Timeline

cscott created this task.Sep 18 2017, 8:36 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 18 2017, 8:36 PM

...only when LanguageConverter is enabled (sigh).

Stack trace is:

Backtrace:
#0 /home/cananian/Projects/Wikimedia/core/languages/LanguageConverter.php(403): Sanitizer::decodeTagAttributes(string)
#1 /home/cananian/Projects/Wikimedia/core/languages/LanguageConverter.php(648): LanguageConverter->autoConvert(string, string)
#2 /home/cananian/Projects/Wikimedia/core/languages/LanguageConverter.php(617): LanguageConverter->recursiveConvertTopLevel(string, string)
#3 /home/cananian/Projects/Wikimedia/core/languages/LanguageConverter.php(600): LanguageConverter->convertTo(string, string)
#4 /home/cananian/Projects/Wikimedia/core/languages/Language.php(4069): LanguageConverter->convert(string)
#5 /home/cananian/Projects/Wikimedia/core/includes/parser/Parser.php(1364): Language->convert(string)
#6 /home/cananian/Projects/Wikimedia/core/includes/parser/Parser.php(454): Parser->internalParseHalfParsed(string, boolean, boolean)
#7 /home/cananian/Projects/Wikimedia/core/maintenance/parse.php(138): Parser->parse(string, Title, ParserOptions)
#8 /home/cananian/Projects/Wikimedia/core/maintenance/parse.php(85): CLIParser->parse(string)
#9 /home/cananian/Projects/Wikimedia/core/maintenance/parse.php(77): CLIParser->render(string)
#10 /home/cananian/Projects/Wikimedia/core/maintenance/doMaintenance.php(92): CLIParser->execute()
#11 /home/cananian/Projects/Wikimedia/core/maintenance/parse.php(144): require_once(string)
#12 {main}

The href is intact when it's created, but LanguageConverter appears to munge it afterwards. Will dig more tomorrow.

Aklapper renamed this task from HTML5 ids seems to change how wikilink fragments are parsed. to HTML5 ids seems to change how wikilink fragments are parsed (when LanguageConverter is enabled).Sep 19 2017, 1:44 PM
MaxSem set Security to Software security bug.Sep 29 2017, 11:45 PM
MaxSem added a project: Security.
MaxSem changed the visibility from "Public (No Login Required)" to "Custom Policy".
MaxSem added a subscriber: MaxSem.

Okay, if we put one half of language converter expression into one link and another half into another link, we can cut HTML:

$ echo '[[#foo-&#123;]] [[#&#124;bar&#125;-]]' |mwscript parse.php --wiki=wiki

<div class="mw-parser-output"><p><a href="#foobar">#&#124;bar&#125;-</a>
</p></div>

Can't figure out a way to escalate it to a full-blown XSS, but maybe someone else can. Protecting as a precaution. At least unlike T176247 it's not present in released MW versions.


Proposed patch, please review.

Bawolff added a subscriber: Bawolff.Oct 3 2017, 8:14 PM

LanguageConverter has lots of weird behaviour that users rely on, which makes it a giant pita. This seems to have the same problem as my attempted fix for T119158 in which it breaks stuff like http://example.com/-{sr-el:foo;sr-ec:bar}- . Which maybe we should just break as its insane behaviour. But lots of users seem to rely on that, and last time i checked wikidata was relying on it (See the other bug).

Honestly, I kind of gave up on the other bug as I wasn't sure what to do about it. Which is bad, as its an important bug that needs to be fixed.

Reedy added a subscriber: Reedy.Oct 4 2017, 2:22 PM

Just because a security patch breaks a "feature" doesn't mean we should stop deploying it...

Can we loop in the Wikidata folks to see if we can get them to fix their code to not rely on it first, rather than breaking everything?

LanguageConverter has lots of weird behaviour that users rely on, which makes it a giant pita. This seems to have the same problem as my attempted fix for T119158 in which it breaks stuff like http://example.com/-{sr-el:foo;sr-ec:bar}- . Which maybe we should just break as its insane behaviour. But lots of users seem to rely on that, and last time i checked wikidata was relying on it (See the other bug).
Honestly, I kind of gave up on the other bug as I wasn't sure what to do about it. Which is bad, as its an important bug that needs to be fixed.

OTOH, looking at externallinks table, looks like this is less of an issue then it used to be.

kaldari moved this task from Untriaged to Bug backlog on the Community-Tech board.
cscott added a comment.Nov 2 2017, 4:40 PM

Let's break the bad behavior and fix it. The Parsing team now has a robust linting framework, so for article contents (at least) we can straightforwardly write a linter rule for stuff like http://example.com/-{sr-el:foo;sr-ec:bar}- and replace it. (-{sr-el:http://example.com/foo;sr-ec:http://example.com/bar}- is a sane replacement).

Wikidata is slightly harder than wikitext, since we don't have the tools built, but I agree we should get the fix made. It just prolongs the pain to have these weird corner-case behaviors lurking around our code base.

MaxSem added a comment.Nov 2 2017, 8:33 PM

+1 to cscott. We shouldn't be enabling this.

Ok, this has gone on long enough. It appears (based on externallinks table) that it is no longer common for links to use langconverter syntax on zhwiki. With that in mind, I'd like to bite the bullet and just deploy the patches on T124404 and T119158. If we don't do it at some point we'll never do it.

Ok, so this is fixed now, with the fix for T119158 deployed.

So the remaining question is - do we want to change Html::expandAttributes as an extra precaution?

I'm unsure.

Bawolff changed the visibility from "Custom Policy" to "Public (No Login Required)".Nov 15 2017, 6:11 PM
Bawolff added a subscriber: Fomafix.
cscott closed this task as Resolved.Dec 1 2017, 8:54 PM
cscott claimed this task.

Ok, so this is fixed now, with the fix for T119158 deployed.
So the remaining question is - do we want to change Html::expandAttributes as an extra precaution?
I'm unsure.

I believe @MaxSem already fixed expandAttributes, with 47416c0a862fbbdb58b5942a3118f220546878da

Since it uses Sanitizer::encodeAttribute now, we should be safe(r).

Closing bug; probably best to open a new task for Html::expandAttributes if I'm misunderstanding @MaxSem's fix.

sbassett moved this task from Patch pending review to Done on the Security board.Sep 26 2019, 2:40 PM
Restricted Application added a subscriber: Liuxinyu970226. · View Herald TranscriptSep 26 2019, 2:40 PM