Page MenuHomePhabricator

Long headings result in long table of contents links that cause "PHP Warning: DOMDocument::saveHTML(): Memory allocation failed : escaping URI value"
Closed, ResolvedPublicSecurity

Description

Error

MediaWiki version: 1.35.0-wmf.28

message
PHP Warning: DOMDocument::saveHTML(): Memory allocation failed : escaping URI value

Impact

This happens on 1.35.0-wmf.30, currently on group0 and group1. About ten of these in the past few hours, so not enough to block the train. However, seems like a weird error to me, and would appreciate someone having a look.

Notes

Details

Risk Rating
Medium
Author Affiliation
WMF Technology
Request ID
XqqTFgpAAL0AA9f64scAAADF
Request URL
https://ja.wikipedia.org/w/index.php?title=%E3%82%A6%E3%83%BC%E3%83%9E%E3%83%B3&oldid=77303774
Stack Trace
exception.trace
#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.35.0-wmf.28/vendor/wikimedia/html-formatter/src/HtmlFormatter.php(279): DOMDocument->saveHTML()
#2 /srv/mediawiki/php-1.35.0-wmf.28/extensions/MobileFrontend/includes/MobileFormatter.php(291): HtmlFormatter\HtmlFormatter->getText(NULL)
#3 /srv/mediawiki/php-1.35.0-wmf.28/extensions/MobileFrontend/includes/ExtMobileFrontend.php(107): MobileFormatter->getText()
#4 /srv/mediawiki/php-1.35.0-wmf.28/extensions/MobileFrontend/includes/MobileFrontendHooks.php(275): ExtMobileFrontend::domParse(OutputPage, string, boolean)
#5 /srv/mediawiki/php-1.35.0-wmf.28/includes/Hooks.php(174): MobileFrontendHooks::onOutputPageBeforeHTML(OutputPage, string)
#6 /srv/mediawiki/php-1.35.0-wmf.28/includes/Hooks.php(234): Hooks::callHook(string, array, array, NULL, string)
#7 /srv/mediawiki/php-1.35.0-wmf.28/includes/OutputPage.php(1991): Hooks::runWithoutAbort(string, array)
#8 /srv/mediawiki/php-1.35.0-wmf.28/includes/OutputPage.php(2003): OutputPage->addParserOutputText(ParserOutput, array)
#9 /srv/mediawiki/php-1.35.0-wmf.28/includes/page/Article.php(817): OutputPage->addParserOutput(ParserOutput, array)
#10 /srv/mediawiki/php-1.35.0-wmf.28/includes/actions/ViewAction.php(66): Article->view()
#11 /srv/mediawiki/php-1.35.0-wmf.28/includes/MediaWiki.php(519): ViewAction->show()
#12 /srv/mediawiki/php-1.35.0-wmf.28/includes/MediaWiki.php(305): MediaWiki->performAction(Article, Title)
#13 /srv/mediawiki/php-1.35.0-wmf.28/includes/MediaWiki.php(973): MediaWiki->performRequest()
#14 /srv/mediawiki/php-1.35.0-wmf.28/includes/MediaWiki.php(535): MediaWiki->main()
#15 /srv/mediawiki/php-1.35.0-wmf.28/index.php(47): MediaWiki->run()
#16 /srv/mediawiki/w/index.php(3): require(string)
#17 {main}
Related Changes in Gerrit:

Event Timeline

OH, I can't read. It's actually wmf.28 not wmf.30, sorry.

Jdlrobson moved this task from Needs triage to Triaged on the Mobile board.
Jdlrobson moved this task from Triaged to MobileFrontend specific on the Mobile board.
Krinkle triaged this task as High priority.May 6 2020, 6:33 PM
Krinkle subscribed.

Tentatively triaging as high priority. If this is not specific to the given page but due to a php-dom bug that can crash random pages on mobile that's pretty serious.

I do note that I thought we had mostly switched to using Remex for HTML processing, which might have solved this issue already.

Note: I've not been able to replicate this or find a recent entry in logstash for analysis so I'm not sure what to do about this right now.

Jdlrobson renamed this task from PHP Warning: DOMDocument::saveHTML(): Memory allocation failed : escaping URI value to Long headings result in long table of contents links that cause "PHP Warning: DOMDocument::saveHTML(): Memory allocation failed : escaping URI value".Jul 29 2020, 7:47 PM

After analyzing this a little, my judgement is that this is an edge case that shouldn't impact pages and is a symptom of a larger problem (hence the broadened set of tags):
Long headlines can render pages unusable on desktop and non-renderable on mobile

The offending article has a 224001 character heading which creates 224001 character ids in the table of contents which the HtmlFormatter attempts to parse leading to memory problems. Ideally the parser would limit heading or anchor link lengths. I don't see a nice way for the MobileFormatter or MobileFrontend to predict this problem ahead of time unfortunately.

FWIW the target page crashes my browser in desktop for page views and VisualEditor so this seems like a bigger problem - vandalisers can easily render articles unusable and problematic with long headlines.

Tagging parser team in case they have any thoughts on how we can limit user input in this form.

The page at https://ja.wikipedia.org/w/index.php?title=%E3%82%A6%E3%83%BC%E3%83%9E%E3%83%B3&oldid=77303774 loads for me in about 2 seconds in Firefox, Safari and Chrome. What error does the browser or OS show when it crashes?

For me as soon as I scroll down the page I get the beach ball on OSX and then my entire browser locks up and crashes. RIP all my tabs.

HTMLFormatter should die: T217360: Replace libxml/xpath in HtmlFormatter with Remex/zest

That said, this looks like an OSX browser bug (@Jdlrobson are you using safari?). There "shouldn't" be any way that "a lot of HTML" will crash your browser.

Can we get a stack trace for the HTMLFormatter crash? I could prioritize replacing that one particular method call in HTMLFormatter with a Remex equivalent; a full replacement for HTMLFormatter might take a little longer.

EDIT: never mind, I see there's a stack trace at the top in the task description.

I can reproduce the crash when visiting https://ja.wikipedia.org/w/index.php?title=%E3%82%A6%E3%83%BC%E3%83%9E%E3%83%B3&oldid=77303774 on Firefox 68.11.0esr(64-bit)/Linux amd64. Just kills the tab, not the entire browser.

Works fine on Chrome 79.0.3945.117 (64-bit) on Linux amd64, although it's still a little unstable: I once got the chrome tab to crash by opening a new tab and then switching back to the jawiki tab; this isn't repeatable, though.

cscott raised the priority of this task from High to Needs Triage.Aug 12 2020, 4:08 PM
cscott set Security to Software security bug.
cscott added projects: Security, Security-Team.
cscott changed the visibility from "Public (No Login Required)" to "Custom Policy".
cscott changed the subtype of this task from "Production Error" to "Security Issue".

Applying the security tag out of an abundance of caution.

Crashing the tab w/ a long HTML string seems like a buffer overflow sort of situation, and it certainly seems 'exploitable' by writing wikitext (as jawiki does). You probably couldn't embed an actual shellcode in a wikitext heading, but I'm not taking any chances.

To be clear: the fact that you can crash a browser tab with a particular wikitext string is the 'security issue' here. The fact that HtmlFormatter sometimes (?) OOMs when rendering this page is a separate issue that does not appear to have security implications.

Local experiments seem to indicate that it is the TOC entry which causes the crash, not the <h> tag. Removing the single <li> item corresponding to the long heading makes the page load w/o crashing. Within that <li> item, truncating the <a href='#....thousands of characters'> to <a href="#"> also makes the crash go away.

So we can probably fix this (on desktop at least) by tweaking Sanitizer::escapeId* to truncate the ID to a reasonable max limit -- maybe 1024 UTF-8 bytes, something like that.

Not certain whether that would also resolve the OOM in the Mobile Frontend, but it might.

Regardless, we should *also* attempt to report this upstream to the browser folk, since a long anchor in an href "shouldn't" crash a browser tab.

Proposed patch to truncate IDs at a reasonable limit -- I've chosen 1024 UTF-8 characters, that really ought to be sufficient for unique IDs even on 4-byte-per-character locales, but we could probably set this to 4096 or even 16384 characters and still be fine.

Again, I'm following the security patch process out of an abundance of caution, but I think the above patch ought to be safe to upload to gerrit.

Slightly improved patch (relnotes, better commit msg):

That said, this looks like an OSX browser bug (@Jdlrobson are you using safari?). There "shouldn't" be any way that "a lot of HTML" will crash your browser.

I'm using Firefox 79 on OSX. It's slow JS execution that's crashing my browser and my guess is it relates to the HTML somehow (potential parsing). Usually a popup shows at the top of the page asking me to stop executing the JS but for this JS I don't get that far. Everything freezes until I close.

+1 to the proposed patch above (T251506#6380139). I assume mw never allows multibyte characters within id attributes? @Reedy or I (or any deployer really) could deploy this as a security patch during next Monday's security window. Though I'm also personally fine if this gets merged early next week via gerrit to go out with the train.

I'll give it a few more hours for feedback, but I'll upload the above patch to gerrit tonight/tomorrow morning if I don't hear any objection, and I'll make sure it gets reviewed and merged by next week's train.

+1 to the proposed patch above (T251506#6380139). I assume mw never allows multibyte characters within id attributes? @Reedy or I (or any deployer really) could deploy this as a security patch during next Monday's security window. Though I'm also personally fine if this gets merged early next week via gerrit to go out with the train.

MediaWiki does allow multibyte characters... which means that you're right, I should be using mb_substr. That's fairer for non-latin wikis, in any case.

Fixed:

Is this only a thing in 1.35? Or is it applicable to older versions too?

Reedy added a subscriber: ovasileva.

Is this only a thing in 1.35? Or is it applicable to older versions too?

It's a browser bug, really, that we're working around. So it's applicable to all supported versions of MW I suppose. wmf.5 with the above patch hasn't been rolled out to jawiki yet. I just confirmed that the jawiki page is still crashing both chrome and firefox on my machine with wmf.4. Let's let wmf.5 get rolled out today and tomorrow and confirm it works around the problem before we start cherry-picking.

Security-Team: you've moved this to "Our Part Is Done" but I could still use some help filing an appropriate bug upstream w/ the browser vendors. Maybe tab crashing is "just a bug" not a "security bug" given browser sandboxing and whatnot, but it still seems like we should forward it up to them.

Security-Team: you've moved this to "Our Part Is Done" but I could still use some help filing an appropriate bug upstream w/ the browser vendors. Maybe tab crashing is "just a bug" not a "security bug" given browser sandboxing and whatnot, but it still seems like we should forward it up to them.

We can help shepherd this process, sure, if the bug is consistently demonstrable. Testing on Mojave (10.14.6, 2.2 Ghz i7, 8 Mb RAM) , I cannot reproduce the issue with the jawiki (still on wmf.4 as of now) URL within Chrome 84, Firefox 79, Safari 13.1.2, Opera 65 or Edge 84. I get a full page load with smooth scrolling, at least within Chrome, Firefox and Safari.

The patch fixed the issue on Firefox/Linux (@Jdlrobson can you confirm on yr OSX?). Oddly enough, I can still get the tab to crash in Google Chrome by dragging the jawiki tab out into a new window after it's loaded.

Couldn't find anything related in recent logstash; the original ~Apr 30 logstash entries are gone already.

sbassett triaged this task as Medium priority.Tue, Mar 31, 2:41 PM
sbassett changed Author Affiliation from N/A to WMF Technology.
sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".
sbassett changed Risk Rating from N/A to Medium.
sbassett added a project: SecTeam-Processed.

Change #1264708 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Sanitizer: Truncate IDs to a reasonable length; remove escapeIdReferenceList

https://gerrit.wikimedia.org/r/1264708

Change #1268630 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a26

https://gerrit.wikimedia.org/r/1268630

Change #1268630 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a26

https://gerrit.wikimedia.org/r/1268630

Change #1268647 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@wmf/1.46.0-wmf.23] Bump wikimedia/parsoid to 0.23.0-a26

https://gerrit.wikimedia.org/r/1268647

Change #1268647 merged by jenkins-bot:

[mediawiki/vendor@wmf/1.46.0-wmf.23] Bump wikimedia/parsoid to 0.23.0-a26

https://gerrit.wikimedia.org/r/1268647