Page MenuHomePhabricator

mw-parser-output divs leaking into mobileview output again
Closed, ResolvedPublic

Description

A recent change[1] has caused a regression of T165115: action=mobileview sections delivering a div of class "mw-parser-output" without a matching closing div, causing the same consequences for the mobile clients (see T186927#3959632).

It appears the code for handling the new 'unwrap' post-cache transform (which is supposed to prevent this from happening, see https://gerrit.wikimedia.org/r/#/c/399910/) is flawed in that it expects that the closing </div> for the mw-parser-output div will be at the end of the document. In fact, debug text may follow.

[1] rMWd511626236f9: Add 'unwrap' ParserOutput post-cache transform https://github.com/wikimedia/mediawiki/commit/d511626236f9eeeb6a55c97c3c0d74d30150dc4f

Event Timeline

Change 409449 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] Fix ParserOutput::getText 'unwrap' flag for end-of-doc comment

https://gerrit.wikimedia.org/r/409449

Screenshots of effect on iOS last time this happened:
https://phabricator.wikimedia.org/T129717#2119336
Likely same on Android in cases where mobileview still in use (Chinese lang variants?).
Another side-effect is missing text extracts.

Mholloway triaged this task as Unbreak Now! priority.Feb 9 2018, 9:48 PM

Change 409452 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/TextExtracts@master] Bump cache version due to 'unwrap' ParserOutput option

https://gerrit.wikimedia.org/r/409452

Change 409453 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/MobileFrontend@master] Bump cache version due to 'unwrap' ParserOutput option

https://gerrit.wikimedia.org/r/409453

Change 409452 merged by jenkins-bot:
[mediawiki/extensions/TextExtracts@master] Bump cache version due to 'unwrap' ParserOutput option

https://gerrit.wikimedia.org/r/409452

Change 409455 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/TextExtracts@wmf/1.31.0-wmf.20] Bump cache version due to 'unwrap' ParserOutput option

https://gerrit.wikimedia.org/r/409455

Change 409455 merged by jenkins-bot:
[mediawiki/extensions/TextExtracts@wmf/1.31.0-wmf.20] Bump cache version due to 'unwrap' ParserOutput option

https://gerrit.wikimedia.org/r/409455

Change 409453 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Bump cache version due to 'unwrap' ParserOutput option

https://gerrit.wikimedia.org/r/409453

Change 409449 merged by jenkins-bot:
[mediawiki/core@master] Fix ParserOutput::getText 'unwrap' flag for end-of-doc comment

https://gerrit.wikimedia.org/r/409449

Change 409456 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/MobileFrontend@wmf/1.31.0-wmf.20] Bump cache version due to 'unwrap' ParserOutput option

https://gerrit.wikimedia.org/r/409456

Change 409456 merged by Gergő Tisza:
[mediawiki/extensions/MobileFrontend@wmf/1.31.0-wmf.20] Bump cache version due to 'unwrap' ParserOutput option

https://gerrit.wikimedia.org/r/409456

Change 409457 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@wmf/1.31.0-wmf.20] Fix ParserOutput::getText 'unwrap' flag for end-of-doc comment

https://gerrit.wikimedia.org/r/409457

Change 409457 merged by Gergő Tisza:
[mediawiki/core@wmf/1.31.0-wmf.20] Fix ParserOutput::getText 'unwrap' flag for end-of-doc comment

https://gerrit.wikimedia.org/r/409457

Mentioned in SAL (#wikimedia-operations) [2018-02-09T22:42:14Z] <tgr@tin> Synchronized php-1.31.0-wmf.20/includes/parser/ParserOutput.php: emergency fix for T186927 (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2018-02-09T22:43:56Z] <tgr@tin> Synchronized php-1.31.0-wmf.20/extensions/MobileFrontend/includes/api/ApiMobileView.php: emergency fix for T186927 (duration: 00m 55s)

Mentioned in SAL (#wikimedia-operations) [2018-02-09T22:45:15Z] <tgr@tin> Synchronized php-1.31.0-wmf.20/extensions/TextExtracts/includes/ApiQueryExtracts.php: emergency fix for T186927 (duration: 00m 55s)

Mentioned in SAL (#wikimedia-operations) [2018-02-09T23:26:53Z] <tgr@tin> Synchronized php-1.31.0-wmf.20/extensions/TextExtracts/includes/ApiQueryExtracts.php: emergency fix for T186927 (now incldes actual code change!) (duration: 00m 55s)

Mentioned in SAL (#wikimedia-operations) [2018-02-09T23:28:08Z] <tgr@tin> Synchronized php-1.31.0-wmf.20/extensions/MobileFrontend/includes/api/ApiMobileView.php: emergency fix for T186927 (now incldes actual code change!) (duration: 00m 55s)

Very quickly fixed. Thanks again for the fast turn-around! :)

Caused by rMWd511626236f9: Add 'unwrap' ParserOutput post-cache transform which changed how the parser handles the <div class="mw-parser-output"> wrapper added for TemplateStyles CSS scoping (wrapping used to be one of the parser options, which meant callers asking for wrapped and non-wrapped HTML created separate parser cache entries, contributing to T167784: WMF ParserCache disk space exhaustion; the patch changed that to a unified (wrapped) parser cache entry with callers being able to ask for an on-the-fly unwrap transformation when fetching from the cache). The unwrap transform assumed the wrapping </div> is at the end of the HTML, which is true for freshly parsed content, but not for content read from the parser cache, which adds debug data in a HTML content to the very end. So the unwrap failed, and the unexpected HTML structure broke the extracts and mobileview API modules, and indirectly the RESTBase summary endpoint. Debugging and fixing was complicated by extracts and mobileview having their own long-lived caches.

wmf.20 was deployed to group1 at 17:19 UTC and to group2 at 18:31. The fix was synced at 23:28, meaning the bug was live for 5 or 6 hours, depending on wiki. (Response breakdown is: 3 hours until problem was noticed, 2 hours debugging and writing a fix, an extra hour due to initially not deploying part of the fix by mistake and then assuming we are waiting for some caching layer to expire.) User impact included broken article cards and no previews on iOS, broken link previews on Android, probably broken link preview popups on desktop. Not sure how much mobile web was affected. Both affected APIs are cached for 22 days so presumably something like a percent of our articles were affected.
There should probably be an incident report (ping @greg).

Suggested followups:

  • Make sure there are smoke tests on beta or group0 which catch errors in key mobile APIs like extracts and mobileview. Not trivial because of the caching in those APIs - probably the test should involve creating a new page.
  • Have a single place for documenting all content cache layers in WMF production (in the current case: parser cache, MobileFrontend, TextExtracts, RESTBase, Varnish). This seems to be the a constant source of confusion.

Some minor annoyances encountered during debugging: T186935: mwscript is missing on most production hosts, T186936: shell.php broken on most production hosts

There should probably be an incident report (ping @greg).

Sorry, huge bugmail backlog, just saw this.

Pinging @Jrbranaa for his assistance.