Page MenuHomePhabricator

Duplicate mw-parser-output elements in action=parse API output
Closed, ResolvedPublic

Description

Visiting https://en.wikipedia.org/w/api.php?formatversion=2&format=json&action=parse&prop=text|modules|langlinks&page=Spain&useskin=vector in an incognito window I see the content is double wrapped with two div.parser-output elements.

{"warnings":{"parse":{"warnings":"Property \"modules\" was set but not \"jsconfigvars\" or \"encodedjsconfigvars\". Configuration variables are necessary for proper module usage."}},"parse":{"title":"Spain","pageid":26667,"text":"<div class=\"mw-parser-output\"><div class=\"mw-parser-output\">

Expected: it should not be double wrapped. There should only be one of these elements.

This breaks various client expectations (including a tool reading web use for QA)

Event Timeline

Jdlrobson created this task.Sep 6 2018, 9:00 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 6 2018, 9:00 PM
Jdlrobson updated the task description. (Show Details)Sep 6 2018, 9:01 PM
Jdlrobson triaged this task as High priority.Sep 6 2018, 9:11 PM
Jdlrobson updated the task description. (Show Details)
Tgr added a subscriber: Tgr.Sep 6 2018, 9:29 PM

Got cached slightly before the deploy (20180906110544). Pre rMW0dc7ba02b448: Apply content wrapping in ParserOutput::getText() the wrapper div was added on render and removed on getText() if unwrap=false was specified. Now the wrapper div is added in getText() unless unwrap=false is specified. So with a ParserOutput cached before the code change there will be double-wrapping with unwrap=false and single-wrapping with unwrap=true. We should probably just restore the old unwrap logic and run it unconditionally until old parser cache records are all expired.

Tgr added a comment.Sep 7 2018, 7:40 AM

@Jdlrobson any thoughts about the impact of this bug? Does it break clients, or the HTML wrangling in the mobile APIs, or is it just a QA inconvenience?

Change 458732 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] Unwrap HTML loaded from parser cache

https://gerrit.wikimedia.org/r/458732

daniel added a subscriber: daniel.EditedSep 7 2018, 9:50 AM

Thanks for the patch, @Tgr!

Perhaps the unwrap flag should be deprecated when that code is removed, and ApiMobileView should be changed to use [ 'wrapperDivClass' => '' ] instead.

@Jdlrobson any thoughts about the impact of this bug? Does it break clients, or the HTML wrangling in the mobile APIs, or is it just a QA inconvenience?

It's hard to say.. I suspect it could impact TextExtracts and possibly the mobile content server (although I think they are fully using Parsoid. Does iOS use the mediawiki API? It's more likely to impact unofficial mediawiki clients.

Right now, I'm only seeing breakages in the MobileFrontend ContentProvider which we use to test production wiki content, which we can work around provided we run action=purge on the page we want to test on.

daniel added a comment.Sep 7 2018, 3:56 PM

So with a ParserOutput cached before the code change there will be double-wrapping with unwrap=false and single-wrapping with unwrap=true.

What I don't understand is that for a PO cached with the old logic, getWrapperDivClass should return an empty string, disabling wrapping. Any given PO should have *either* the wrapper in the text *or* a wrapper class triggering the wrapping code, never both.

Anomie added a subscriber: Anomie.Sep 7 2018, 4:02 PM

What I don't understand is that for a PO cached with the old logic, getWrapperDivClass should return an empty string, disabling wrapping. Any given PO should have *either* the wrapper in the text *or* a wrapper class triggering the wrapping code, never both.

Ah, I think I see. The PO was probably fine, it was a wmf.19-style PO that would have had getWrapperDivClass returning an empty string. But your change to ApiParse has it overriding that by passing 'wrapperDivClass' (where formerly it had used 'unwrap'), resulting in the double wrapping.

Gergő's patch should still fix it (and will fix the other bugs caused by 'unwrap' no longer working for wmf.19 POs), although the test being added there is bogus and it still needs the fix I mentioned in code review.

Change 458732 merged by jenkins-bot:
[mediawiki/core@master] Unwrap HTML loaded from parser cache

https://gerrit.wikimedia.org/r/458732

Anomie added a comment.Sep 7 2018, 5:30 PM

This should be fixed in master with the merge of that patch. But probably someone will want to backport it instead of waiting for wmf.22.

Kf8 added a subscriber: Kf8.Sep 9 2018, 1:08 AM

Change 459547 had a related patch set uploaded (by Anomie; owner: Gergő Tisza):
[mediawiki/core@wmf/1.32.0-wmf.20] Unwrap HTML loaded from parser cache

https://gerrit.wikimedia.org/r/459547

Change 459547 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.20] Unwrap HTML loaded from parser cache

https://gerrit.wikimedia.org/r/459547

Mentioned in SAL (#wikimedia-operations) [2018-09-10T14:28:09Z] <anomie@deploy1001> Synchronized php-1.32.0-wmf.20/includes/parser/ParserOutput.php: Backport for T203716 (duration: 00m 50s)

Anomie closed this task as Resolved.Sep 10 2018, 2:29 PM
Anomie claimed this task.

Fixed and backported.

Anomie reassigned this task from Anomie to Tgr.Sep 10 2018, 2:32 PM

Change 465465 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] Revert "Unwrap HTML loaded from parser cache"

https://gerrit.wikimedia.org/r/465465

Change 465465 merged by jenkins-bot:
[mediawiki/core@master] Revert "Unwrap HTML loaded from parser cache"

https://gerrit.wikimedia.org/r/465465