Export with full history creates broken XML (missing end tag </page>)
Closed, Resolved (Public)

Description

I noticed the bug the first time at pl.wikipedia on 2018-10-24 at 06:52 CEST.

Steps to reproduce (tested with de.WP and en.WP):

  1. Use Special:Export to export an article with the full history (uncheck "Include only the current revision, not the full history")

Result:
The XML is broken (check with an XML validator or a browser):

  • The element type "page" must be terminated by the matching end-tag "</page>".
  • "Error: mismatched tag. Expected: </page>"

Exporting only the current revision seems to work (valid XML).
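
For anyone who wants to check an export without a browser, here is a minimal sketch of a well-formedness check using Python's standard XML parser. The function name check_well_formed and the two sample strings are hypothetical and heavily trimmed; they are not real Special:Export output, only the same tag nesting as described above.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Hypothetical stand-ins for an export: one intact, one missing </page>.
GOOD = (
    "<mediawiki><page><title>T</title>"
    "<revision><id>1</id></revision></page></mediawiki>"
)
BROKEN = (
    "<mediawiki><page><title>T</title>"
    "<revision><id>1</id></revision></mediawiki>"
)

if __name__ == "__main__":
    for label, text in (("good", GOOD), ("broken", BROKEN)):
        ok, message = check_well_formed(text)
        print(label, "well-formed" if ok else "NOT well-formed: " + message)
```

To check a real export, save the Special:Export result to a file and pass its contents to check_well_formed instead of the sample strings.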

Event Timeline

Aklapper added a subscriber: BPirkle.

Thanks for reporting this! Confirming.

closePage() gets called from outputPageStream(); last changes were made by @BPirkle hence CC'ing.

Aklapper raised the priority of this task from Needs Triage to High. (Oct 29 2018, 8:00 AM)

Reproduced on English Wikipedia, investigating.

Updating so that people know this isn't being ignored.

The behavior is somewhat different from T31961. In the older ticket, both the </page> and </mediawiki> closing tags were missing, which is more readily explained as a fatal error during output. In this case, only the </page> tag is missing. The </mediawiki> tag is present, indicating that processing continued.
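
The distinction between the two failure modes can be sketched as a rough text-level check. This is only a heuristic that counts tags in the output, not a parser and not anything from the exporter code. The names tag_counts and classify, the category labels, and the sample strings are all hypothetical.

```python
import re


def tag_counts(xml_text, name):
    """Count opening and closing tags for one element name (text heuristic)."""
    opened = len(re.findall(rf"<{name}[\s>]", xml_text))
    closed = xml_text.count(f"</{name}>")
    return opened, closed


def classify(xml_text):
    """Guess which failure mode a broken export matches."""
    page_open, page_closed = tag_counts(xml_text, "page")
    root_open, root_closed = tag_counts(xml_text, "mediawiki")
    page_ok = page_open == page_closed
    root_ok = root_open == root_closed
    if page_ok and root_ok:
        return "balanced"
    if root_ok:
        # </mediawiki> was written, so output continued past the page.
        return "page_unclosed"
    if not page_ok:
        # Both closing tags missing: consistent with output stopping early.
        return "truncated"
    return "other"


# Hypothetical, heavily trimmed samples of each case.
INTACT = "<mediawiki><page><title>T</title></page></mediawiki>"
THIS_TASK = "<mediawiki><page><title>T</title></mediawiki>"
OLDER_TICKET = "<mediawiki><page><title>T</title>"

if __name__ == "__main__":
    for sample in (INTACT, THIS_TASK, OLDER_TICKET):
        print(classify(sample))
```

A "page_unclosed" result matches what is described for this task, while "truncated" matches the pattern described for the older ticket.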

I've looked for anything relevant in the logs and have not yet found it, although it is very possible that I'm missing something. My first thought was a simple logic error, but I've not yet found one, and I've also been unable to reproduce the behavior locally.

Clearly I'm overlooking something.

@BPirkle Can you reproduce with any page on en.WP, or only with pages that have a decent number of revisions?

I have reproduced on pages with few revisions. For example, https://en.wikipedia.org/wiki/Ponder%2C_Texas (128 revisions)

Update: after much fiddling about with a debugging technique suggested by @tstarling, I've reproduced the issue locally. Working on a fix now.

Change 470943 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/470943

Change 470943 merged by jenkins-bot:
[mediawiki/core@master] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/470943

Change 471022 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@wmf/1.33.0-wmf.1] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/471022

Change 471024 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@wmf/1.33.0-wmf.2] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/471024

Change 471024 merged by jenkins-bot:
[mediawiki/core@wmf/1.33.0-wmf.2] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/471024

Mentioned in SAL (#wikimedia-operations) [2018-11-01T18:38:14Z] <hoo@deploy1001> Synchronized php-1.33.0-wmf.2/includes/export/WikiExporter.php: Fix for missing end tag </page> on some exports (T207974) (duration: 00m 55s)

Change 471022 merged by jenkins-bot:
[mediawiki/core@wmf/1.33.0-wmf.1] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/471022

Mentioned in SAL (#wikimedia-operations) [2018-11-01T19:00:32Z] <hoo@deploy1001> Synchronized php-1.33.0-wmf.1/includes/export/WikiExporter.php: Fix for missing end tag </page> on some exports (T207974) (duration: 01m 01s)