Export with full history creates broken XML (missing end tag </page>)
Closed, Resolved · Public

Description

I noticed the bug the first time at pl.wikipedia on 2018-10-24 at 06:52 CEST.

Steps to reproduce (tested with de.WP and en.WP):

  1. Use Special:Export to export an article with the full history (uncheck "Include only the current revision, not the full history")

Result:
The XML is broken (check with an XML validator or a browser):

  • The element type "page" must be terminated by the matching end-tag "</page>".
  • "Error: mismatched tag. Expected: </page>"

Exporting only the current revision seems to work (valid XML).
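
The validator check can also be scripted. The following is a minimal sketch, assuming the export has already been saved or read into a string; the helper name `well_formedness_error` and the two sample documents are hypothetical, heavily trimmed stand-ins and not real Special:Export output.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical samples: GOOD closes <page>, BROKEN omits </page> as described in this task.
GOOD = (
    "<mediawiki><page><title>Example</title>"
    "<revision><id>1</id></revision></page></mediawiki>"
)
BROKEN = (
    "<mediawiki><page><title>Example</title>"
    "<revision><id>1</id></revision></mediawiki>"
)

print(well_formedness_error(GOOD))    # None for a well-formed document
print(well_formedness_error(BROKEN))  # a parser error message for the broken one
```

This only checks well-formedness, which is enough to catch the missing end tag; it does not validate against the export schema.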

Restricted Application added a subscriber: Aklapper. Oct 25 2018, 5:37 PM
Aklapper triaged this task as High priority. Oct 29 2018, 6:42 AM
Aklapper added a subscriber: BPirkle.

Thanks for reporting this! Confirming.

closePage() gets called from outputPageStream(); last changes were made by @BPirkle hence CC'ing.

Aklapper raised the priority of this task from High to Needs Triage. Oct 29 2018, 8:00 AM

Reproduced on English Wikipedia, investigating.

Updating so that people know this isn't being ignored.

The behavior is somewhat different from T31961. In the older ticket, both the </page> and </mediawiki> closing tags were missing, which is more readily explained as a fatal error during output. In this case, only the </page> tag is missing. The </mediawiki> tag is present, indicating that processing continued.
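
To tell the two failure modes apart on a saved export, one could count opening and closing tags with a plain text scan. This is only a rough sketch: the function name `closing_tag_report` and the sample strings are hypothetical, and it assumes literal `<page>` and `<mediawiki>` tags never appear unescaped inside revision text.

```python
import re


def closing_tag_report(xml_text):
    """Map each tag name to (opening count, closing count) using a text scan."""
    report = {}
    for name in ("page", "mediawiki"):
        # Match "<page>" or "<page ...>", but not longer names such as "<pages>".
        opened = len(re.findall(rf"<{name}[\s>]", xml_text))
        closed = len(re.findall(rf"</{name}>", xml_text))
        report[name] = (opened, closed)
    return report


# Hypothetical, trimmed samples of the two failure modes.
THIS_TASK = "<mediawiki><page><revision/></mediawiki>"  # only </page> missing
OLDER_TICKET = "<mediawiki><page><revision/>"           # both end tags missing

print(closing_tag_report(THIS_TASK))
print(closing_tag_report(OLDER_TICKET))
```

A report where `mediawiki` is balanced but `page` is not matches the behavior in this task; both being unbalanced matches the truncated output described for the older ticket.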

I've looked for anything relevant in the logs and have not yet found it, although it is very possible that I'm missing something. My first thought was a simple logic error, but I've not yet found one, and I've also been unable to reproduce the behavior locally.

Clearly I'm overlooking something.

@BPirkle Can you reproduce with any page on en wp or only pages with a decent number of revisions?

I have reproduced on pages with few revisions. For example, https://en.wikipedia.org/wiki/Ponder%2C_Texas (128 revisions)

tstarling triaged this task as High priority. Oct 31 2018, 1:37 AM
tstarling added a project: Core Platform Team.

Update: after much fiddling about with a debugging technique suggested by @tstarling, I've reproduced the issue locally. Working on a fix now.

BPirkle claimed this task. Oct 31 2018, 6:53 PM

Change 470943 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/470943

Change 470943 merged by jenkins-bot:
[mediawiki/core@master] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/470943

Change 471022 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@wmf/1.33.0-wmf.1] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/471022

Change 471024 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@wmf/1.33.0-wmf.2] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/471024

Change 471024 merged by jenkins-bot:
[mediawiki/core@wmf/1.33.0-wmf.2] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/471024

Mentioned in SAL (#wikimedia-operations) [2018-11-01T18:38:14Z] <hoo@deploy1001> Synchronized php-1.33.0-wmf.2/includes/export/WikiExporter.php: Fix for missing end tag </page> on some exports (T207974) (duration: 00m 55s)

Change 471022 merged by jenkins-bot:
[mediawiki/core@wmf/1.33.0-wmf.1] Fix for missing end tag </page> on some exports

https://gerrit.wikimedia.org/r/471022

Mentioned in SAL (#wikimedia-operations) [2018-11-01T19:00:32Z] <hoo@deploy1001> Synchronized php-1.33.0-wmf.1/includes/export/WikiExporter.php: Fix for missing end tag </page> on some exports (T207974) (duration: 01m 01s)

This should now be fixed.

BPirkle closed this task as Resolved. Nov 6 2018, 2:27 AM