
Put the <model> and <format> tags before the <text> tag in the XML dumps.
Closed, Resolved, Public

Description

According to export-0.9.xsd, the <model> and <format> tags follow the <text> tag. That is, however, quite annoying when processing the XML event stream: by the time we receive the contents of the <text> tag, we do not yet know how to process them. Adding <model> and <format> at the end was simply an oversight on my part when I introduced them.
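
To illustrate the ordering problem, here is a minimal Python sketch using the standard library's xml.etree.ElementTree.iterparse. The two <revision> fragments are hypothetical and stripped down (no namespace, no other child elements); they are not real dump output. The helper known_at_text is an illustrative name, and it only reports which of model and format a streaming consumer has already seen at the moment </text> arrives.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragments for illustration only.
# TEXT_FIRST mimics the order described for export-0.9.xsd;
# MODEL_FIRST mimics the order proposed in this task.
TEXT_FIRST = (
    "<revision><text>{}</text>"
    "<model>json</model><format>application/json</format></revision>"
)
MODEL_FIRST = (
    "<revision><model>json</model><format>application/json</format>"
    "<text>{}</text></revision>"
)


def known_at_text(xml):
    """Return the (model, format) values already seen when </text> arrives."""
    seen = {}
    source = io.BytesIO(xml.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("model", "format"):
            seen[elem.tag] = elem.text
        elif elem.tag == "text":
            # A streaming consumer would have to handle the text here.
            return seen.get("model"), seen.get("format")
    return None


print(known_at_text(TEXT_FIRST))   # (None, None)
print(known_at_text(MODEL_FIRST))  # ('json', 'application/json')
```

With the text-first order, a consumer has to buffer the whole text until the closing </revision> before it can decide how to interpret it; with the model-first order it can act on the text immediately.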

Outputting these tags before the <text> tag would be simple to do, but would technically be a breaking change to the export format. I see no good way to make this backwards compatible, aside from outputting these tags twice.

As a stopgap, model and format could be included as attributes of the <text> tag. This is, however, rather inconsistent with the rest of the format, and would also need to be included in a new version of the XSD.
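
As a rough sketch of what the attribute-based stopgap would mean for a streaming consumer, the Python snippet below reads model and format from the "start" event of <text>, before any of its content has been parsed. The attribute names, the sample fragment, and the helper text_with_content_type are assumptions made for illustration; no XSD version is claimed to define these attributes.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical fragment: the model/format attributes on <text> are the
# stopgap idea from this task, not part of any published schema.
SAMPLE = (
    '<revision><text model="wikitext" format="text/x-wiki">Hello</text>'
    "</revision>"
)


def text_with_content_type(xml):
    """Return (model, format, text) for the first <text> element."""
    model = fmt = None
    source = io.BytesIO(xml.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != "text":
            continue
        if event == "start":
            # Attributes are available as soon as the opening tag is read.
            model, fmt = elem.get("model"), elem.get("format")
        else:
            # The element's text is only guaranteed complete at "end".
            return model, fmt, elem.text
    return None


print(text_with_content_type(SAMPLE))  # ('wikitext', 'text/x-wiki', 'Hello')
```

The point of the sketch is only that attributes arrive together with the opening tag, so the consumer knows the content type before it sees the content.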


Version: unspecified
Severity: normal
URL: https://www.mediawiki.org/xml/export-0.9.xsd
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=72361

Details

Reference
bz72417

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:43 AM
bzimport set Reference to bz72417.
bzimport added a subscriber: Unknown Object (MLST).
daniel created this task. Oct 23 2014, 9:02 AM

Case in point: bug 72361 would be a lot easier to fix if model and format appeared before text in the XML event stream.

Do Special:Import and importDump.php care about the order of XML tags? I hope not, but they don't always behave logically. :)

Change 168583 had a related patch set uploaded by Daniel Kinzler:
Move <model> and <format> tags in XML dumps.

https://gerrit.wikimedia.org/r/168583

Would this also break 'mwxml2sql'?

Change 168583 merged by jenkins-bot:
Change position of <model> and <format> tags in XML dumps.

https://gerrit.wikimedia.org/r/168583