Page MenuHomePhabricator

wikitech-static sync failing
Closed, ResolvedPublic

Description

wikitech-static is failing to get updates properly from wikitech. This is due to an xml-parsing issue -- wikitech/silver exports a dangling <comment> tag which causes the import on wikitech-static to error out:

<page>
  <title>File:Gerrit branch permissions wrong.png</title>
  <ns>6</ns>
  <id>830</id>
  <revision>
    <id>2636</id>
    <timestamp>2012-03-14T15:23:20Z</timestamp>
    <contributor>
      <username>Saper</username>
      <id>135</id>
    </contributor>
    <comment>Shouldn't happen?</comment>
    <model>wikitext</model>
    <format>text/x-wiki</format>
    <text xml:space="preserve" bytes="17">Shouldn't happen?</text>
    <sha1>hnt7uqusr9t3cey0jc2wfv14embrlhp</sha1>
  </revision>
  <upload>
    <timestamp>2012-03-14T15:23:20Z</timestamp>
    <contributor>
      <username>Saper</username>
      <id>135</id>
    </contributor>
    <comment>         <-------------------------------------------------------------------------------------------------- BROKEN!
    <filename>Gerrit_branch_permissions_wrong.png</filename>
    <src>https://wikitech.wikimedia.org/w/images/6/6b/Gerrit_branch_permissions_wrong.png</src>
    <size>37851</size>
    <sha1base36>ko3uiqjdnppvhv640fe7m7swa50l4cv</sha1base36>
    <rel>6/6b/Gerrit_branch_permissions_wrong.png</rel>
  </upload>
</page>

I don't know who is to blame for this. I could probably monkey with that uploaded file to solve the problem, but first we should understand what MW is doing.

Event Timeline

That file is produced on silver via this command:

/usr/local/bin/mwscript maintenance/dumpBackup.php labswiki --current --uploads
Reedy added a subscriber: Reedy.

We seem to have two issues here...

Why is MW emitting invalid XML?

Also, this is likely a "dupe" (or at least, a parallel bug just exposed in a different area) of T175444: File History: Comments are not displayed all versions, as you can see at https://wikitech.wikimedia.org/wiki/File:Gerrit_branch_permissions_wrong.png there is no comment either

@Andrew Is it only Files that are generating broken xml?

@Reedy I'm on holiday and so only got as far as seeing that that one use-case produces the problem. I don't know immediately how to find all the mismatches, although there are quite a few:

root@silver:/a/backup/public/foo# grep "<comment>" labswiki-20170917.xml  | wc
  10924   95531  982461
root@silver:/a/backup/public/foo# grep "</comment>" labswiki-20170917.xml  | wc
  10579   95168  976644

To see for yourself, I've left the above file on silver for now.

So the offending code is https://github.com/wikimedia/mediawiki/blob/master/includes/export/XmlDumpWriter.php#L406

Pass it '' and it's fine, pass it null and it breaks like shown.

> var_dump( Xml::elementClean( 'comment', [], null ) );
/var/www/wiki/mediawiki/core/maintenance/eval.php(78) : eval()'d code:1:
string(9) "<comment>"

> var_dump( Xml::elementClean( 'comment', [], '' ) );
/var/www/wiki/mediawiki/core/maintenance/eval.php(78) : eval()'d code:1:
string(11) "<comment />"

In other similar code, we pass it through strvale.g. https://github.com/wikimedia/mediawiki/blob/master/includes/export/XmlDumpWriter.php#L224

I guess the assumption is that $file->getDescription() shouldn't return null (the db is NOT NULL), but because of T175444 it seemingly is.

Gonna add a patch to strval that call, to prevent this invalid xml; also adds a bit more defensive-ness of the code, but doesn't fix the underlying issue

Change 378642 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@master] Run strval() over the File description

https://gerrit.wikimedia.org/r/378642

Change 378642 merged by jenkins-bot:
[mediawiki/core@master] Run strval() over the File description

https://gerrit.wikimedia.org/r/378642

Reedy triaged this task as Medium priority.Sep 18 2017, 4:12 PM
Reedy removed a project: Patch-For-Review.

So this should be fixed now...

Both with the strval() change, but also the underlying issue being fixed in production

The strval() change could be cherry picked, but I don't know if it's actually necessary...

@Reedy definitely no need to cherry-pick if this is getting pushed out today :)

The thing that was causing it to return null itself has been fixed. The strval was just an extra safe guard

Andrew claimed this task.