Page MenuHomePhabricator

wikitech-static sync failing
Closed, ResolvedPublic

Description

wikitech-static is failing to get updates properly from wikitech. This is due to an xml-parsing issue -- wikitech/silver exports a dangling <comment> tag which causes the import on wikitech-static to error out:

<page>
  <title>File:Gerrit branch permissions wrong.png</title>
  <ns>6</ns>
  <id>830</id>
  <revision>
    <id>2636</id>
    <timestamp>2012-03-14T15:23:20Z</timestamp>
    <contributor>
      <username>Saper</username>
      <id>135</id>
    </contributor>
    <comment>Shouldn't happen?</comment>
    <model>wikitext</model>
    <format>text/x-wiki</format>
    <text xml:space="preserve" bytes="17">Shouldn't happen?</text>
    <sha1>hnt7uqusr9t3cey0jc2wfv14embrlhp</sha1>
  </revision>
  <upload>
    <timestamp>2012-03-14T15:23:20Z</timestamp>
    <contributor>
      <username>Saper</username>
      <id>135</id>
    </contributor>
    <comment>         <-------------------------------------------------------------------------------------------------- BROKEN!
    <filename>Gerrit_branch_permissions_wrong.png</filename>
    <src>https://wikitech.wikimedia.org/w/images/6/6b/Gerrit_branch_permissions_wrong.png</src>
    <size>37851</size>
    <sha1base36>ko3uiqjdnppvhv640fe7m7swa50l4cv</sha1base36>
    <rel>6/6b/Gerrit_branch_permissions_wrong.png</rel>
  </upload>
</page>

I don't know who is to blame for this. I could probably monkey with that uploaded file to solve the problem, but first we should understand what MW is doing.

Event Timeline

Andrew created this task.Sep 17 2017, 5:41 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 17 2017, 5:41 PM

That file is produced on silver via this command:

/usr/local/bin/mwscript maintenance/dumpBackup.php labswiki --current --uploads
Reedy added a subscriber: Anomie.EditedSep 17 2017, 6:47 PM
Reedy added a subscriber: Reedy.

We seem to have two issues here...

Why is MW emitting invalid XML?

Also, this is likely a "dupe" (or at least, a parallel bug just exposed in a different area) of T175444: File History: Comments are not displayed all versions, as you can see at https://wikitech.wikimedia.org/wiki/File:Gerrit_branch_permissions_wrong.png there is no comment either

Reedy added a comment.Sep 17 2017, 7:03 PM

@Andrew Is it only Files that are generating broken xml?

@Reedy I'm on holiday and so only got as far as seeing that that one use-case produces the problem. I don't know immediately how to find all the mismatches, although there are quite a few:

root@silver:/a/backup/public/foo# grep "<comment>" labswiki-20170917.xml  | wc
  10924   95531  982461
root@silver:/a/backup/public/foo# grep "</comment>" labswiki-20170917.xml  | wc
  10579   95168  976644

To see for yourself, I've left the above file on silver for now.

So the offending code is https://github.com/wikimedia/mediawiki/blob/master/includes/export/XmlDumpWriter.php#L406

Pass it '' and it's fine, pass it null and it breaks like shown.

> var_dump( Xml::elementClean( 'comment', [], null ) );
/var/www/wiki/mediawiki/core/maintenance/eval.php(78) : eval()'d code:1:
string(9) "<comment>"

> var_dump( Xml::elementClean( 'comment', [], '' ) );
/var/www/wiki/mediawiki/core/maintenance/eval.php(78) : eval()'d code:1:
string(11) "<comment />"

In other similar code, we pass it through strvale.g. https://github.com/wikimedia/mediawiki/blob/master/includes/export/XmlDumpWriter.php#L224

I guess the assumption is that $file->getDescription() shouldn't return null (the db is NOT NULL), but because of T175444 it seemingly is.

Gonna add a patch to strval that call, to prevent this invalid xml; also adds a bit more defensive-ness of the code, but doesn't fix the underlying issue

Change 378642 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@master] Run strval() over the File description

https://gerrit.wikimedia.org/r/378642

Change 378642 merged by jenkins-bot:
[mediawiki/core@master] Run strval() over the File description

https://gerrit.wikimedia.org/r/378642

Reedy triaged this task as Normal priority.Sep 18 2017, 4:12 PM
Reedy removed a project: Patch-For-Review.

So this should be fixed now...

Both with the strval() change, but also the underlying issue being fixed in production

The strval() change could be cherry picked, but I don't know if it's actually necessary...

@Reedy definitely no need to cherry-pick if this is getting pushed out today :)

Reedy added a comment.Sep 19 2017, 3:47 PM

The thing that was causing it to return null itself has been fixed. The strval was just an extra safe guard

Andrew closed this task as Resolved.Oct 5 2017, 3:55 PM
Andrew claimed this task.