Fix incomplete "latest file revision" calculation


Before, the code used *exclusively* the timestamp to decide which of
the revisions returned by the "imageinfo" API request is the latest.
However, this is not always correct. Look at
https://et.wikipedia.org/wiki/Fail:Image002.gif. The history of this
file contains many reverts. Because of this, the latest revision has an
*older* timestamp than the revision before it.

current 2005-02-19T20:41:04 […] Reverted to earlier revision
revert 2005-02-19T20:41:12 […]

Solution: Don't use the timestamp. At least not exclusively. The API
allows querying for an "archivename". If one is present, the revision
is not the latest file revision.
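
For illustration only, a minimal sketch of this rule in Python (not the
PHP code of this patch). The function name and the sample entries are
hypothetical, the archive name is made up, and the sketch assumes each
revision is a dict as returned by "imageinfo" with
iiprop=timestamp|archivename:

```python
def find_latest_revision(revisions):
    """Return the one revision that has no "archivename", else None."""
    current = [r for r in revisions if not r.get("archivename")]
    # Exactly one revision should be outside the archive.
    return current[0] if len(current) == 1 else None


# Hypothetical sample modeled on the two timestamps quoted above.
sample = [
    {"timestamp": "2005-02-19T20:41:04"},  # current, reverted to earlier
    {"timestamp": "2005-02-19T20:41:12",
     "archivename": "made-up-archive-name!Image002.gif"},
]

latest = find_latest_revision(sample)
by_timestamp = max(sample, key=lambda r: r["timestamp"])
```

Here "latest" is the entry from 20:41:04, while picking the largest
timestamp would wrongly return the archived entry from 20:41:12.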

I checked and it seems this "archivename" field is the only indication
a file is in the archive. Well, that and the fact that the path contains
…/archive/…. But I don't want to do a substring comparison.

Further evidence that this approach is the correct one can be seen in
ImportableUploadRevisionImporter::import(). It also uses the
"archivename" to mark old revisions.

This patch fixes 2 closely related issues:

  • The order of the imported file revisions was wrong when later revisions that are reverts have a smaller timestamp.
  • The duplicate check was not based on the latest revision, but on the revision with the largest timestamp.
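
The ordering fix can be sketched the same way. This is an assumed
illustration in Python, not the patch itself: the function name is
hypothetical, and it assumes archived revisions can still be ordered by
timestamp among themselves, with the revision lacking an "archivename"
always placed last:

```python
def import_order(revisions):
    """Sort archived revisions by timestamp, then the current one last."""
    return sorted(
        revisions,
        # False sorts before True, so archived revisions come first.
        key=lambda r: (not r.get("archivename"), r["timestamp"]),
    )


# Hypothetical input with made-up archive names.
revisions = [
    {"timestamp": "2005-02-19T20:41:04"},  # current, smaller timestamp
    {"timestamp": "2005-02-19T20:41:12", "archivename": "made-up-b"},
    {"timestamp": "2005-02-19T20:40:00", "archivename": "made-up-a"},
]

ordered = import_order(revisions)
```

The last element of "ordered" is the current revision, which is also
the one a duplicate check should look at.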

Bug: T229181
Change-Id: I31e0215d1c566c025a8a0435136afef280c4eab7


thiemowmde authored on Jun 5 2020, 12:50 PM
rEFLIa3fe3d023127: Merge "Add "missing sha1" test for duplicate checker"