
early revisions on at least 7 wikimedia wikis are showing up with timestamps listed as the current time in both interface and API
Open, Needs Triage · Public · Bug Report

Description

Steps to replicate the issue (include links if applicable):

What happens?:

The bad revisions appear to be duplicated. They also appear to have missing or invalid timestamps, which are being replaced with the current time in both the API and the UI.

What should have happened instead?:

I suppose the revisions in question should either be deleted or be given correct timestamps, if those are recorded somewhere. I haven't verified that the revisions are duplicated across all the wikis.

Other information (browser name/version, screenshots, etc.):

The problem is browser-independent. It occurs on at least the following Wikipedias:

  • Tok Pisin (tpi)
  • Aromanian (roa-rup)
  • Kashubian (csb)
  • Sardinian (sc)
  • Ido (io)
  • Simple English (simple)
  • Minnan (zh-min-nan)

It might also affect other WMF wikis, but we have only looked at Wikipedias. To find other affected wikis, you can make an API request like:

https://simple.wikipedia.org/w/api.php?action=query&list=allrevisions&arvprop=ids|timestamp&arvdir=newer&arvlimit=1
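A small script could repeat that check across several wikis. The Python sketch below builds the same allrevisions query (adding format=json so the response is machine-readable). It then flags revisions whose reported timestamp falls within an hour of the query time. The one-hour tolerance, the function names, and the sample response are assumptions for illustration. A flagged revision is only a hint that the stored timestamp may be missing.

```python
# Sketch: flag wikis whose earliest revision reports a suspiciously recent timestamp.
# Assumption (not from the report): a timestamp within MAX_SKEW of the query time
# probably means the stored rev_timestamp is empty and "now" was substituted.
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

MAX_SKEW = timedelta(hours=1)  # hypothetical tolerance


def first_revision_url(host: str) -> str:
    """Build the allrevisions query from the report (plus format=json) for one wiki."""
    params = {
        "action": "query",
        "list": "allrevisions",
        "arvprop": "ids|timestamp",
        "arvdir": "newer",
        "arvlimit": "1",
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def suspicious_revisions(response: dict, queried_at: datetime) -> list[int]:
    """Return revids whose reported timestamp is within MAX_SKEW of queried_at."""
    flagged = []
    for page in response.get("query", {}).get("allrevisions", []):
        for rev in page.get("revisions", []):
            # API timestamps look like "2004-01-01T12:00:00Z" (UTC).
            ts = datetime.strptime(rev["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
            ts = ts.replace(tzinfo=timezone.utc)
            if abs(queried_at - ts) <= MAX_SKEW:
                flagged.append(rev["revid"])
    return flagged


# Illustrative response shape; the revid values and timestamps are made up.
sample = {"query": {"allrevisions": [{"pageid": 1, "revisions": [
    {"revid": 1, "timestamp": "2024-07-09T23:27:00Z"},
    {"revid": 2, "timestamp": "2004-01-01T12:00:00Z"},
]}]}}
queried_at = datetime(2024, 7, 9, 23, 28, tzinfo=timezone.utc)
print(first_revision_url("simple.wikipedia.org"))
print(suspicious_revisions(sample, queried_at))  # [1]
```

To check a live wiki, you could fetch first_revision_url(host) with urllib.request.urlopen and parse the body with json.load. Then pass the result to suspicious_revisions together with datetime.now(timezone.utc), once per host in the list above.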

Credit: This was discovered by Zarine Kharazian (User:Zarinek) who I am working on a project with. I'm reporting the bug because I have a Phabricator account already.

Event Timeline

mako renamed this task from "early duplicated revisions on at least 7 wikimedia wikis are showing up with timestamps listed as the current time in both interface and API" to "duplicated early revisions on at least 7 wikimedia wikis are showing up with timestamps listed as the current time in both interface and API". (Tue, Jul 9, 11:28 PM)

Yeah, it sure looks like something invalid is in the database. Check out the continuation string, which is all �: https://simple.wikipedia.org/w/api.php?action=query&list=allrevisions&arvprop=ids|timestamp&arvdir=newer&arvlimit=1

[Removing MediaWiki-Engineering team project as it's up to each team what they have on their workboard]

rev_timestamp is empty in the database:

wikiadmin2023@10.64.16.46(simplewiki)> select * from revision where rev_id = 22329;
+--------+----------+----------------+-----------+----------------+----------------+-------------+---------+---------------+---------------------------------+
| rev_id | rev_page | rev_comment_id | rev_actor | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        |
+--------+----------+----------------+-----------+----------------+----------------+-------------+---------+---------------+---------------------------------+
|  22329 |     7870 |         113519 |     28592 |                |              0 |           0 |     205 |             0 | 3vcdufxmrq68yxp7cmap3cr50qvdqm5 |
+--------+----------+----------------+-----------+----------------+----------------+-------------+---------+---------------+---------------------------------+
1 row in set (0.001 sec)
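You could list every affected row at once rather than inspecting one rev_id at a time. One way is to scan for timestamps that are not valid 14-digit MediaWiki timestamps (YYYYMMDDHHMMSS). The Python sketch below runs that check against an in-memory SQLite table with a heavily simplified, hypothetical subset of the revision columns. Only rev_id 22329 and rev_page 7870 come from the output above; the second row is invented. The validation runs in Python rather than as a `= ''` SQL predicate, so the sketch does not assume how the production column stores an empty value.

```python
# Sketch: find revisions whose rev_timestamp is not a valid YYYYMMDDHHMMSS value.
# Uses an in-memory SQLite stand-in with a hypothetical, simplified revision table.
import sqlite3


def looks_like_mw_timestamp(value) -> bool:
    """True if value (str or bytes) is a 14-digit MediaWiki timestamp."""
    if isinstance(value, bytes):
        # Fixed-width binary columns may pad short values with NUL bytes; strip them.
        value = value.rstrip(b"\x00").decode("ascii", errors="replace")
    return isinstance(value, str) and len(value) == 14 and value.isdigit()


def revisions_missing_timestamp(conn: sqlite3.Connection) -> list[int]:
    """Return rev_ids (ascending) whose stored timestamp fails the check above."""
    rows = conn.execute("SELECT rev_id, rev_timestamp FROM revision ORDER BY rev_id")
    return [rev_id for rev_id, ts in rows if not looks_like_mw_timestamp(ts)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [
        (22329, 7870, ""),                # mirrors the row shown above
        (22340, 7870, "20040101120000"),  # hypothetical row with a valid timestamp
    ],
)
print(revisions_missing_timestamp(conn))  # [22329]
```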

I assume the timestamp was lost due to a bug. Maybe it could be retrieved from a very early dump or by some other means (it might also be that it was never set, in which case I don't know of any way to find the timestamp).

I think there are actually two distinct issues here:

  1. An issue with missing/invalid data in the database: Timestamps should not be missing from revisions.
  2. A bug in MediaWiki: Missing timestamps should not be reported with the current date by the API or UI.
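A minimal Python sketch can illustrate point 2. MediaWiki itself is written in PHP, and this is not its code path. The two hypothetical functions contrast the reported symptom with the behavior asked for here. In the first, an unparseable stored value silently becomes the current time. In the second, the value is reported as missing.

```python
from datetime import datetime, timezone
from typing import Optional

MW_FORMAT = "%Y%m%d%H%M%S"         # stored 14-digit form
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # form shown by the API


def to_iso8601_buggy(raw: str, now: datetime) -> str:
    """Mimics the reported symptom: an unparseable stored value becomes `now`."""
    try:
        return datetime.strptime(raw, MW_FORMAT).strftime(ISO_FORMAT)
    except ValueError:
        return now.strftime(ISO_FORMAT)


def to_iso8601(raw: str) -> Optional[str]:
    """Return an ISO 8601 string, or None when the stored value is missing or invalid."""
    if len(raw) != 14 or not raw.isdigit():  # reject empty or non-numeric values up front
        return None
    try:
        return datetime.strptime(raw, MW_FORMAT).strftime(ISO_FORMAT)
    except ValueError:  # e.g. an impossible date such as 20040231...
        return None


now = datetime(2024, 7, 9, 23, 28, tzinfo=timezone.utc)
print(to_iso8601_buggy("", now))     # 2024-07-09T23:28:00Z
print(to_iso8601(""))                # None
print(to_iso8601("20040101120000"))  # 2004-01-01T12:00:00Z
```

Returning None leaves the caller free to omit, flag, or placeholder a missing timestamp instead of inventing one.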

I guess that #1 probably needs to be considered on a case-by-case basis. That said, there don't appear to be many cases. In the case of those two edits to Simple English, it sure looks like they are just duplicates. The first two edits *with* timestamps have the same revision text and the same editor, at least, although I suppose it could also be a revert/restore interaction.
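The duplicate check described here (same page, same editor, same revision text) could be sketched by comparing rev_page, rev_actor and rev_sha1, the columns shown in the query output earlier. The Rev type, the function name and the other revisions in the example are hypothetical. As the comment notes, a match only makes a revision a candidate duplicate, since a revert followed by a restore would produce the same content hash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rev:
    """Hypothetical minimal view of a revision row (column names from the query output)."""
    rev_id: int
    rev_page: int
    rev_actor: int
    rev_sha1: str
    rev_timestamp: Optional[str]  # "" or None when missing


def duplicate_candidates(revs: list[Rev]) -> dict[int, list[int]]:
    """Map each timestamp-less rev_id to dated revisions with the same page, actor and sha1."""
    dated = [r for r in revs if r.rev_timestamp]
    candidates = {}
    for r in revs:
        if r.rev_timestamp:
            continue
        key = (r.rev_page, r.rev_actor, r.rev_sha1)
        matches = [d.rev_id for d in dated if (d.rev_page, d.rev_actor, d.rev_sha1) == key]
        if matches:
            candidates[r.rev_id] = matches
    return candidates


sha1 = "3vcdufxmrq68yxp7cmap3cr50qvdqm5"  # value from the query output above
revs = [
    Rev(22329, 7870, 28592, sha1, ""),                            # row from the query output
    Rev(100001, 7870, 28592, sha1, "20040101120000"),             # hypothetical dated copy
    Rev(100002, 7870, 11111, "someotherhash", "20040102120000"),  # hypothetical unrelated edit
]
print(duplicate_candidates(revs))  # {22329: [100001]}
```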

> Maybe it could be retrieved from a very early dump or some other means

We have occasionally backfilled missing information with reasonable guesses. It would be OK to just assign a rev_timestamp slightly earlier than that of the next rev_id, for example (skipping imported revisions, as they can be in any order).
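The backfill heuristic can be sketched as follows, assuming rev_id order reflects edit order for the revisions involved. Each missing timestamp is guessed as one second (an arbitrary, hypothetical step) before the nearest later non-imported revision. Consecutive gaps therefore keep their relative order. The input format and the set of imported rev_ids are assumptions; how imported revisions would be identified is not specified here.

```python
from datetime import datetime, timedelta

MW_FORMAT = "%Y%m%d%H%M%S"  # 14-digit MediaWiki timestamp


def backfill_timestamps(revs, imported=frozenset(), step=timedelta(seconds=1)):
    """Guess timestamps for revisions whose rev_timestamp is empty.

    revs: iterable of (rev_id, rev_timestamp) pairs, rev_timestamp "" or None when missing.
    imported: rev_ids to ignore entirely, since imported revisions can be in any order.
    Returns {rev_id: guessed_timestamp}; a revision with no later usable neighbor gets no guess.
    """
    guesses = {}
    next_ts = None  # timestamp of the nearest later usable revision (real or guessed)
    # Walk from the highest rev_id down so each gap sees its successor first.
    for rev_id, ts in sorted(revs, reverse=True):
        if rev_id in imported:
            continue
        if ts:
            next_ts = datetime.strptime(ts, MW_FORMAT)
        elif next_ts is not None:
            next_ts -= step
            guesses[rev_id] = next_ts.strftime(MW_FORMAT)
    return guesses


print(backfill_timestamps([(1, ""), (2, ""), (3, "20040101120000"), (4, "20040102000000")]))
# {2: '20040101115959', 1: '20040101115958'}
```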

If they're duplicates, the edits should just be deleted, probably via a selective deletion on-wiki.

There's something else funky with that page - the first 11 edits all have consecutive revids suggesting that they were imported from somewhere. And there's an edit in 2004 by @tstarling saying " fixed, ignore previous history entry " implying something was manually poked at.

The issue on simplewiki is different from those on other wikis. On the other wikis, the edits with bad timestamps are the first edits ever made to the wiki; they were made either to the main page or to legacy wikitext log pages like "Project:Deletion log", by an unknown user. On simplewiki they're to a real page by real humans.

Actually, the edits have consecutive revids because they were deleted and undeleted in 2004, when doing so didn't preserve revids. That still means Nemo's idea above won't work, though.

mako renamed this task from "duplicated early revisions on at least 7 wikimedia wikis are showing up with timestamps listed as the current time in both interface and API" to "early revisions on at least 7 wikimedia wikis are showing up with timestamps listed as the current time in both interface and API". (Thu, Jul 11, 8:28 PM)