
Empty legacy upload/deletion/block logs, initial creations of Main Page on some wikis, and some unrelated old revisions on simplewiki, have blank timestamps, which render as the current time
Open, Needs Triage | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

The bad revisions appear to be duplicated. They also appear to have missing or invalid timestamps, which are being replaced with the current time in both the API and the UI.

What should have happened instead?:

I suppose either the revisions in question should be deleted or they should have correct timestamps if those are recorded somewhere. I haven't verified that the revisions are duplicated across all the wikis.

Other information (browser name/version, screenshots, etc.):

Happens in all browsers. This is true in at least the following Wikipedias:

  • Tok Pisin (tpi)
  • Aromanian (roa-rup)
  • Kashubian (csb)
  • Sardinian (sc)
  • Ido (io)
  • Simple English (simple)
  • Minnan (zh-min-nan)

It might also be true on other WMF wikis, but we've only looked at Wikipedias. If you want to find others that are affected, you can make an API request like:

https://simple.wikipedia.org/w/api.php?action=query&list=allrevisions&arvprop=ids|timestamp&arvdir=newer&arvlimit=1
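A rough sketch of how that check could be scripted, in case it helps anyone survey more wikis. The function names, the added `format=json` parameter, and the one-day window are my assumptions, not part of the report. The heuristic is just that a wiki's oldest revision should not be dated at roughly "now". No request is made here; the response dict is passed in by the caller.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def first_revision_url(host: str) -> str:
    """Build the allrevisions query from the report for a given wiki host."""
    params = {
        "action": "query",
        "list": "allrevisions",
        "arvprop": "ids|timestamp",
        "arvdir": "newer",
        "arvlimit": 1,
        "format": "json",  # assumption: not present in the original URL
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def looks_affected(response: dict, now: datetime,
                   window: timedelta = timedelta(days=1)) -> bool:
    """Heuristic: flag the wiki if its oldest revision is dated within `window` of `now`."""
    pages = response.get("query", {}).get("allrevisions", [])
    for page in pages:
        for rev in page.get("revisions", []):
            ts = datetime.strptime(rev["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
            ts = ts.replace(tzinfo=timezone.utc)
            if abs(now - ts) <= window:
                return True
    return False
```

Usage would be to fetch `first_revision_url("simple.wikipedia.org")` with any HTTP client, decode the JSON, and pass it to `looks_affected` together with the current UTC time.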

Credit: This was discovered by Zarine Kharazian (User:Zarinek) who I am working on a project with. I'm reporting the bug because I have a Phabricator account already.

Event Timeline

mako renamed this task from "early duplicated revisions on at least 7 wikimedia wikis are showing up with timestamps listed as the current time in both interface and API" to "duplicated early revisions on at least 7 wikimedia wikis are showing up with timestamps listed as the current time in both interface and API". Jul 9 2024, 11:28 PM

Yeah, it sure looks like something invalid in the database. Check out the continuation string which is all �: https://simple.wikipedia.org/w/api.php?action=query&list=allrevisions&arvprop=ids|timestamp&arvdir=newer&arvlimit=1

[Removing MediaWiki-Engineering team project as it's up to each team what they have on their workboard]

rev_timestamp is empty in the database:

wikiadmin2023@10.64.16.46(simplewiki)> select * from revision where rev_id = 22329;
+--------+----------+----------------+-----------+----------------+----------------+-------------+---------+---------------+---------------------------------+
| rev_id | rev_page | rev_comment_id | rev_actor | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        |
+--------+----------+----------------+-----------+----------------+----------------+-------------+---------+---------------+---------------------------------+
|  22329 |     7870 |         113519 |     28592 |                |              0 |           0 |     205 |             0 | 3vcdufxmrq68yxp7cmap3cr50qvdqm5 |
+--------+----------+----------------+-----------+----------------+----------------+-------------+---------+---------------+---------------------------------+
1 row in set (0.001 sec)

I assume the timestamp was lost due to a bug. Maybe it could be retrieved from a very early dump or some other means. (It might also never have been set, in which case I don't know of any way to find it.)

I think there are actually two distinct issues here:

  1. An issue with missing/invalid data in the database: Timestamps should not be missing from revisions.
  2. A bug in MediaWiki: Missing timestamps should not be reported with the current date by the API or UI.
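To illustrate the behaviour asked for in #2, here is a minimal sketch. It is not MediaWiki code; the function names and the placeholder string are hypothetical. The idea is that a blank stored value is surfaced as "missing" rather than silently falling back to the current time.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_rev_timestamp(raw: str) -> Optional[datetime]:
    """Parse a 14-character YYYYMMDDHHMMSS value; return None if it is blank."""
    if raw.strip() == "":
        return None  # report "unknown" instead of substituting the current time
    return datetime.strptime(raw, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def display_timestamp(raw: str) -> str:
    """Render a stored timestamp, with an explicit placeholder for blank values."""
    parsed = parse_rev_timestamp(raw)
    return parsed.isoformat() if parsed else "(timestamp missing)"
```

Whether the API should return a placeholder, null, or an error for such rows is a separate design question; the sketch only shows the "do not default to now" part.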

I guess that #1 probably needs to be considered on a case-by-case basis. That said, there don't appear to be many cases. In the case of those two edits to Simple English, it sure looks like they are just duplicates. The first two edits *with* timestamps have the same revision text and the same editor, at least, although I suppose it could be a revert/restore interaction.

Maybe it could be retrieved from a very early dump or some other means

We have occasionally backfilled missing information with reasonable guesses. It would be ok to just assign a rev_timestamp slightly lower than the one of the next rev_id, for example. (Skipping imported revisions, as they can be in any order.)
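As a sketch of that kind of guess: assuming timestamps are stored as 14-digit YYYYMMDDHHMMSS strings, one could take the timestamp of the next rev_id and subtract a small gap. The function name and the one-second default are arbitrary choices for illustration, and picking a suitable non-imported neighbour is left to the caller.

```python
from datetime import datetime, timedelta

MW_FORMAT = "%Y%m%d%H%M%S"  # 14-digit YYYYMMDDHHMMSS


def guess_timestamp(next_rev_timestamp: str, gap_seconds: int = 1) -> str:
    """Return a timestamp slightly lower than that of the next revision."""
    next_ts = datetime.strptime(next_rev_timestamp, MW_FORMAT)
    return (next_ts - timedelta(seconds=gap_seconds)).strftime(MW_FORMAT)
```

For example, `guess_timestamp("20040101000000")` rolls back across the year boundary to `"20031231235959"`.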

The edits should just be deleted if they're duplicates. Probably by performing a selective deletion on-wiki.

There's something else funky with that page - the first 11 edits all have consecutive revids suggesting that they were imported from somewhere. And there's an edit in 2004 by @tstarling saying " fixed, ignore previous history entry " implying something was manually poked at.

The issue on simplewiki is different from those on other wikis. On the other wikis, the edits with bad timestamps are the first edits ever made to the wiki, either to the main page or to legacy wikitext log pages like "Project:Deletion log", and are by an unknown user. On simplewiki they're to a real page by real humans.

Actually the edits have consecutive revids because they were deleted/undeleted in 2004 when doing so didn't preserve revids. Still means Nemo's idea above won't work though.

mako renamed this task from "duplicated early revisions on at least 7 wikimedia wikis are showing up with timestamps listed as the current time in both interface and API" to "early revisions on at least 7 wikimedia wikis are showing up with timestamps listed as the current time in both interface and API". Jul 11 2024, 8:28 PM
xcollazo added subscribers: xcollazo, gmodena, Milimetric, Krinkle.

I have reproed this issue on T378603, thus marked that one as a duplicate of this one.

Over there, we identified 10 wikis with this behavior:

chrwiktionary
csbwiki
foundationwiki
iowiki
roa_rupwiki
roa_rupwiktionary
scwiki
simplewiki
tpiwiki
zh_min_nanwiki

Ancient (and irrelevant but why not?) spelunking:

https://github.com/wikimedia/mediawiki/blob/38b72bb4a150b4e71199ff7991ab8c09bacc09f1/maintenance/initialdata.sql

INSERT INTO cur (cur_namespace,cur_title,cur_text,cur_restrictions)
  VALUES (4,'Upload_log','Below is a list of the most recent file uploads.\nAll times shown are server time (UTC).\n<ul>\n</ul>\n','sysop'),
  (4,'Deletion_log','Below is a list of the most recent deletions.\nAll times shown are server time (UTC).\n<ul>\n</ul>\n','sysop'),
  (0,'Main_Page','Wiki software successfully installed!',''),
  (4,'Block log', 'This is a log of user blocking and unblocking actions. Automatically 
blocked IP addresses are not be listed. See the [[Special:Ipblocklist|IP block list]] for
the list of currently operational bans and blocks.', 'sysop');

Note the lack of a cur_timestamp value. This means, per the table definition (https://github.com/wikimedia/mediawiki/blob/38b72bb4a150b4e71199ff7991ab8c09bacc09f1/maintenance/tables.sql#L40), that cur_timestamp was saved as the empty string.

cur_timestamp char(14) binary NOT NULL default '',
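The effect can be reproduced in miniature. The sketch below uses SQLite through Python as a stand-in for the MySQL of the time, and drops the `binary` attribute for that reason. Every column definition other than the cur_timestamp default is a simplified assumption, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the old cur table; only the cur_timestamp
# default mirrors the definition quoted above.
conn.execute(
    """
    CREATE TABLE cur (
        cur_namespace INTEGER NOT NULL DEFAULT 0,
        cur_title VARCHAR(255) NOT NULL DEFAULT '',
        cur_text TEXT NOT NULL DEFAULT '',
        cur_restrictions TEXT NOT NULL DEFAULT '',
        cur_timestamp CHAR(14) NOT NULL DEFAULT ''
    )
    """
)

# Like initialdata.sql, this INSERT does not mention cur_timestamp at all.
conn.execute(
    "INSERT INTO cur (cur_namespace, cur_title, cur_text, cur_restrictions) "
    "VALUES (0, 'Main_Page', 'Wiki software successfully installed!', '')"
)

# The column default fills in the empty string, which is what later surfaces
# as a blank timestamp.
blank_rows = conn.execute(
    "SELECT cur_title, cur_timestamp FROM cur WHERE cur_timestamp = ''"
).fetchall()
```

Here `blank_rows` contains the Main_Page row with an empty timestamp, because NOT NULL with a default of '' accepts the omitted column without complaint.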

The Main Page was always affected (but see the next paragraph). However, when adding to the other logs, old versions of MediaWiki overwrote the cur table entry with the latest log entry without creating a new revision, so those pages are only affected if nothing was added to the log before December 2004, when the modern logging system was invented.

This bug was supposedly fixed in November 2003 with https://github.com/wikimedia/mediawiki/commit/3f51fd55bde86e0e6316a5c8b87593e3b465d214, but wikis created after November 2003 still show it, so I guess they were slow at deploying things to the servers. I can't spelunk 22 years back in that case.

Wikis created in 2001 or early 2002 are not affected by this as https://phabricator.wikimedia.org/diffusion/SVN/browse/trunk/phpwiki/newcodebase/maintenance/convertdb.php;9999 overwrote the empty cur entries. Wikis created between 2002 and early-mid 2003 went through https://phabricator.wikimedia.org/diffusion/SVN/browse/trunk/phpwiki/newcodebase/maintenance/buildTables.inc;9999 which added the upload and deletion log pages but didn't add the Main Page.

Simplewiki, as before, has a completely different cause. I poked around in some old logs but couldn't find any information about what caused it.

Note that viewing the revisions that have this bug (except for the simplewiki revisions) produces an exception:

[c8ba9365-da66-4ea7-b5ae-7de51dbaba3a] 2025-02-25 05:59:13: Fatal exception of type "InvalidArgumentException"

Pppery renamed this task from "early revisions on at least 7 wikimedia wikis are showing up with timestamps listed as the current time in both interface and API" to "Empty legacy upload/deletion/block logs, initial creations of Main Page on some wikis, and some unrelated old revisions on simplewiki, have blank timestamps, which render as the current time". Tue, Feb 25, 6:31 AM

That InvalidArgumentException is presumably T379868 -

Note that viewing the revisions that have this bug (except for the simplewiki revisions) produces an exception:

That InvalidArgumentException is T307738: Actor name can not be empty for 0 and 3215898.