Desciclopédia [tm] (and a few other Uncyclopedias [tm] in the same, non-WMF series) recently upgraded from MW 1.31 to MW 1.35 as the old MediaWiki versions were breaking badly under the now-current PHP 7.4. While no longer in the top-16000 (or even the top 200000) in Alexa rank, DP is still fairly large and has a very active editor community. In the second half of the twenty hours (or so) which it took for a maintenance/update.php run to complete, there would have been more than a hundred edits made by individual users. Many of these edits leave the database in an inconsistent state where the most recent revision of an article, made while the update script was running, is being displayed in regular page view as "There is no content on this page". The revision history shows every revision except the ones made during the maintenance/update.php run (all the text from the old versions is there and viewable individually) but any attempt to edit the page fails with an edit conflict (MW sees the broken revision as conflicting with any new edit anyone - including the administrators - attempts to make).
The text of these "lost" revisions still exists and can be manually recovered at the database level. For instance, if [[uncyclopedia:pt:Pikachu Libre]] is one of the affected articles, going through the individual database tables gives:
SELECT * FROM `page` WHERE `page_namespace` = 0 AND `page_title` = 'Pikachu_Libre'; --- finds page_id = 605765 SELECT * FROM `revision` WHERE `rev_page` = 605765 ORDER BY `revision`.`rev_id` DESC; --- finds a half-dozen revisions, of which the most recent (rev_id=3957083, rev_page=605765,rev_comment_id=0,rev_actor=0,rev_timestamp=20201118202636) is not viewable from within MW 1.35 and which (predictably) falls within the timeframe affected by the update.php run, which was done on 2020-11-18 --- Now take a gamble at the `slots` table, betting everything on 3957083 SELECT * FROM `slots` WHERE `slot_revision_id` = 3957083 AND `slot_origin` = 3957083; --- So what do I win? A slot_content_id of 3501628, which gives: SELECT * FROM `content` WHERE `content_id` = 3501628; --- gives content_address = tt:3921557, where SELECT old_text FROM `text` WHERE `old_id` = 3921557; --- returns the text of this "lost" revision, but minus the edit comment and the author's name; at this point the sysadmin is promptly arrested for violating CC-BY-NC-SA's attribution requirement, a copyright infringement and a serious criminal offence, and promptly sent up the river to prison
OK, great. That still doesn't put the article back into an editable state, nor does it restore the "missing" revision. It is possible to get the article to be editable again by forcing a valid, previous revision.rev_id from the same page into the page table:
SELECT * FROM `revision` WHERE `rev_page` = 605765 AND `rev_timestamp` < '20201118000000' ORDER BY `revision`.`rev_id` DESC; --- found rev_id=3870070 as the most recent before the day of the "upgrade", so force that value into the page table as: UPDATE `page` SET `page_latest` = '3870070' WHERE `page`.`page_id` = 605765; --- and somehow the page becomes editable, but without the revision made during the update.php run. Now multiply this across hundreds of edits over several dozen pages, and it becomes clear that - while the problem is not irremediable - this is rapidly becoming a disaster.
Obviously, a few issues here:
- Why is the update.php script leaving articles in an uneditable state if they were edited while the update was running on the server? I presume this is happening during all of the huge schema changes which replaced (page, revision, text) with (page, revision, actor, slot, content, text) but it gets ugly on a busy production site.
- Why is there no automated maintenance script to fix these? At a minimum, there should be some means to go through the page/revision tables, find any entries for which page.page_latest points to something which isn't usable content (for whatever reason) and repoint page.page_latest to the latest usable revision so that the affected pages can be edited again.
- What happened to revision.rev_actor and revision.rev_commentid? The documentation at [[mw:Manual:Revision_table]] claims that these fields were used in MW 1.35 to replace the old rev_comment, rev_user and rev_user_text, but on the actual database (with the MW 1.35 maintenance/update.php applied) they're both '0' for every record - for every page or revision created either before, during or after the conversion. The data has been moved somewhere else, but where, and why doesn't the documentation on https://www.mediawiki.org/wiki/Manual:Revision_table reflect this?
I've been manually setting page.page_latest back to sane values at the database level, one page at a time, as I have a brazillion very dissatisfied users at this point.
This process needs to be automated, and the bug that allowed this situation to occur needs to be fixed. Any suggestions?