
Multiple pages with no revisions
Open, Low, Public

Description

@Malafaya reported on IRC that the following page: https://mg.wiktionary.org/wiki/franciu?uselang=en crashes their bot. It looks like the page exists (https://mg.wiktionary.org/w/index.php?title=franciu&action=info&uselang=en), but has no revisions. No idea what happened here, but if the revisions existed and then disappeared, that looks like a data loss bug.

Reported by @DannyS712 at T220281
Currently, there are 16 pages on enwiki that exist in the page table but have no entries in the revision table, and can neither be created nor deleted on wikipedia. See https://quarry.wmflabs.org/query/34936 and https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(miscellaneous)#Pages_that_don't_exist for more. Can they be fixed/deleted/restored?


Possible causes of this:

  • T135852: Failed restore and data loss on ar_rev_id collisions (if there are no other revisions after the restore fails).
  • T220281: Looks like it's left over from an incident back on 2012-06-11. The external store master went down, and it looks like between about 23:38:47 UTC and 23:52:37 UTC revisions couldn't be created because the revision content couldn't be saved. Apparently, attempted page creations during that time created the page row despite failing to create the initial revision. (@Anomie)

Event Timeline

matmarex created this task. Sep 11 2015, 4:12 PM
matmarex raised the priority of this task from to High.
matmarex updated the task description.
matmarex added subscribers: matmarex, Malafaya.
Restricted Application added subscribers: Matanya, Aklapper. Sep 11 2015, 4:12 PM

There are other pages with this issue

mysql> select page_namespace, page_title from page left join revision on (rev_page = page_id) where rev_page is null;
+----------------+----------------+
| page_namespace | page_title     |
+----------------+----------------+
|              0 | ficțiune       |
|              0 | flacără        |
|              0 | franciu        |
|              0 | frunte         |
|              0 | heliu          |
|              0 | nandrakarakany |
+----------------+----------------+
6 rows in set (21.32 sec)
mysql> select page_title, page_id, page_touched from page where page_namespace = 0 and page_title in ('ficțiune', 'flacără', 'franciu', 'frunte', 'heliu', 'nandrakarakany');
+----------------+---------+----------------+
| page_title     | page_id | page_touched   |
+----------------+---------+----------------+
| ficțiune       | 2364109 | 20120611234131 |
| flacără        | 2364110 | 20120611234239 |
| franciu        | 2364111 | 20150911160811 |
| frunte         | 2364112 | 20120611234556 |
| heliu          | 2364113 | 20120611235110 |
| nandrakarakany | 1303665 | 20110708220002 |
+----------------+---------+----------------+
6 rows in set (0.00 sec)
Krenair renamed this task from Page with no revisions: https://mg.wiktionary.org/wiki/franciu to Multiple pages with no revisions. Sep 11 2015, 7:14 PM
Krenair set Security to None.

The franciu record says the page was last touched at 2015-09-11 16:08:11. dbstore1001, which runs with a 24-hour delay, doesn't have a revision for that page either, and the results of the query there are:

MariaDB DBSTORE localhost mgwiktionary > select page_title, page_id, page_touched from page where page_namespace = 0 and page_title in ('ficțiune', 'flacără', 'franciu', 'frunte', 'heliu', 'nandrakarakany');
+----------------+---------+----------------+
| page_title     | page_id | page_touched   |
+----------------+---------+----------------+
| ficțiune       | 2364109 | 20120611234131 |
| flacără        | 2364110 | 20120611234239 |
| franciu        | 2364111 | 20120611234447 |
| frunte         | 2364112 | 20120611234556 |
| heliu          | 2364113 | 20120611235110 |
| nandrakarakany | 1303665 | 20110708220002 |
+----------------+---------+----------------+

My opinion on this is that there could be data loss, but all of it dates from 2012 or before; it just happens that the page was "touched" recently. This makes this issue less of an imminent problem (it rules out recent data loss, or a recent security or bug issue). It does give me some ideas for detecting these cases faster, though (running data integrity checks regularly).
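A regular integrity check of the kind mentioned above could look roughly like the following. This is only a minimal sketch: it uses an in-memory SQLite database with a heavily simplified, hypothetical version of the page and revision tables (only the columns that appear in the queries in this task), and the function name is made up for illustration. It is not how any production check is implemented.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the page and revision tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_title TEXT
    );
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER
    );
    -- One healthy page and one page row without any revision (sample data).
    INSERT INTO page VALUES (1, 0, 'healthy'), (2364111, 0, 'franciu');
    INSERT INTO revision VALUES (10, 1);
""")


def pages_without_revisions(conn):
    """Return (page_id, namespace, title) for page rows that have no revision rows."""
    return conn.execute("""
        SELECT page.page_id, page.page_namespace, page.page_title
        FROM page
        WHERE NOT EXISTS (
            SELECT 1 FROM revision WHERE revision.rev_page = page.page_id
        )
        ORDER BY page.page_id
    """).fetchall()


orphans = pages_without_revisions(conn)
print(orphans)
```

Run periodically, a non-empty result would flag new cases soon after they appear, instead of years later when a bot trips over them.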

I would like to know whether touching a page that has no revisions is a bug, or whether it can legitimately happen.

We do not have backups from 2012, although maybe someone has a dump...

We do not have backups from 2012, although maybe someone has a dump...

T26675: Revision 186704908 on en.wikipedia.org, Fatal exception: unknown "cluster16" is also waiting for potential restoration from dumps.

jcrespo added a comment. Edited Sep 11 2015, 8:02 PM

I checked the logs: a cache invalidation by @Malafaya set the new page_touched value, I suppose on the assumption that the missing page content was a caching issue rather than the underlying missing record(s):

use `mgwiktionary`/*!*/;
SET TIMESTAMP=1441987691/*!*/;
UPDATE /* Title::invalidateCache */  `page` SET page_touched = '20150911160811' WHERE page_id = '2364111' AND (page_touched < '20150911160811')
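As a cross-check, the epoch value in the SET TIMESTAMP line above can be converted to the 14-digit UTC format used by page_touched. This is a small sketch; the helper name is made up for illustration.

```python
from datetime import datetime, timezone


def epoch_to_mw_timestamp(epoch_seconds: int) -> str:
    """Format a Unix epoch value as a YYYYMMDDHHMMSS string in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


# Epoch value taken from the SET TIMESTAMP line in the log excerpt.
mw_ts = epoch_to_mw_timestamp(1441987691)
print(mw_ts)  # 20150911160811
```

The result equals the page_touched value written by the UPDATE, so the logged statement and the value seen for franciu on the live database agree.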

So no recent bug. At this point I would thank the user for spotting it and focus on detecting all instances of this wiki-wide and a strategy for recovering them from the dumps (this is from my side).

From the developer point of view, you should check whether the current state causes application instability on the read API and editing side, and either insert a fake value for now or handle the exception in a non-fatal way.

@jcrespo, indeed I did an ?action=purge to see if it would solve it, without success.

Krenair added a comment. Edited Sep 12 2015, 3:59 AM

On enwiki:

mysql> select page_namespace, page_title from page left join revision on (rev_page = page_id) where rev_page is null;
+----------------+------------------------------------------------------------------------+
| page_namespace | page_title                                                             |
+----------------+------------------------------------------------------------------------+
|              3 | 1r3gr37n0n                                                             |
|              3 | 68.148.238.76                                                          |
|              3 | 71.129.57.41                                                           |
|              3 | HOTPOCKETSG                                                            |
|              3 | Heelo1                                                                 |
|              3 | Jaeh0317                                                               |
|              3 | Lilyonthevalley                                                        |
|              3 | Mightym53821                                                           |
|              3 | Ttylxox1000                                                            |
|              3 | Vd437                                                                  |
|              3 | Vishal.gkamath                                                         |
|              4 | Articles_for_deletion/2009_Sulu_kidnapping_crisis                      |
|              4 | Articles_for_deletion/2009_Sulu_kidnapping_crisis_(2nd_nomination)     |
|              4 | Articles_for_deletion/2009_Sulu_kidnapping_crisis_(3rd_nomination)     |
|              4 | Articles_for_deletion/333:_A_Bibliography_of_the_Science-Fantasy_Novel |
|              4 | Articles_for_deletion/3_Libras_(song)                                  |
+----------------+------------------------------------------------------------------------+
16 rows in set (8 hours 30 min 1.47 sec)

User talk (namespace 3) entries had page_touched values from various dates going back to 2008, Wikipedia (namespace 4) entries had page_touched values all on 2012-06-11, 39-43 minutes past 23.
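The 2012-06-11 values can be compared against the external store outage window given in the task description (about 23:38:47 to 23:52:37 UTC). The sketch below does this for the mgwiktionary page_touched values reported by dbstore1001 earlier in this task. Note the assumption: page_touched is not a creation time, so a match is only a hint that a page row dates from the incident.

```python
from datetime import datetime

MW_FORMAT = "%Y%m%d%H%M%S"

# Outage window from the task description (UTC, approximate).
OUTAGE_START = datetime(2012, 6, 11, 23, 38, 47)
OUTAGE_END = datetime(2012, 6, 11, 23, 52, 37)


def in_outage_window(page_touched: str) -> bool:
    """True if a 14-digit page_touched value falls inside the outage window."""
    touched_at = datetime.strptime(page_touched, MW_FORMAT)
    return OUTAGE_START <= touched_at <= OUTAGE_END


# page_touched values as reported by dbstore1001 for mgwiktionary.
touched = {
    "ficțiune": "20120611234131",
    "flacără": "20120611234239",
    "franciu": "20120611234447",
    "frunte": "20120611234556",
    "heliu": "20120611235110",
    "nandrakarakany": "20110708220002",
}

suspects = [title for title, ts in touched.items() if in_outage_window(ts)]
print(suspects)
```

Five of the six mgwiktionary pages fall inside the window; nandrakarakany (2011) does not, so it presumably has a different cause. The enwiki namespace 4 entries, at 39-43 minutes past 23 on the same date, would fall inside it as well.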

jcrespo moved this task from Triage to Backlog on the DBA board. Sep 15 2015, 11:02 AM
Gehel added a subscriber: Gehel. Jul 19 2016, 2:44 PM

This seems to have been stalled for a long time. Should we close this? Or is there something we can still do at this point? My understanding is that the loss of revision is old enough that the initial issue has probably been fixed and that there is no way of recovering this content. Is there anything else we should do?

Should we close this?

Why, this is on the backlog?

Is there anything else we should do?

Maybe help solve this instead of blindly closing it?

The content can potentially be recovered from old dumps. We should also check for any revisions that are not connected to existing pages, perhaps they're still there.
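The reverse check suggested here (revisions whose rev_page does not match any existing page row) is the mirror image of the queries used earlier in this task. A minimal sketch follows, again on an in-memory SQLite database with a simplified, hypothetical schema and made-up sample rows, not the real tables.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the page and revision tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_title TEXT
    );
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER
    );
    INSERT INTO page VALUES (1, 0, 'healthy'), (2364111, 0, 'franciu');
    -- Revision 11 points at page id 999, which has no page row (sample data).
    INSERT INTO revision VALUES (10, 1), (11, 999);
""")

# Revisions whose rev_page matches no row in page.
orphan_revisions = conn.execute("""
    SELECT revision.rev_id, revision.rev_page
    FROM revision
    LEFT JOIN page ON page.page_id = revision.rev_page
    WHERE page.page_id IS NULL
    ORDER BY revision.rev_id
""").fetchall()
print(orphan_revisions)
```

Any rows found this way would still need to be matched to the affected pages by hand, since a dangling rev_page value by itself does not say which title the revision was meant for.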

Do we have a list of known causes for this?

Started a list in the description of this (Probably most of these were not caused by that bug, I just want to have a list).

Does anyone plan to fix those pages? Or is this task rather low priority?

My opinion on this is that there could be data loss, but all of it dates from 2012 or before; it just happens that the page was "touched" recently. This makes this issue less of an imminent problem (it rules out recent data loss, or a recent security or bug issue).

This is low priority, see the comment above (old issues), but I intend to work on this at some point (it would be in scope of the more important T104459), either to recover the lost data or to add null edits. Low priority for now because the issue seems old and no longer active (probably caused by a long-since-fixed bug or by maintenance).

Gehel removed a subscriber: Gehel. Jun 20 2017, 8:07 AM
RobH lowered the priority of this task from High to Low. May 3 2018, 4:52 PM
Marostegui updated the task description.
Marostegui added a subscriber: DannyS712.
Marostegui added a subscriber: Anomie.