
Failed restore and data loss on ar_rev_id collisions
Closed, Resolved (Public)

Description

If there are duplicate ar_rev_id values (see T135851: Preserve InnoDB table auto_increment on restart), attempting to restore the second one will result in an error and the loss of the revision.

To reproduce:

  1. Create a page
  2. Delete it
  3. Restart MySQL server
  4. Create a page
  5. Delete it.

If you then restore the first page and try to restore the second, you will see a page with only "Error undeleting page".

The revision will be lost entirely, and if it was a single-revision page, you end up with an inconsistent state: "Notice: Page <PAGE> exists but has no (visible) revisions!"

It should at least stay in the archive table.
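
The reproduction steps can be mimicked with a small toy model. This is only a sketch, not MediaWiki or InnoDB code: the `ToyWiki` class and its methods are hypothetical, and it assumes the server rebuilds the auto-increment counter from the highest id still in the live table on restart (the behaviour T135851 refers to).

```python
# Toy model of the collision; not MediaWiki or InnoDB code.
# Assumption: on restart the auto-increment counter is re-derived as
# MAX(rev_id) + 1 over the live revision table only.

class ToyWiki:
    def __init__(self):
        self.revision = {}     # rev_id -> page title (live revisions)
        self.archive = []      # (ar_rev_id, title) for deleted revisions
        self.next_rev_id = 1   # in-memory auto-increment counter

    def create_page(self, title):
        rev_id = self.next_rev_id
        self.next_rev_id += 1
        self.revision[rev_id] = title
        return rev_id

    def delete_page(self, title):
        # Move every live revision of the page into the archive.
        for rev_id in [r for r, t in self.revision.items() if t == title]:
            del self.revision[rev_id]
            self.archive.append((rev_id, title))

    def restart_server(self):
        # Archived ids are not consulted, so they can be handed out again.
        self.next_rev_id = max(self.revision, default=0) + 1


wiki = ToyWiki()
wiki.create_page("First")
wiki.delete_page("First")
wiki.restart_server()
wiki.create_page("Second")
wiki.delete_page("Second")

archived_ids = [rev_id for rev_id, _ in wiki.archive]
print(archived_ids)  # [1, 1]: both archived revisions share one id
```

Both archive entries end up with the same id, which is the precondition for the failed restore described above.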

Event Timeline

Change 290293 had a related patch set uploaded (by Mattflaschen):
Better handle already-used rev_id when restoring

https://gerrit.wikimedia.org/r/290293

A couple of exact test cases I've been using to test this:

Two separate pages with one revision each
   * Restart MySQL ()
   * Create one page - <URL>
   * Delete it ()
   * Restart MySQL ()
   * Create a separate page - <URL>
   * Delete it ()
   * Restore first page ()
   * Visit first page ()
   * Restore second page [it should show 'Some or all of the undeletion failed: One revision could not be restored, because its rev_id was already in use.'] ()
   * Visit second page [It should still be fully deleted] ()

One page with multiple deleted revisions.
   * Restart MySQL ()
   * Create one page - <URL>
   * Delete it ()
   * Restart MySQL ()
   * Create same page ()
   * Delete it ()
   * Restart MySQL ()
   * Create same page ()
   * Delete it ()
   * Restore all revisions [it should show 'Some or all of the undeletion failed: 2 revisions could not be restored, because their rev_id was already in use.'] ()
   * Visit page and check that one was restored [The first revision of the page should be restored, the others should still be deleted, not vanished] ()
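
The behaviour these test cases expect can be sketched roughly as follows. This is not the code from the Gerrit change: `restore_page`, the two-column tables, the `rev_title` column, and the page name are simplifications made up for illustration, and SQLite stands in for MySQL.

```python
import sqlite3


def restore_page(conn, title):
    """Restore archived revisions of `title`; return (restored, failed)."""
    restored = failed = 0
    rows = conn.execute(
        "SELECT ar_id, ar_rev_id FROM archive WHERE ar_title = ? ORDER BY ar_id",
        (title,),
    ).fetchall()
    for ar_id, rev_id in rows:
        in_use = conn.execute(
            "SELECT 1 FROM revision WHERE rev_id = ?", (rev_id,)
        ).fetchone()
        if in_use:
            # rev_id already taken: leave the row in archive, do not drop it.
            failed += 1
            continue
        conn.execute(
            "INSERT INTO revision (rev_id, rev_title) VALUES (?, ?)",
            (rev_id, title),
        )
        conn.execute("DELETE FROM archive WHERE ar_id = ?", (ar_id,))
        restored += 1
    conn.commit()
    return restored, failed


# Hypothetical, heavily simplified schema (the real tables have many more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_title TEXT)")
conn.execute(
    "CREATE TABLE archive (ar_id INTEGER PRIMARY KEY, ar_title TEXT, ar_rev_id INTEGER)"
)
# One page deleted three times, each time with the same reused rev_id.
conn.executemany(
    "INSERT INTO archive (ar_title, ar_rev_id) VALUES (?, ?)",
    [("ExamplePage", 7)] * 3,
)

result = restore_page(conn, "ExamplePage")
remaining = conn.execute("SELECT COUNT(*) FROM archive").fetchone()[0]
print(result, remaining)  # (1, 2) 2
```

The point of the sketch is the `continue` branch: a revision whose id is already in use is counted as failed and stays in the archive rather than vanishing.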

Change 290293 merged by jenkins-bot:
Better handle already-used rev_id when restoring

https://gerrit.wikimedia.org/r/290293

On Vagrant, I checked the scenario of deleting pages with no revisions.

  • page 'Mavetuna' was created and deleted
  • mysql restarted
  • 'Mavetuna1' was created and deleted - no errors were displayed

@Mattflaschen-WMF: Interesting that ar_rev_id: 17 is the same for both pages.
The results of the test described above:

  • 'Mavetuna' was successfully restored via Special:Log.
  • 'Mavetuna1' cannot be restored; the following error page is displayed:
Error undeleting page
Some or all of the undeletion failed: One revision could not be restored, because its rev_id was already in use.
root@localhost:[wiki]> select * from archive\G
*************************** 1. row ***************************
            ar_id: 1
     ar_namespace: 0
         ar_title: Mavetuna
          ar_text: 
       ar_comment: Created page with "Creating Mavetuna by [[User:Admin]] on Aug 18/2016"
          ar_user: 1
     ar_user_text: Admin
     ar_timestamp: 20160818233707
    ar_minor_edit: 0
         ar_flags: 
        ar_rev_id: 17
       ar_text_id: 17
       ar_deleted: 0
           ar_len: 50
       ar_page_id: 16
     ar_parent_id: 0
          ar_sha1: r7we9d4p10aj0cq0149za2bmrsm40p3
 ar_content_model: NULL
ar_content_format: NULL
*************************** 2. row ***************************
            ar_id: 2
     ar_namespace: 0
         ar_title: Mavetuna1
          ar_text: 
       ar_comment: Created page with "create to be immediately deleted"
          ar_user: 1
     ar_user_text: Admin
     ar_timestamp: 20160818234425
    ar_minor_edit: 0
         ar_flags: 
        ar_rev_id: 17
       ar_text_id: 18
       ar_deleted: 0
           ar_len: 32
       ar_page_id: 16
     ar_parent_id: 0
          ar_sha1: dxc44wahedkyg73y2ocuhqs70v6hhj8
 ar_content_model: NULL
ar_content_format: NULL
2 rows in set (0.00 sec)

It looks like that MySQL query is from before undeleting the first page. If you repeat after the first restore, there should be only Mavetuna1.

That is what is currently supposed to happen when you do the 2nd restore.

If you test the same thing with bbf1102f9f2f9220a7ff6513976d2970722e84b1 (before my fix), the 2nd restore will mysteriously fail, and the Mavetuna1 row will be completely lost (it will be in neither archive nor revision).

Checked the second scenario - One page with multiple deleted revisions

The result is somewhat different from what you described:

[Screenshot: Screen Shot 2016-08-19 at 2.10.37 PM.png]

  • all versions of the page are restored.

I restarted MySQL as follows (the same command as for testing the first scenario):

vagrant@mediawiki-vagrant:~$ sudo service mysql restart
mysql stop/waiting
mysql start/running, process 9997

Re-checked both scenarios - the results are as @Mattflaschen-WMF described.

Just to document the second scenario from the database point of view:

(1)

One page with multiple deleted revisions.

  • Restart MySQL ()
  • Create one page - <URL>

Mavetuna13 wikitext page was created.

(2)

  • Delete it ()
  • Restart MySQL ()

Mavetuna13 is present in 'archive' table

(19:20) root@localhost:[wiki]> select ar_title, ar_rev_id from archive;
+------------+-----------+
| ar_title   | ar_rev_id |
+------------+-----------+
| Mavetuna1  |        17 |
| Mavetuna2  |        18 |
| Mavetuna8  |        27 |
| Mavetuna9  |        28 |
| Mavetuna13 |        30 |
+------------+-----------+
5 rows in set (0.01 sec)

(3)

  • Create same page ()
  • Delete it ()
  • Restart MySQL ()

Mavetuna13 is now present in archive twice, with the same ar_rev_id:

(19:24) root@localhost:[wiki]> select ar_title, ar_rev_id from archive;
+------------+-----------+
| ar_title   | ar_rev_id |
+------------+-----------+
| Mavetuna1  |        17 |
| Mavetuna2  |        18 |
| Mavetuna8  |        27 |
| Mavetuna9  |        28 |
| Mavetuna13 |        30 |
| Mavetuna13 |        30 |
+------------+-----------+
6 rows in set (0.00 sec)

(4)

  • Create same page ()
  • Delete it ()

After the third deletion, Mavetuna13 is present in the 'archive' table with three identical ar_rev_id values:

(19:27) root@localhost:[wiki]> select ar_title, ar_rev_id from archive;
+------------+-----------+
| ar_title   | ar_rev_id |
+------------+-----------+
| Mavetuna1  |        17 |
| Mavetuna2  |        18 |
| Mavetuna8  |        27 |
| Mavetuna9  |        28 |
| Mavetuna13 |        30 |
| Mavetuna13 |        30 |
| Mavetuna13 |        30 |
+------------+-----------+
7 rows in set (0.00 sec)

(5)

  • Restore all revisions [it should show 'Some or all of the undeletion failed: 2 revisions could not be restored, because their rev_id was already in use.'] ()
  • Visit page and check that one was restored [The first revision of the page should be restored, the others should still be deleted, not vanished] ()
  1. Restored all revisions of Mavetuna13 via Special:Undelete.
  2. As expected, the following is displayed:
Mavetuna13 has been restored

Consult the deletion log for a record of recent deletions and restorations.

Some or all of the undeletion failed: 2 revisions could not be restored, because their rev_id was already in use.
  3. Visited the page Mavetuna13 - the first version of the page is displayed.
  4. The 'archive' table still has two records, for the two latest revisions of the Mavetuna13 page.
(19:30) root@localhost:[wiki]> select ar_title, ar_rev_id from archive;
+------------+-----------+
| ar_title   | ar_rev_id |
+------------+-----------+
| Mavetuna1  |        17 |
| Mavetuna2  |        18 |
| Mavetuna8  |        27 |
| Mavetuna9  |        28 |
| Mavetuna13 |        30 |
| Mavetuna13 |        30 |
+------------+-----------+
6 rows in set (0.00 sec)
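
For reference, duplicate ar_rev_id values like the ones in the table above can be listed with a grouped query. The sketch below loads the rows from the last query into an in-memory SQLite table (SQLite stands in for MySQL here, and the two-column table is a simplification); the same SELECT should work unchanged against the real archive table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the archive table: only the two columns queried above.
conn.execute("CREATE TABLE archive (ar_title TEXT, ar_rev_id INTEGER)")
conn.executemany(
    "INSERT INTO archive VALUES (?, ?)",
    [
        ("Mavetuna1", 17),
        ("Mavetuna2", 18),
        ("Mavetuna8", 27),
        ("Mavetuna9", 28),
        ("Mavetuna13", 30),
        ("Mavetuna13", 30),
    ],
)

# Every ar_rev_id that occurs more than once, with its number of occurrences.
duplicates = conn.execute(
    "SELECT ar_rev_id, COUNT(*) FROM archive "
    "GROUP BY ar_rev_id HAVING COUNT(*) > 1 ORDER BY ar_rev_id"
).fetchall()
print(duplicates)  # [(30, 2)]
```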