Page MenuHomePhabricator

Preserve InnoDB table auto_increment on restart
Closed, ResolvedPublic

Description

With InnoDB, restarting MySQL in certain situations (e.g. right after the latest page and/or revision is deleted), can result in the a page_id or rev_id (not sure if this is problematic for additional tables) being handed out by auto_increment, even though it was previously handed out:

See http://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html#innodb-auto-increment-initialization for why this happens:

To initialize an auto-increment counter after a server restart, InnoDB executes the equivalent of the following statement on the first insert into a table containing an AUTO_INCREMENT column.

SELECT MAX(ai_col) FROM table_name FOR UPDATE;

In other words, it forgets the highest auto_increment ever handed out, and only picks the highest currently there (then later adds 1).

To reproduce, without any other edits:

  1. Create page
  2. Delete page
  3. Restart MySQL server
  4. Create page
  5. Delete page

(I think it could also occur for just rev_id if you deleted the most recently-edited page, but not the most recently-created).

The latest rows in archive will have duplicates for both ar_page_id and ar_rev_id.

This causes serious problems:

Also, RevisionDelete will show two rows with the same rev_id when you use a URL like:

http://dev.wiki.local.wmftest.net:8080/w/index.php?title=Special:RevisionDelete&type=revision&ids[4767]=1&target=Page

This is the URL form linked from Special:Undelete.

+-------+--------------+----------------+---------------------------------------------------+------------+-----------+--------------+-----------------------+
| ar_id | ar_namespace | ar_timestamp   | ar_title                                          | ar_page_id | ar_rev_id | ar_user_text | ar_comment            |
+-------+--------------+----------------+---------------------------------------------------+------------+-----------+--------------+-----------------------+
|    81 |            0 | 20160520145527 | Testing_duplicate_page_ID_after_restart_unrelated |       1879 |      4767 | Admin        | Created page with "b" |
|    80 |            0 | 20160520145417 | Testing_duplicate_page_ID_before_restart          |       1879 |      4767 | Admin        | Created page with "a" |
+-------+--------------+----------------+---------------------------------------------------+------------+-----------+--------------+-----------------------+

This is unlikely to occur in production, but could also be prevented by:

  1. Before restart, query Auto_increment from affected tables (using SHOW TABLE STATUS) and store it.
  2. After restart, reset it to that value before bringing the database back online.

Event Timeline

-1 disagreeing with the solution.

This can happen also on master failover-which your solution will not protect against. The right fix is not a complex server-side fix, but to not delete rows and move them around to a different table to start with, specially when there are references to it still around. Marking them as deleted and censor (if needed for privacy reasons), but never delete rows. This prevents existing replication issues on deletion and undeletion of items.

I am not *against* implementing this, but I want a more general solution.

So my proposal is:

  1. document this limitation on mediawiki.org and recommend "best practices"
  2. try to go to a model in core that follow these practices

This was directly related to T112637, although this issue was not specifically mentioned there, and requires specific attention.

Yeah, in principle I support fixing it in core as described.

Your proposed solution is similar to RevisionDelete (which is already in core), but my understanding is that RevisionDelete can black out various fields, but can't entirely hide the revision from history.

So I don't think it can currently handle normal deletion use cases (that page is non-notable, let's delete it, page appears to not exist and have no history). However, it could be enhanced to allow this.

This would be a major change though.

This would be a major change though

I know, but I would still push for it. For the time being this is a non-issue (aka low) because dbs are never restarted- only failovered, making this not unlikely, but I would think impossible unless there is a full-outage.

  1. document this limitation on mediawiki.org and recommend "best practices"

Filed as T136045: Document best practices to avoid page_id and rev_id collisions. I'm not sure what these best practices are (failover instead of restarting DB?).

-1 disagreeing with the solution.

This can happen also on master failover-which your solution will not protect against. The right fix is not a complex server-side fix, but to not delete rows and move them around to a different table to start with, specially when there are references to it still around. Marking them as deleted and censor (if needed for privacy reasons), but never delete rows. This prevents existing replication issues on deletion and undeletion of items.

I am not *against* implementing this, but I want a more general solution.

So my proposal is:

  1. document this limitation on mediawiki.org and recommend "best practices"
  2. try to go to a model in core that follow these practices

This was directly related to T112637, although this issue was not specifically mentioned there, and requires specific attention.

This is essentially just T20493.

GTrang claimed this task.

This is more of a MySQL bug than a MediaWiki bug. Anyway, this bug was actually originally reported in 2003 as Bug #199. And 14 years later, it has been fixed in MySQL 8.0 with WL#6204. So, I'm closing this as resolved.