Page MenuHomePhabricator

The "archive" database table schema is inconsistent with update patch
Closed, ResolvedPublic

Description

0) Context

The database schemata of several tables (archive, image, oldimage, and revision) each declare a (user, timestamp) index.

  1. Problem

For the archive database table, there is a (user, timestamp) index that is:
o named one way in maintenance/table.sql, and
o named another way in the update patch maintenance/archives/patch-archive-user-index.sql.

Table nameIndex name - table.sqlIndex name - update patchMatch
archivear_usertext_timestampusertext_timestampNO
imageimg_usertext_timestampimg_usertext_timestampyes
oldimageoi_usertext_timestampoi_usertext_timestampyes
revisionusertext_timestampusertext_timestampyes
  1. Contents of maintenance/tables.sql

CREATE TABLE /*_*/archive (
ar_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
...
ar_content_format varbinary(64) DEFAULT NULL
) /*$wgDBTableOptions*/;

CREATE INDEX /*i*/name_title_timestamp ON /*_*/archive (ar_namespace,ar_title,ar_timestamp);
CREATE INDEX /*i*/ar_usertext_timestamp ON /*_*/archive (ar_user_text,ar_timestamp);
CREATE INDEX /*i*/ar_revid ON /*_*/archive (ar_rev_id);

  1. Contents of maintenance/archives/patch-archive-user-index.sql

ALTER TABLE /*$wgDBprefix*/archive
ADD INDEX usertext_timestamp ( ar_user_text , ar_timestamp );

  1. Consequence

When the patch is applied using maintenance/update.php, the database schema becomes:

PRIMARY KEY (ar_id),
KEY name_title_timestamp (ar_namespace,ar_title,ar_timestamp),
KEY ar_usertext_timestamp (ar_user_text,ar_timestamp),
KEY ar_revid (ar_rev_id),
KEY usertext_timestamp (ar_user_text,ar_timestamp)
) ENGINE=InnoDB DEFAULT CHARSET=binary;

which looks dubious.

  1. Technical details

GIT branch: master

Event Timeline

wpmirrordev raised the priority of this task from to Medium.
wpmirrordev updated the task description. (Show Details)
wpmirrordev subscribed.

On MySQL, it is technically possible to create the same index with a different name twice. If the question is if that is useful for something, the answer is "not at all": it wastes space, makes writes slower and potentially confuses the query planner. I will investigate what is the status for WMF production.

Note: for mysql 5.6, this would return a warning.

Some context:

rSVN45764 (rMW4124558d7b40df81) merged the MySQL and SQLite schema files. In that commit, the index on the archive table was renamed to ar_usertext_timestamp, because you cannot have two indexes with the same name, though on different tables, in the same SQLite database. This introduced the described problem.

A hack was added to fix this in rSVN45769 (rMWf058162bc556b35f). Each index name is now prefixed with an /*i*/ comment, which for all database types but SQLite, replaces the new name with the old one when the SQL script is run from the installer or from a maintenance script. See DatabaseBase::replaceVars() and DatabaseBase::indexName().

So now, the described issue can only happen when tables.sql is run outside the installer using, for example, the mysql command line client or phpMyAdmin, or when using sql.php yet redirecting standard input to the file instead of providing the filename as an argument.

I do have the duplicate ar_usertext_timestamp index in my test wiki's database, though only because I had run tables.sql using mysql.

My dev wiki only has the old non prefixed one (and it's been around quite a few versions!)

mysql> show indexes from mw_archive;
+------------+------------+----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name             | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| mw_archive |          0 | PRIMARY              |            1 | ar_id        | A         |          43 |     NULL | NULL   |      | BTREE      |         |               |
| mw_archive |          1 | name_title_timestamp |            1 | ar_namespace | A         |           8 |     NULL | NULL   |      | BTREE      |         |               |
| mw_archive |          1 | name_title_timestamp |            2 | ar_title     | A         |          43 |     NULL | NULL   |      | BTREE      |         |               |
| mw_archive |          1 | name_title_timestamp |            3 | ar_timestamp | A         |          43 |     NULL | NULL   |      | BTREE      |         |               |
| mw_archive |          1 | usertext_timestamp   |            1 | ar_user_text | A         |           6 |     NULL | NULL   |      | BTREE      |         |               |
| mw_archive |          1 | usertext_timestamp   |            2 | ar_timestamp | A         |          43 |     NULL | NULL   |      | BTREE      |         |               |
| mw_archive |          1 | ar_revid             |            1 | ar_rev_id    | A         |          43 |     NULL | NULL   | YES  | BTREE      |         |               |
+------------+------------+----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
7 rows in set (0.01 sec)

Change 344452 had a related patch set uploaded (by Paladox):
[mediawiki/core@master] Migrate usertext_timestamp to ar_usertext_timestamp on wiki's lower then 1.28

https://gerrit.wikimedia.org/r/344452

I will investigate what is the status for WMF production.

Did you ever do this? I seem to recall this causing WMF production an issue "recently" potentially

Change 344460 had a related patch set (by Paladox) published:
[mediawiki/core@master] Drop index usertext_timestamp on the archive table

https://gerrit.wikimedia.org/r/344460

Change 344460 abandoned by Paladox:
Drop index usertext_timestamp on the archive table

Reason:
I will do it in one patch instead https://gerrit.wikimedia.org/r/#/c/344452/

https://gerrit.wikimedia.org/r/344460

jcrespo unsubscribed.

@Reedy T104756#1428651 and T104756#1428779 tells me this I did the check and this has nothing to do with production databases. I am sorry I did not expressed it clearly at the time-probably I was very busy at the time and thought the tag change was enough. I ask you for apologies for being unclear. I confirm this is still true for s1, s2, s4, s5 and s7 (the largest and busiest wikis), which I can check quickly.

@Reedy T104756#1428651 and T104756#1428779 tells me this I did the check and this has nothing to do with production databases. I am sorry I did not expressed it clearly at the time-probably I was very busy at the time and thought the tag change was enough. I ask you for apologies for being unclear. I confirm this is still true for s1, s2, s4, s5 and s7 (the largest and busiest wikis), which I can check quickly.

Thanks.

By age of the wikis, only the newest seems to have the new name. But per the end of my comment on T161252#3128004 this feels weird, and something isn't right. We create wikis from that patch.... And all but the newest have the old index name? I know we must've created 20+ wikis since... I mean, there'll have been 6 or 7 Wikimania wikis, never mind anything else...

Indeed, there must be something missing- we definitely should not rush on making production changes yet.

Indeed, there must be something missing- we definitely should not rush on making production changes yet.

I think paladox jumped the gun by quite an amount creating the DBA ticket before any CR had really happened. Granted, this is better than it getting merged, getting pushed out to the cluster and all hell breaking loose due to someone not realising.

Not really got any more time to spend on this today till maybe this evening, but I think what has happened, is someone has unravelled these "hacks" (due to not knowing the extent of the inconsistency, either on the part of the reviewer, or the committer), this landed in REL1_28, which is when the breakages started to appear in the wild.

This is the most telling for me, Roan made a change (I haven't tried to find out if/when it reverted mentioning said commit) to put a back compat hack in https://github.com/wikimedia/mediawiki/commit/a7c7a3fd3305dc118416bebff2d4d209e26cfd88#diff-0734eeac0864cea78f3857f70175e8b3R1580

But this indexName function has been made a noop in master https://github.com/wikimedia/mediawiki/blob/e121e52fd87d1c949a9cd51fd4c79d0948ed8a84/includes/libs/rdbms/database/Database.php#L1957-L1959

Oh look what I just found. @aaron broke it in https://github.com/wikimedia/mediawiki/commit/bec6151e5fe30e780d82527c4c53e54379149a4e

2016-12-01
22:07 Krenair: Dropped ar_usertext_timestamp indexes from the archive tables of all 4 wikis created since September (olowiki, ecwikimedia, projectcomwiki, fiwikivoyage) - replaced with usertext_timestamp index to match all older wikis. MW fix to follow. see -tech
that deletes al indexes if exist

@Krenair What happened to the MW fix?

I reverted https://gerrit.wikimedia.org/r/#/c/312089/ in https://gerrit.wikimedia.org/r/#/q/I1c67aa464a1f151562ef92fce03f825113847814

Mentioned in SAL (#wikimedia-operations) [2017-03-24T15:27:25Z] <jynus> running unscheduled ALTER TABLE on arbcom_cswiki.archive T104756

I can now say with confidence that all archive tables on production have only the usertext_timestamp index. Whatever are the next steps for a fix on mediawiki and WMF,archive tables have currently an uniform schema.

Change 344601 had a related patch set uploaded (by Reedy):
[mediawiki/core@master] Revert "Remove old remapping hacks from Database::indexName()"

https://gerrit.wikimedia.org/r/344601

  • The revert patch above needs merging
  • We need to replace usages of 'usertext_timestamp' in MW code (not many places use it with the archive table) back to ar_usertext_timestamp, so that they work for sqlite, and then get changed back to usertext_timestamp for at least MySQL
    • Are these remappings needed for postgres/mssql etc?
  • We need to rename ar_usertext_timestamp back to usertext_timestamp in the mysql patch files, and tables.sql. Writing an updater to change it back, and fix the old updater to stop converting it in the first place. No point doing usertext_timestamp -> ar_usertext_timestamp -> ar_usertext_timestamp in one updater run

Or, we detect which index name it is at runtime, which feels a bit meh...

Trying to decide if it's worth keeping both of these tasks open

Change 344452 abandoned by Reedy:
Migrate usertext_timestamp to ar_usertext_timestamp on wiki's lower then 1.28

https://gerrit.wikimedia.org/r/344452

Krinkle renamed this task from core - archive database table - database schema is inconsistent with update patch to The "archive" database table schema is inconsistent with update patch.Jul 28 2018, 11:42 PM
Krinkle added a project: MediaWiki-Installer.
Krinkle moved this task from Untriaged to Schema changes on the MediaWiki-libs-Rdbms board.

Change 537676 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Clean up ar_usertext_timestamp index mess

https://gerrit.wikimedia.org/r/537676

Anomie added subscribers: PleaseStand, Anomie.

Since rMWc29909e59fd8: Mostly drop old pre-actor user schemas dropped the index entirely, this is almost fixed now. https://gerrit.wikimedia.org/r/537676 is still needed to fully clean it up during update.php, as the confusing history of this index may have resulted in the index existing as ar_usertext_timestamp, usertext_timestamp, or both and c29909e59 only drops one name (which may therefore error out if that name happens to not exist).

Change 537676 merged by jenkins-bot:
[mediawiki/core@master] Clean up ar_usertext_timestamp index mess

https://gerrit.wikimedia.org/r/537676