Page MenuHomePhabricator

Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp
Closed, DeclinedPublic

Description

Currently somewhere this is breaking a wmf wiki as not all wmf wiki's have been updated to the new index due to the fact there was no migration sql patch which there is now https://gerrit.wikimedia.org/r/#/c/344452/

This task is blocking fixing T154872 since it will result in breaking a lot more wmf wiki's if we switch it to the new index in DeletedContribsPager.php https://gerrit.wikimedia.org/r/#/c/344434/

Where to run those changes: all wiki's
When to run those changes: when ever the patches are merged and when the train is run next.

  • Schema changes to run:
    • CREATE INDEX ar_usertext_timestamp ON archive (ar_user_text,ar_timestamp);
    • ALTER TABLE archive DROP INDEX usertext_timestamp;

These changes depend on https://gerrit.wikimedia.org/r/#/c/344452/ and https://gerrit.wikimedia.org/r/#/c/344434/ to be merged.

Event Timeline

Paladox created this task.Mar 23 2017, 9:14 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 23 2017, 9:14 PM
Paladox triaged this task as High priority.Mar 23 2017, 9:15 PM

Changing status to reflect this needs to be done as soon as possible. This is also blocking fixing wiki's created after the 1.28 release.

demon removed a subscriber: demon.Mar 23 2017, 10:20 PM

@Paladox would you mind using this template to create a schema change ticket? It is a bit easier and clearer for us to handle them that way, so we can avoid mistakes and do them faster :) https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change
Thanks!

Paladox updated the task description. (Show Details)
Paladox updated the task description. (Show Details)Mar 24 2017, 8:21 AM

@Paladox would you mind using this template to create a schema change ticket? It is a bit easier and clearer for us to handle them that way, so we can avoid mistakes and do them faster :) https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change
Thanks!

Done :)

Noting this doesn't need doing on every wiki as some will already have the new index

Plus, this needs some special handling for the reason above... Though, curiously, it seems like literally only the newest wiki (arbcom_cswiki) has this problem, the second newest, fiwikivoyage doesn't. I thought this was a wider problem...

I know ideally the DBA's would want to do this in one pass, however due to https://github.com/wikimedia/mediawiki/blob/e01fd443887b47c86d5248a4a32eca5e5ed98a97/includes/specials/pagers/DeletedContribsPager.php#L132 if we do that, we're going to have broken wikis one way or another when the indexes switch over, but the code is still using one/the other on various wikis. So we might want to work out how we handle this gracefully in the meantime. I would presume putting an indexExists() check in would be one way as a holdover for migration, but don't know how performant that would actually be

mysql:wikiadmin@db1083 [enwiki]> show indexes from archive;
+---------+------------+----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name             | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| archive |          0 | PRIMARY              |            1 | ar_id        | A         |    52160687 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | name_title_timestamp |            1 | ar_namespace | A         |        6276 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | name_title_timestamp |            2 | ar_title     | A         |    17386895 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | name_title_timestamp |            3 | ar_timestamp | A         |    52160687 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | usertext_timestamp   |            1 | ar_user_text | A         |    17386895 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | usertext_timestamp   |            2 | ar_timestamp | A         |    52160687 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | ar_revid             |            1 | ar_rev_id    | A         |    52160687 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
7 rows in set (0.00 sec)
mysql:wikiadmin@db1078 [arbcom_cswiki]> show indexes from archive;
+---------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name              | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| archive |          0 | PRIMARY               |            1 | ar_id        | A         |          19 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | name_title_timestamp  |            1 | ar_namespace | A         |           9 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | name_title_timestamp  |            2 | ar_title     | A         |          19 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | name_title_timestamp  |            3 | ar_timestamp | A         |          19 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | ar_usertext_timestamp |            1 | ar_user_text | A         |           4 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | ar_usertext_timestamp |            2 | ar_timestamp | A         |          19 |     NULL | NULL   |      | BTREE      |         |               |
| archive |          1 | ar_revid              |            1 | ar_rev_id    | A         |          19 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
7 rows in set (0.00 sec)

I am not sure we should agree to this change, all (or almost all) wikis have the usertext_timestamp, apparently only 1 wiki has the newer index. I think it is faster and less risky to unify on usertext_timestamp.

Looking at T154872, this seems like a mediawiki issue, not a production database issue, so I will decline this ticket. I am not going to do a massive production alter to break a name that someone else broke and is not needed-functionality wise.

jcrespo closed this task as Declined.Mar 24 2017, 10:38 AM

I am not sure we should agree to this change, all (or almost all) wikis have the usertext_timestamp, apparently only 1 wiki has the newer index. I think it is faster and less risky to unify on usertext_timestamp.

Any and all newer wikis from now on will have ar_usertext_timestamp with MW Core how it is. I'm not saying I disagree with you, but actively changing new wikis when they're created away from the one in mw core doesn't feel right.

See T154872#2926839 and also your own bug of T104756, specifically T104756#1431005 -- the renaming was done to give unique indexes across tables, specifically for SQLite

And the canonical non prefixed one is...

CREATE INDEX /*i*/usertext_timestamp ON /*_*/revision (rev_user_text,rev_timestamp);

T154872#2926839 - yes, mediawiki made an incompatible change with production. They should fix it. I do not aprove such a change for WMF databases, it is risky and unnecesary for MySQL. I will fork mediawiki if that is needed. Changing code is easy- it can be reverted easily. Deploying a change to production not only will take weeks, it can create an outage with the archive table (which mediawiki unsafe queries broke).

I do not approve for this change to go to production, when the archive table will disappear soon. I can fix arbcom_cswiki, if you want.

As this was declined I guess we can continue eight eh patches I have for moving us to the ar_ prefix. This task was closed as declined so it is not blocking the other task now.

Paladox added a comment.EditedMar 24 2017, 10:53 AM

T154872#2926839 - yes, mediawiki made an incompatible change with production. They should fix it. I do not aprove such a change for WMF databases, it is risky and unnecesary for MySQL. I will fork mediawiki if that is needed. Changing code is easy- it can be reverted easily. Deploying a change to production not only will take weeks, it can create an outage with the archive table (which mediawiki unsafe queries broke).

I do not approve for this change to go to production, when the archive table will disappear soon. I can fix arbcom_cswiki, if you want.

But your leaving all non wmf wikis broken. As the changes I have will break wmf but fix non wmf wikis.

No, you can do (propose?) whatever you want on mediawiki code to fix the other wikis -feel free to send any patches there.

We could add a config to configure what index to use, but at the same time deprecate the config. Will make it easier for wmf but at the same time allow other wikis to be fixed.

The easiest way is to rename the index for sqlite-only. We were not notified this was going to happen- we would have spoken against it well in advance. We strongly suggest to contact us in case a schema change is going to happen (I would even call it a Mediawiki policy in written form: https://www.mediawiki.org/wiki/Development_policy#Database_patches ). If we disagree- we obviously don't have veto on anything development- but we could have say how problematic it is to deploy this and suggested alternative ways to fix the issue. Whoever deployed the change create a deployment issue. I do not think we are the ones at fault here.

The easiest way is to rename the index for sqlite-only. We were not notified this was going to happen- we would have spoken against it well in advance. We strongly suggest to contact us in case a schema change is going to happen (I would even call it a Mediawiki policy in written form: https://www.mediawiki.org/wiki/Development_policy#Database_patches ). If we disagree- we obviously don't have veto on anything development- but we could have say how problematic it is to deploy this and suggested alternative ways to fix the issue. Whoever deployed the change create a deployment issue. I do not think we are the ones at fault here.

You weren't around Wikimedia in this capacity in 2009, so giving you a heads up would be difficult. rMW4124558d7b40df818337f1d7eeae10c9ee9688e2 and a followup in 2009 also predates you https://github.com/wikimedia/mediawiki/commit/a7c7a3fd3305dc118416bebff2d4d209e26cfd88 :)

The latter patch has a method for index remapping. Looking at MW Core as of today, that seems to have disappeared, which is presumably where (some of) these problems relate to. Someone (I haven't tried to find out who/when), has presumably tried to unravel this, but not done it fully, or has done so not knowing the extent of the inconsistency across older versions.

	/**
	 * Get the name of an index in a given table.
	 *
	 * @param string $index
	 * @return string
	 */
	protected function indexName( $index ) {
		return $index;
	}

https://github.com/wikimedia/mediawiki/blame/master/maintenance/tables.sql#L503

I'm also mightily confused, how when this index has been the canonical one in master for 8 years, how only 1 WMF wiki has this index. Presumably the answer is because it has been migrated back for some other bug?

I'd hazard a guess, that numerous wikis have been created in those 8 years...

Let's talk in the parent ticket and I will definitely help.