Page MenuHomePhabricator

cl_collation index redundant
Closed, ResolvedPublic


mysql wmf db1051 root enwiki> SELECT s.table_name,s.index_name,rows_read FROM information_schema.statistics s LEFT JOIN information_schema. index_statistics  i ON (i.table_schema=s.table_schema AND i.table_name=s.table_name AND i.index_name=s.index_name) WHERE s.table_schema='enwiki' and s.table_name = 'categorylinks' AND  seq_in_index=1;
| table_name    | index_name   | rows_read  |
| categorylinks | cl_sortkey   |  780279158 |
| categorylinks | cl_from      | 2135832387 |
| categorylinks | cl_timestamp |  150744319 |
| categorylinks | cl_collation |       NULL | <-- never used
4 rows in set (0.26 sec)

Same result appeared for various slaves and wikis across all WMF clusters.

cl_collation could be dropped.

Or could it? Does the updateCollation batch job need it?



Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 2:19 AM
bzimport added a project: Wikimedia-Rdbms.
bzimport set Reference to bz57184.
Springle created this task.Nov 18 2013, 2:12 PM
matmarex updated the task description. (Show Details)Feb 7 2015, 10:32 PM
matmarex set Security to None.

@tstarling had said in rMWa43f751cf6b6849e:

Mark the cl_collation index as needing deletion, it was always pretty
pointless. You can't do much better than a full table scan when you're
changing the collation value on a wiki.

That commit did not actually drop the index though.

Krenair added a subscriber: Krenair.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 2 2015, 2:40 PM
This comment was removed by Krenair.
Krenair removed Springle as the assignee of this task.Oct 4 2015, 3:03 AM
Danny_B moved this task from Unsorted to Cleanup on the Schema-change board.May 4 2016, 3:09 AM

Per T130692, the cl_collation_ext index was added to Wikimedia databases for use by updateCollation.php. @jcrespo, has the old cl_collation index been removed everywhere, including on the sites that the new index had already been added to? If so, this task can be closed.

except on one server where I forgot to run this query (and I am running it now) I executed on all of them: "alter table categorylinks DROP INDEX cl_collation, ADD INDEX cl_collation_ext (cl_collation, cl_to, cl_type, cl_from)", with this result:

CREATE TABLE `categorylinks` (
  `cl_from` int(8) unsigned NOT NULL DEFAULT '0',
  `cl_to` varbinary(255) NOT NULL DEFAULT '',
  `cl_sortkey` varbinary(230) NOT NULL DEFAULT '',
  `cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT '',
  `cl_collation` varbinary(32) NOT NULL DEFAULT '',
  `cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
  UNIQUE KEY `cl_from` (`cl_from`,`cl_to`),
  KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
  KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
  KEY `cl_collation_ext` (`cl_collation`,`cl_to`,`cl_type`,`cl_from`)
matmarex closed this task as Resolved.Jun 2 2016, 7:52 PM
matmarex assigned this task to jcrespo.

Per above.