Page MenuHomePhabricator

categorylinks InnoDB extended indexes behavior differs between enwiki slaves
Closed, ResolvedPublic

Description

The API runs the following query:

SELECT /* ApiQueryCategoryMembers::run */ cl_from,cl_sortkey,cl_type,page_namespace,page_title,cl_timestamp FROM page,categorylinks FORCE INDEX (cl_timestamp) WHERE cl_to = 'Copy_to_Wikimedia_Commons_(bot-assessed)' AND (cl_from=page_id) ORDER BY cl_timestamp,cl_from LIMIT 501;

CREATE TABLE categorylinks (

cl_from int(8) unsigned NOT NULL DEFAULT '0',
cl_to varbinary(255) NOT NULL DEFAULT '',
cl_sortkey varbinary(230) NOT NULL DEFAULT '',
cl_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
cl_sortkey_prefix varbinary(255) NOT NULL DEFAULT '',
cl_collation varbinary(32) NOT NULL DEFAULT '',
cl_type enum('page','subcat','file') NOT NULL DEFAULT 'page',
UNIQUE KEY cl_from (cl_from,cl_to),
KEY cl_timestamp (cl_to,cl_timestamp),
KEY cl_collation (cl_collation),
KEY cl_sortkey (cl_to,cl_type,cl_sortkey,cl_from)

) ENGINE=InnoDB DEFAULT CHARSET=binary

The forced cl_timestamp index is expected to utilize InnoDB extended indexes (where secondary indexes also store the clustered primary key fields) to avoid a filesort. Enwiki slaves up to MariaDB 5.5.34 do this correctly.

  • 1. row ******* id: 1 select_type: SIMPLE table: categorylinks type: ref

possible_keys: cl_timestamp

    key: cl_timestamp
key_len: 257
    ref: const
   rows: 552330
  Extra: Using index condition; Using where
  • 2. row ******* id: 1 select_type: SIMPLE table: page type: eq_ref

possible_keys: PRIMARY

    key: PRIMARY
key_len: 4
    ref: enwiki.categorylinks.cl_from
   rows: 1
  Extra:

2 rows in set (0.38 sec)

Slaves running 5.5.36+ revert to using a filesort despite apparently having identical categorylinks tables. Handler% stats suport the observation.

  • 1. row ******* id: 1 select_type: SIMPLE table: categorylinks type: ref

possible_keys: cl_from,cl_timestamp,cl_sortkey

    key: cl_sortkey
key_len: 257
    ref: const
   rows: 466736
  Extra: Using index condition; Using where; Using filesort
  • 2. row ******* id: 1 select_type: SIMPLE table: page type: eq_ref

possible_keys: PRIMARY

    key: PRIMARY
key_len: 4
    ref: enwiki.categorylinks.cl_from
   rows: 1
  Extra:

2 rows in set (0.32 sec)

Important to note that the first enwiki 5.5.36 slave, from which others were cloned, did have data dumped and reloaded rather than being upgraded in place, so categorylinks was definitely recreated. Possibly something has changed with how the first UNIQUE index can be chosen as primary key, and the 6-byte rowid is in use instead?

Simply rebuilding the table has no effect:

ALTER TABLE categorylinks ENGINE=InnoDB;

But making the primary key explicit restores the expected behavior:

ALTER TABLE categorylinks DROP INDEX cl_from, ADD PRIMARY KEY (cl_from, cl_to);

Need to investigate what caused the change in behavior, and whether this affects any other mediawiki tables.


Version: 1.24rc
Severity: normal

Details

Reference
bz70558

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 3:58 AM
bzimport set Reference to bz70558.

Wrong EXPLAIN output showing the filesort on 5.5.36+ slaves. Correct one:

  • 1. row ******* id: 1 select_type: SIMPLE table: categorylinks type: ref

possible_keys: cl_timestamp

    key: cl_timestamp
key_len: 257
    ref: const
   rows: 551942
  Extra: Using index condition; Using where; Using filesort
  • 2. row ******* id: 1 select_type: SIMPLE table: page type: eq_ref

possible_keys: PRIMARY

    key: PRIMARY
key_len: 4
    ref: enwiki.categorylinks.cl_from
   rows: 1
  Extra:

2 rows in set (0.30 sec)

This is fixed as of MariaDB 10.0.15.

Slaves not yet upgraded (say, all the 5.x ones) have had primary key fields explicitly appended to secondary indexes on problem tables. No disk footprint change and works for now.