Page MenuHomePhabricator

IndexPager::buildQueryInfo (contributions page unfiltered) query needs tuning
Closed, ResolvedPublic

Description

Author: afeldman

Description:
The FORCE INDEX provided in this query results in a /much/ slower query plan than without. This is the case with both MySQL 5.1-facebook, and MariaDB 5.5, so it should just be removed.

Query:

SELECT /* IndexPager::buildQueryInfo (contributions page unfiltered) user */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,user_name,page_namespace,page_title,page_is_new,page_latest,page_is_redirect,page_len,ts_tags FROM revision FORCE INDEX (user_timestamp) INNER JOIN page ON ((page_id = rev_page)) LEFT JOIN user ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN user_groups ON ((ug_user = rev_user) AND ug_group = 'bot') LEFT JOIN tag_summary ON ((ts_rev_id=rev_id)) WHERE (rev_user >18370655) AND (ug_group IS NULL) AND ((rev_deleted & 4) = 0) AND (rev_timestamp<'20130301192103') ORDER BY rev_timestamp DESC LIMIT 51;

Times of the query as above:

production enwiki master - mysql 5.1-facebook: 51 rows in set (7 min 23.03 sec)
mariadb 5.5.29: 51 rows in set (1.29 sec)

Times with the FORCE INDEX (user_timestamp) removed
production enwiki master - mysql 5.1-facebook: 51 rows in set (0.26 sec)
mariadb 5.5.29: 51 rows in set (0.20 sec)

The EXPLAIN without the FORCE INDEX looks worse on 5.1 without than with, based on number of rows examined, but it avoids a filesort. It looks better without the FORCE INDEX on mariadb than mysql 5.1 with or without. In both cases, rev_timestamp is used instead of user_timestamp.


Version: 1.21.x
Severity: normal

Details

Reference
bz45619

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:39 AM
bzimport set Reference to bz45619.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

The EXPLAIN without the FORCE INDEX looks worse on 5.1 without than with,
based
on number of rows examined, but it avoids a filesort. It looks better
without
the FORCE INDEX on mariadb than mysql 5.1 with or without. In both cases,
rev_timestamp is used instead of user_timestamp.

function getIndexField() {

		return 'rev_timestamp';

}

There is also code using usertext_timestamp:

			if ( $uid ) {
				$condition['rev_user'] = $uid;
				$index = 'user_timestamp';
			} else {
				$condition['rev_user_text'] = $this->target;
				$index = 'usertext_timestamp';
			}

CREATE INDEX /*i*/user_timestamp ON /*_*/revision (rev_user,rev_timestamp);
CREATE INDEX /*i*/usertext_timestamp ON /*_*/revision (rev_user_text,rev_timestamp);

I guess we should remove the force index for that one too?

Bump. These queries forcing user_timestamp are running for 5+ minutes on slaves for several of the bigger WMF wikis.

Multiple identical copies running simultaneously too, presumably when clients assume slow == failure and blindly retry.

Re comment 2: Example would be good. Best to review EXPLAIN output for one forcing usertext_timestamp. It might be a completely different story.

Found another example of this with a slightly different query form:

SELECT /* IndexPager::buildQueryInfo (contributions page unfiltered) 14.0.143.195 */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,user_name,page_namespace,page_title,page_is_new,page_latest,page_is_redirect,page_len,ts_tags FROM revision INNER JOIN page ON ((page_id = rev_page)) LEFT JOIN user ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN tag_summary ON ((ts_rev_id=rev_id)) INNER JOIN change_tag FORCE INDEX (change_tag_tag_id) ON ((ct_rev_id=rev_id)) WHERE rev_user = '3030741' AND ((rev_deleted & 4) = 0) AND ct_tag = 'visualeditor' ORDER BY rev_timestamp DESC LIMIT 51;

On enwiki, forcing the index hits ~900000 rows with a temp table and filesort. Without the force: ~50000 rows, no temp table or filesort.

Both 5.1 and 5.5 EXPLAIN look the same, so removing this FORCE on change_tag_tag_id would also be a win across the board.

Not entirely sure why I was CCed on this, but...

Regarding the FORCE INDEX (user_timestamp) in comment 0, that appears to be coming from includes/specials/SpecialContributions.php, ContribsPager::getUserCond() and ContribsPager::getQueryInfo(). git archeology finds that this force index was added way back in 2005, r9796.

Regarding the FORCE INDEX (change_tag_tag_id) in comment 3, that's coming from ChangeTags::modifyDisplayQuery(), added in r49068 with a comment "fix tag filtering by adding a FORCE INDEX to the relevant queries". There's also forcing of that index in API modules that do the same sort of thing added in r58399, which refers to bug 22032, which makes it sound like r49068 was fixing some issue where a bad query was bringing down the site. Since I have no idea of the background on that and the problematic queries don't seem to be mentioned in the change summaries, I'm CCing some of the people involved in those revisions.

Change 96228 had a related patch set uploaded by Springle:
remove specific FORCE INDEX clauses causing trouble, bug 45619

https://gerrit.wikimedia.org/r/96228

Change 96228 had a related patch set uploaded by Aaron Schulz:
remove specific FORCE INDEX clauses causing trouble, bug 45619

https://gerrit.wikimedia.org/r/96228

Change 96228 merged by jenkins-bot:
remove specific FORCE INDEX clauses causing trouble

https://gerrit.wikimedia.org/r/96228

I think this reappeared, but given the age of this ticket, I will open a new one (probably not related to the original issue anymore).