Author: bugzilla_wikipedia_org.to.jamesd
Description:
This uses an ORDER BY and LIMIT combination which forces
retrieval of every record which matches the where before the
LIMIT can be used. Changing ORDER BY value DESC to ORDER BY
rc_timestamp DESC uses the index and causes only the number
of records needed to reach the limit count to be retrieved,
much more efficient. Run time difference is from 10 or more
seconds now to about 0.01 seconds with the change.
If there's objection to using timestamp when there's a
chance that timestamp may be duplicated, use rc_timestamp,
rc_cur_id and add rc_cur_id to the new_name_timestamp index
and I'll change the index on the live Wikimedia sites. For
unchanged sites it'll be at least no worse and may still be
faster for them.
The piece to watch for in what follows is "using filesort".
That's the giveaway that the index isn't being used for the
limit.
Current query:
EXPLAIN /* QueryPage::doFeed */ SELECT 'Newpages' as type,
rc_namespace AS namespace, rc_title AS title, rc_cur_id AS
value, rc_user AS user, rc_user_text AS user_text,
rc_comment as comment, rc_timestamp AS timestamp, '0' as
usepatrol, rc_patrolled AS patrolled, rc_id AS rcid, length
(cur_text) as length, cur_text as text FROM
recentchanges,cur WHERE rc_cur_id=cur_id AND rc_new=1
AND rc_namespace=0 AND cur_is_redirect=0 ORDER BY value DESC
LIMIT 0,50\G
- 1. row ******* table: recentchanges type: ref
possible_keys:
rc_namespace_title,rc_cur_id,new_name_timestamp
key: new_name_timestamp key_len: 2 ref: const,const rows: 15962 Extra: Using where; Using filesort
- 2. row ******* table: cur type: eq_ref
possible_keys: cur_id
key: cur_id key_len: 4 ref: recentchanges.rc_cur_id rows: 1 Extra: Using where
Suggested change:
EXPLAIN /* QueryPage::doFeed */ SELECT 'Newpages' as type,
rc_namespace AS namespace, rc_title AS title, rc_cur_id AS
value, rc_user AS user, rc_user_text AS user_text,
rc_comment as comment, rc_timestamp AS timestamp, '0' as
usepatrol, rc_patrolled AS patrolled, rc_id AS rcid, length
(cur_text) as length, cur_text as text FROM
recentchanges,cur WHERE rc_cur_id=cur_id AND rc_new=1
AND rc_namespace=0 AND cur_is_redirect=0 ORDER BY
rc_timestamp DESC LIMIT 0,50\G
- 1. row ******* table: recentchanges type: ref
possible_keys:
rc_namespace_title,rc_cur_id,new_name_timestamp
key: new_name_timestamp key_len: 2 ref: const,const rows: 15984 Extra: Using where
- 2. row ******* table: cur type: eq_ref
possible_keys: cur_id
key: cur_id key_len: 4 ref: recentchanges.rc_cur_id rows: 1 Extra: Using where
And adding the rc_cur_id after the timestamp shows it back
to using a fielsort with the current index but it won't if
rc_cur_id is added to it:
EXPLAIN /* QueryPage::doFeed */ SELECT 'Newpages' as type,
rc_namespace AS namespace, rc_title AS title, rc_cur_id AS
value, rc_user AS user, rc_user_text AS user_text,
rc_comment as comment, rc_timestamp AS timestamp, '0' as
usepatrol, rc_patrolled AS patrolled, rc_id AS rcid, length
(cur_text) as length, cur_text as text FROM
recentchanges,cur WHERE rc_cur_id=cur_id AND rc_new=1
AND rc_namespace=0 AND cur_is_redirect=0 ORDER BY
rc_timestamp, rc_cur_id DESC LIMIT 0,50\G
- 1. row ******* table: recentchanges type: ref
possible_keys:
rc_namespace_title,rc_cur_id,new_name_timestamp
key: new_name_timestamp key_len: 2 ref: const,const rows: 16950 Extra: Using where; Using filesort
- 2. row ******* table: cur type: eq_ref
possible_keys: cur_id
key: cur_id key_len: 4 ref: recentchanges.rc_cur_id rows: 1 Extra: Using where
The changed query had a run time of 0.01 seconds. Making the
original query immediately after that (still with caching
benefit from the fast form) took 5.91 seconds. I noticed the
query when I saw this in mytop:
7414082 wikiuser ialrazi:36569 enwiki 10
Query /* QueryPage::doFeed */ SELECT 'Newpages' as type,
rc_namespace AS namespace, rc_title AS title, rc_cur_id AS
val
Version: 1.4.x
Severity: normal