Page MenuHomePhabricator

QueryPage::doFeed SELECT 'Newpages' as type ...
Closed, ResolvedPublic

Description

Author: bugzilla_wikipedia_org.to.jamesd

Description:
This uses an ORDER BY and LIMIT combination which forces
retrieval of every record which matches the where before the
LIMIT can be used. Changing ORDER BY value DESC to ORDER BY
rc_timestamp DESC uses the index and causes only the number
of records needed to reach the limit count to be retrieved,
much more efficient. Run time difference is from 10 or more
seconds now to about 0.01 seconds with the change.

If there's objection to using timestamp when there's a
chance that timestamp may be duplicated, use rc_timestamp,
rc_cur_id and add rc_cur_id to the new_name_timestamp index
and I'll change the index on the live Wikimedia sites. For
unchanged sites it'll be at least no worse and may still be
faster for them.

The piece to watch for in what follows is "using filesort".
That's the giveaway that the index isn't being used for the
limit.

Current query:

EXPLAIN /* QueryPage::doFeed */ SELECT 'Newpages' as type,
rc_namespace AS namespace, rc_title AS title, rc_cur_id AS
value, rc_user AS user, rc_user_text AS user_text,
rc_comment as comment, rc_timestamp AS timestamp, '0' as
usepatrol, rc_patrolled AS patrolled, rc_id AS rcid, length
(cur_text) as length, cur_text as text FROM
recentchanges,cur WHERE rc_cur_id=cur_id AND rc_new=1
AND rc_namespace=0 AND cur_is_redirect=0 ORDER BY value DESC
LIMIT 0,50\G

  • 1. row ******* table: recentchanges type: ref

possible_keys:
rc_namespace_title,rc_cur_id,new_name_timestamp

    key: new_name_timestamp
key_len: 2
    ref: const,const
   rows: 15962
  Extra: Using where; Using filesort
  • 2. row ******* table: cur type: eq_ref

possible_keys: cur_id

    key: cur_id
key_len: 4
    ref: recentchanges.rc_cur_id
   rows: 1
  Extra: Using where

Suggested change:

EXPLAIN /* QueryPage::doFeed */ SELECT 'Newpages' as type,
rc_namespace AS namespace, rc_title AS title, rc_cur_id AS
value, rc_user AS user, rc_user_text AS user_text,
rc_comment as comment, rc_timestamp AS timestamp, '0' as
usepatrol, rc_patrolled AS patrolled, rc_id AS rcid, length
(cur_text) as length, cur_text as text FROM
recentchanges,cur WHERE rc_cur_id=cur_id AND rc_new=1
AND rc_namespace=0 AND cur_is_redirect=0 ORDER BY
rc_timestamp DESC LIMIT 0,50\G

  • 1. row ******* table: recentchanges type: ref

possible_keys:
rc_namespace_title,rc_cur_id,new_name_timestamp

    key: new_name_timestamp
key_len: 2
    ref: const,const
   rows: 15984
  Extra: Using where
  • 2. row ******* table: cur type: eq_ref

possible_keys: cur_id

    key: cur_id
key_len: 4
    ref: recentchanges.rc_cur_id
   rows: 1
  Extra: Using where

And adding the rc_cur_id after the timestamp shows it back
to using a fielsort with the current index but it won't if
rc_cur_id is added to it:

EXPLAIN /* QueryPage::doFeed */ SELECT 'Newpages' as type,
rc_namespace AS namespace, rc_title AS title, rc_cur_id AS
value, rc_user AS user, rc_user_text AS user_text,
rc_comment as comment, rc_timestamp AS timestamp, '0' as
usepatrol, rc_patrolled AS patrolled, rc_id AS rcid, length
(cur_text) as length, cur_text as text FROM
recentchanges,cur WHERE rc_cur_id=cur_id AND rc_new=1
AND rc_namespace=0 AND cur_is_redirect=0 ORDER BY
rc_timestamp, rc_cur_id DESC LIMIT 0,50\G

  • 1. row ******* table: recentchanges type: ref

possible_keys:
rc_namespace_title,rc_cur_id,new_name_timestamp

    key: new_name_timestamp
key_len: 2
    ref: const,const
   rows: 16950
  Extra: Using where; Using filesort
  • 2. row ******* table: cur type: eq_ref

possible_keys: cur_id

    key: cur_id
key_len: 4
    ref: recentchanges.rc_cur_id
   rows: 1
  Extra: Using where

The changed query had a run time of 0.01 seconds. Making the
original query immediately after that (still with caching
benefit from the fast form) took 5.91 seconds. I noticed the
query when I saw this in mytop:

7414082 wikiuser ialrazi:36569 enwiki 10
Query /* QueryPage::doFeed */ SELECT 'Newpages' as type,
rc_namespace AS namespace, rc_title AS title, rc_cur_id AS
val


Version: 1.4.x
Severity: normal

Details

Reference
bz1481
TitleReferenceAuthorSource BranchDest Branch
Remove unused "Harbormaster" application from default settingsrepos/phabricator/deployment!27aklapperT348115-harbormasterwmf/stable
Default start offset for kafka sourcesrepos/search-platform/cirrus-streaming-updater!40pfischerdefault-kafka-start-offsetsmain
gitlab-ci: Adjust unit and integration setuprepos/mediawiki/services/ipoid!139kharlangitlab-ci-fix-65afmain
gitlab-ci: Adjust rules for run-{unit/integration}-testsrepos/mediawiki/services/ipoid!137kharlanfix-coverage-aff9main
gitlab-ci: List more job dependenciesrepos/mediawiki/services/ipoid!134kharlanT348146-763amain
gitlab-ci: Use needs property for jobsrepos/mediawiki/services/ipoid!132kharlanT348146-5b39main
pipeline: Use alternative character sequence for separating statementsrepos/mediawiki/services/ipoid!109kharlanadd-test-data-cefa-ef8fadd-test-data-cefa-0add
pipeline: Use alternative character sequence for separating statementsrepos/mediawiki/services/ipoid!106kharlanadd-test-data-cefa-ef8fadd-test-data-cefa-0add
Collect test report artifactsrepos/mediawiki/services/ipoid!101dancyreview/dancy/test-artifactsmain
import: Make batch limit configurable via environment variablerepos/mediawiki/services/ipoid!100kharlanbatch-5a3fmain
import: Bump default batch limitrepos/mediawiki/services/ipoid!99kharlanbatch-0a6fmain
pipeline: Use newline for separating statementsrepos/mediawiki/services/ipoid!96kharlanadd-test-data-ef8fadd-test-data-0add
Show related patches Customize query in GitLab

Revisions and Commits

Related Objects

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 8:12 PM
bzimport set Reference to bz1481.
bzimport added a subscriber: Unknown Object (MLST).

Uses timestamp paging now (for a while)

Diffusion added a commit: Unknown Object (Diffusion Commit).Mar 4 2015, 8:22 AM