Page MenuHomePhabricator

Test index changes for cur (qry_checktouched ) and old (user_timestamp)
Closed, InvalidPublic

Description

Author: bugzilla_wikipedia_org.to.jamesd

Description:
Along with the old compression work I've been doing this
weekend I made the following index changes on the small
wikis to see how effective they are.

cur:

alter table cur add index qry_checktouched (cur_id,
cur_is_redirect, cur_namespace, cur_title, cur_touched);

This is potentially of use for these queries, at least:

/* Article::checkTouched */ SELECT
cur_touched,cur_is_redirect FROM cur WHERE cur_id='282905'
LIMIT 1
/* Title::getLinksTo */ SELECT
cur_namespace,cur_title,cur_id FROM cur,links WHERE
l_from=cur_id AND l_to=72
/* wfShowIndirectLinks */ SELECT
cur_id,cur_namespace,cur_title,cur_is_redirect FROM
links,cur WHERE l_to=360618 AND l_from=cur_id LIMIT 500
/* LinkCache::preFill */ SELECT
cur_id,cur_namespace,cur_title FROM cur,links WHERE
cur_id=l_to AND l_from=315312 FOR UPDATE

The primary cause for this is to try to get checkTouched
using this index instead of loading cur records for
unchanged cur pages whch don't realy need to be loaded. The
queries are individually fastbutit's adding constant
backgroundload to the master and reducing that load is good.
Could also usefully free up cache RAM for other things.
Title::getLinksTo and wfShowIndirectLinks sometimes show up
as quite slow also. LinkCache::preFill can often be slow.
None of those really needs the full cur record so it's nice
to dodge loading with this covering index, if possible.

The changed schema for cur in 1.5 will significantly reduce
the potential benefit of this index and in 1.5 it may be
best not to have this index and have the cur records loaded
instead, because their total size is much smaller in 1.5 and
caching most or all is practical.

old:

alter table old engine=innodb, drop index user_timestamp,
add index user_timestamp (old_user, inverse_timestamp,
old_namespace);

This is for:

EXPLAIN /* wfSpecialContributions */ SELECT
old_namespace,old_title,old_timestamp,old_comment,old_minor_e
dit,old_user_text,old_id FROM old USE INDEX
(user_timestamp) WHERE old_user =22105 AND old_namespace = 0
ORDER BY inverse_timestamp LIMIT 51 ;

The addition of the namespace lets the query avoid reading
the unnecessary namespaces when only one is desired. That
can significantly reduce thenumber of old records/ disk
seeks required when only one namespace is needed.

For 1.5 or 1.6 we should be looking for a way to have a
covering index for this query (one with all fields it uses).
It's impossible to have the physical data organisation match
both article history and user contributions without one
requiring lots of seeks. The covering index will make user
contributions fast, at the cost of some disk space. The
tinyblob comments field is one obstacle here - can't index
on it all, only a subset. It may be best to have a varchar
comment and tinyblob long comment with a more link whenever
the comment is too long for a varchar. Note that 1000 or so
is the maximum index length for InnoDB.

It's too soon for either of these to be in MediaWiki and
neither may be appropriate for 1.5 and its changed schema,
so these are local tuning only for now.


Version: unspecified
Severity: enhancement
OS: Linux

Details

Reference
bz1557
TitleReferenceAuthorSource BranchDest Branch
Security improvementsrepos/technical-documentation/techdoc-dashboard!10kbachfeature/security-improvementsdevelop
Add PagePile ID verificationrepos/technical-documentation/techdoc-dashboard!9kbachhotfix/parameter-verificationmain
gitlab-ci: Add shfmt to test stagerepos/mediawiki/services/ipoid!218kharlanmain-I7b1c966d74432edddf67cf9e63d6f38e5fcdb8c9main
gitlab-ci: Add shellcheck to test stagerepos/mediawiki/services/ipoid!217kharlanmain-Iecb21f2b3412cf5324eb21646628b4cd62eb4f4dmain
Customize query in GitLab

Revisions and Commits

Event Timeline

bzimport raised the priority of this task from to Lowest.Nov 21 2014, 8:16 PM
bzimport set Reference to bz1557.
bzimport added a subscriber: Unknown Object (MLST).

I guess this is a non-issue now then? :)

epriestley added a commit: Unknown Object (Diffusion Commit).Mar 4 2015, 8:14 AM
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley changed the task status from Invalid to Resolved by committing Unknown Object (Diffusion Commit).Mar 4 2015, 8:22 AM
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
Aklapper changed the task status from Resolved to Invalid.Mar 4 2015, 11:43 AM
Aklapper claimed this task.