Page MenuHomePhabricator

Improve use of MySQL query cache by rounding some timestamp queries
Closed, ResolvedPublic

Description

Author: bugzilla_wikipedia_org.to.jamesd

Description:
Queries like this are quite common:

wikiuser dewiki 158 SELECT COUNT(*) AS n FROM cur
WHERE cur_timestamp>'20040902235857'

MySQL has a query cache which will return the result
very quickly if the query string is an exact match
(including case) of a query in the cache. At present
these queries require a count to the nearest second,
making the cache almost useless for them. Replacing
the last three or four digits with 0 would allow the
query cache to cache many of them. Even two digits
would help en on Wikipedia. Suggest rounding in this
way for any query where an approximate value is good
enough.

I think that the example above is from a watchlist.


Version: unspecified
Severity: normal

Details

Reference
bz438

Revisions and Commits

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 6:55 PM
bzimport set Reference to bz438.
bzimport added a subscriber: Unknown Object (MLST).

This particular query is used for the watchlist to get an idea of which of two ways of
pulling data will be more efficient. Really it's a sucky hack anyway; an improved watchlist
query could make this question obsolete.

Since we're seeing edits just a few seconds apart anyway, I wonder just how much
opportunity there is to cache. Are watchlist loads more frequent than edits?

bugzilla_wikipedia_org.to.jamesd wrote:

Watchlist queries aren't more frequent than edits but they
are far more of a problem. An edit is usually insignificant
load-wise, at present, while this (and other) watchlist
queries can run for many tens of seconds or even a minute or
more and cause a problem for other queries, including other
inserts.

This particular watchlist query is deletable - the query is
actually slower than the one it's trying to protect against.
On en there are few watchlists over 10,000 items and the
whole thing is in the hundreds of thousands of articles, so
there's no need to worry about someone watching a high enough
percentage of articles for a full cur scan to be worth doing.
If there is worry that it's needed, the rounding approach to
this and anything similar is the way to go as a quick change
to make.

I don't understand how the query cache is supposed to be invoked by the proposed rounding if the queries are less frequent than edits. The query
cache will be blown away by every edit changing the cur table, won't it?

bugzilla_wikipedia_org.to.jamesd wrote:

It is invalidated when any involved table is changed. It's an odds game. These
are often slow, so whether one will be so slow that it won't complete before the
table is changed and whether another one will come in in the remaining available
time is a question to wonder about. For en the answer will often be no. For
other wikis, the run time is lower, the edit rate is lower and the odds are
better, freeing up mor eresources for the en ones and other queries which the
server has to deal with. Some chance beats no chance and at 2,000 queries per
second, we'll get lucky sometimes.

That query is removed from REL1_4 branch, as useless. Watchlists are served from
recentchanges tables now.

epriestley added a commit: Unknown Object (Diffusion Commit).Mar 4 2015, 8:14 AM