Evaluate the feasibility of cache invalidation for the action API
Open, Normal, Public

Description

The action API (api.php) supports caching for a user-defined period of time, but does not support cache invalidation. This is problematic for functionality that is under high load but might need to be updated immediately to reflect changes or to remove vandalism, personal data, etc. With the recent open-sourcing of hashninja, maybe that can change. A possible approach to add partial cache invalidation:

  • install the Xkey (aka hashtwo/hashninja) Varnish module
  • every time the API is called, decide whether it is a purgeable request (to cut down the number of URLs that need to be purged). To be purgeable, it must invoke a single module, it must be about a single target (title, revision, file, etc.), the parameters must be in lexicographic order, the content of list parameters must be in lexicographic order, and it cannot have any non-enum parameter apart from the object identifier (e.g. title). These rules could perhaps be relaxed a bit, depending on hashninja performance and API usage stats.
  • on purgeable requests, have the API module output Xkey HTTP headers, each carrying an invalidation tag such as id:<page_id>
  • have MediaWiki send an appropriate purge request (a normal HTTP request with an Xkey-Purge HTTP header containing a list of invalidation tags) on every content update
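The flow above can be sketched as follows. This is only an illustration of the proposed rules, not actual MediaWiki or Varnish code; the enum whitelist, the identifier parameter names, and the id:<page_id> tag format are assumptions.

```python
# Sketch of the proposed purge scheme. ENUM_PARAMS and IDENTIFIER_PARAMS
# are hypothetical whitelists, not real MediaWiki parameter metadata.
ENUM_PARAMS = {"action", "prop", "format"}
IDENTIFIER_PARAMS = {"title", "titles", "pageid", "pageids", "revid", "revids"}

def is_purgeable(params):
    """Apply the purgeability criteria from the task description:
    parameters in lexicographic order, list values sorted, a single
    target, and no non-enum parameter apart from the identifier."""
    keys = list(params)
    if keys != sorted(keys):                  # params must be in order
        return False
    identifiers = [k for k in keys if k in IDENTIFIER_PARAMS]
    if len(identifiers) != 1:                 # exactly one target param
        return False
    if "|" in params[identifiers[0]]:         # and a single title/id in it
        return False
    for key, value in params.items():
        if key in IDENTIFIER_PARAMS or key in ENUM_PARAMS:
            items = value.split("|")
            if items != sorted(items):        # list values sorted too
                return False
            continue
        return False                          # free-form param: not purgeable
    return True

def xkey_headers(page_ids):
    """Tag a cacheable API response with its invalidation tags."""
    return {"Xkey": " ".join("id:%d" % pid for pid in page_ids)}

def purge_request(page_ids):
    """Headers MediaWiki would send to Varnish on a content update."""
    return {"Xkey-Purge": " ".join("id:%d" % pid for pid in page_ids)}
```

Whether tags go into one space-separated header or several repeated headers is an implementation detail to settle against the Xkey module's behavior.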

Related Objects

Tgr created this task. Jan 5 2016, 2:04 AM
Tgr updated the task description.
Tgr raised the priority of this task to Needs Triage.
Tgr added projects: MediaWiki-API, Varnish.
Tgr added a subscriber: Tgr.
Restricted Application added subscribers: StudiesWorld, Aklapper. Jan 5 2016, 2:04 AM
Anomie moved this task from Unsorted to Non-Code on the MediaWiki-API board. Jan 5 2016, 3:58 AM
Tgr added subscribers: BBlack, bd808. Jan 5 2016, 5:36 AM

@BBlack what do you think, is there a chance to deploy Xkey at Wikimedia? How could we measure or estimate its performance?

@bd808 any thoughts about API request stats? I assume the API usage stats are too well cleaned up to be of any help here, so we would need to use the raw webrequests table: write a UDF for the "is purgeable?" logic, another to extract the invalidation tag, then filter for purgeable GET API requests and count the number of requests per tag?
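The second UDF mentioned here (actual Hive UDFs would be written in Java; this is a Python sketch of the logic only, and the title:/id: tag formats are assumptions) might amount to:

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def extract_invalidation_tag(url):
    """Map a purgeable API request URL to its invalidation tag,
    or None if the request falls outside the scheme."""
    query = parse_qs(urlsplit(url).query)
    if "pageids" in query and "|" not in query["pageids"][0]:
        return "id:" + query["pageids"][0]
    if "titles" in query and "|" not in query["titles"][0]:
        return "title:" + query["titles"][0]
    return None

def requests_per_tag(urls):
    """The final aggregation: count purgeable requests per tag."""
    return Counter(t for t in map(extract_invalidation_tag, urls) if t)
```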

bd808 added a comment. Jan 5 2016, 5:47 AM

> @bd808 any thoughts about API request stats? I assume the API usage stats are too well cleaned up to be of any help here, and we need to use the raw webrequests table - write an UDF for the "is purgeable?" logic, another one to extract the invalidation tag, and filter for purgeable GET API requests and count the number of requests per tag?

Are you looking for an estimate of how many requests would currently be purgeable under the scheme you describe? I think it should be possible to figure that out from the webrequests table, or actually even from the raw api.log data on fluorine.

Tgr added a comment. Jan 5 2016, 7:06 AM

I think the more important metric is how many cached requests would have to be deleted on a single purge.

BBlack added a comment. Jan 5 2016, 2:32 PM

I've just created the above tasks based on our existing goal for next quarter of getting Varnish 4 up and running here (it wasn't recorded in phab yet, AFAIK), which is a prerequisite to getting XKey going. Note we're also potentially wanting this for Thumbor purging ( T121391 ).

Keep in mind that even the Varnish 4 goal listed as a blocker here isn't enough to unblock this fully: the goal there is just to get Varnish 4 running at all on a single production cluster as a trial. There's still an additional step beyond that of moving all the clusters (critically, in this case, the "text" cluster) to Varnish 4.

Anomie added a subscriber: Anomie. Jan 5 2016, 4:13 PM

A little brainstorming on the API side of things:

An API query request generically could depend on a number of pages:

  • The pages listed in the 'titles', 'pageids', or 'revids' parameters, if any. This is up to 500 pages.
    • If automatic redirect resolution is used, then the targets of any redirects in there too.
  • If a generator is used, the pages output by the generator. This is up to 5000 pages.
    • If automatic redirect resolution is used, then the targets of any redirects in there too.
  • For 'list' modules, the page(s) specified in their parameters, if any. For example, the 'cmtitle' for list=categorymembers. Offhand, I think this is 0 or 1 per module.
  • For some 'list' modules and backlinks-style 'prop's, we might have to include the pages actually output too (up to 5000 pages per module). Others might already be purged by MediaWiki, e.g. when a template is edited all pages transcluding it get purged, or when a page is edited any categories that are added or removed get purged.
  • Some are probably just not practically cacheable, for example list=recentchanges on an active wiki would need to be purged on every edit (unless rcstart/rcend specify "older than timestamp T", anyway).
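Putting the dependency rules above together, collecting the tags for a query response might look like the following sketch. The function name, the 5000-key cap, and the id:<page_id> tag format are assumptions for illustration; returning None stands in for "too many dependencies, don't tag (or don't cache) this response":

```python
def collect_xkeys(request_page_ids, generator_page_ids=(),
                  redirect_target_ids=(), max_keys=5000):
    """Union of the page dependencies of a query request, as sorted
    Xkey tags: explicitly listed pages, generator output, and redirect
    targets. None means the response exceeds the assumed key cap."""
    deps = (set(request_page_ids)
            | set(generator_page_ids)
            | set(redirect_target_ids))
    if len(deps) > max_keys:
        return None
    return sorted("id:%d" % pid for pid in deps)
```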

For non-query modules, we'd likely have to look at it on a case-by-case basis. Parse and expandtemplates, for example, could probably be XKeyed, while many others wouldn't work.

Some of Gergő's proposed limitations would cut down on the total number of XKeys per request as well as reducing the number of URLs subject to having XKeys at all, at the cost of limiting the number of API requests that are usable with this scheme. We'd probably want to indicate to the client somehow that the request was XKeyable so developers don't have to guess.

Or another alternative would be to make a "pageinfo" action oriented towards getting information about a single page, with fewer fine-grained options as to what exactly can be queried (e.g. the equivalent of prop=revisions wouldn't have an "rvprop" parameter). That may need some thought to avoid code duplication between query and "pageinfo", though.
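To make the idea concrete, a request to such a hypothetical "pageinfo" action might be built like this (the action name, the "props" parameter, and its values are all invented for illustration; the point is that one title plus coarse props maps cleanly to a single invalidation tag):

```python
from urllib.parse import urlencode

def pageinfo_url(api, title, props):
    """Build a request URL for the hypothetical single-page 'pageinfo'
    action: one title, a sorted coarse prop list, no sub-options, so
    the whole response can be tagged with a single Xkey."""
    params = {"action": "pageinfo", "title": title,
              "props": "|".join(sorted(props)), "format": "json"}
    return api + "?" + urlencode(params)
```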

He7d3r added a subscriber: He7d3r. Apr 1 2016, 5:45 PM
Restricted Application added a project: Operations. May 4 2016, 9:13 AM
chasemp triaged this task as Normal priority. May 5 2016, 8:44 PM
ema moved this task from Triage to Caching on the Traffic board. Sep 30 2016, 2:51 PM