Evaluate the feasibility of cache invalidation for the action API
Open, Normal, Public

Description

The action API (api.php) supports caching for a user-defined period of time, but does not support cache invalidation. This is problematic for functionality that is under high load but might need to be updated immediately to reflect changes or to remove vandalism, personal data, etc. With the recent open-sourcing of hashninja, maybe that can change. A possible approach to add partial cache invalidation:

  • install the Xkey (aka hashtwo/hashninja) Varnish module
  • every time the API is called, decide whether it is a purgeable request (to cut down the number of URLs that need to be purged). To be purgeable, it must invoke a single module, it must be about a single target (title, revision, file, etc.), the parameters must be in lexicographic order, the content of list parameters must be in lexicographic order, and it cannot have any non-enum parameter apart from the object identifier (e.g. title). These rules could perhaps be relaxed a bit, depending on hashninja performance and API usage stats.
  • on purgeable requests, have the API module output Xkey HTTP headers, each carrying an invalidation tag such as id:<page_id>
  • have MediaWiki send an appropriate purge request (a normal HTTP request with an Xkey-Purge HTTP header containing a list of invalidation tags) on every content update
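The flow above can be sketched as follows. This is only an illustration of the proposed rules, not actual MediaWiki or Varnish code; the enum whitelist, the identifier parameter names, and the id:<page_id> tag format are assumptions.

```python
# Sketch of the proposed purge scheme. ENUM_PARAMS and IDENTIFIER_PARAMS
# are hypothetical whitelists, not real MediaWiki parameter metadata.
ENUM_PARAMS = {"action", "prop", "format"}
IDENTIFIER_PARAMS = {"title", "titles", "pageid", "pageids", "revid", "revids"}

def is_purgeable(params):
    """Apply the purgeability criteria from the task description:
    parameters in lexicographic order, list values sorted, a single
    target, and no non-enum parameter apart from the identifier."""
    keys = list(params)
    if keys != sorted(keys):                  # params must be in order
        return False
    identifiers = [k for k in keys if k in IDENTIFIER_PARAMS]
    if len(identifiers) != 1:                 # exactly one target param
        return False
    if "|" in params[identifiers[0]]:         # and a single title/id in it
        return False
    for key, value in params.items():
        if key in IDENTIFIER_PARAMS or key in ENUM_PARAMS:
            items = value.split("|")
            if items != sorted(items):        # list values sorted too
                return False
            continue
        return False                          # free-form param: not purgeable
    return True

def xkey_headers(page_ids):
    """Tag a cacheable API response with its invalidation tags."""
    return {"Xkey": " ".join("id:%d" % pid for pid in page_ids)}

def purge_request(page_ids):
    """Headers MediaWiki would send to Varnish on a content update."""
    return {"Xkey-Purge": " ".join("id:%d" % pid for pid in page_ids)}
```

Whether tags go into one space-separated header or several repeated headers is an implementation detail to settle against the Xkey module's behavior.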

Related Objects

Tgr created this task. Jan 5 2016, 2:04 AM
Tgr updated the task description.
Tgr raised the priority of this task to Needs Triage.
Tgr added projects: MediaWiki-API, Varnish.
Tgr added a subscriber: Tgr.
Restricted Application added subscribers: StudiesWorld, Aklapper. Jan 5 2016, 2:04 AM
Anomie moved this task from Unsorted to Non-Code on the MediaWiki-API board. Jan 5 2016, 3:58 AM
Tgr added subscribers: BBlack, bd808. Jan 5 2016, 5:36 AM

@BBlack what do you think, is there a chance to deploy Xkey at Wikimedia? How could we measure or estimate its performance?

@bd808 any thoughts about API request stats? I assume the API usage stats are too well cleaned up to be of any help here, so we would need to use the raw webrequests table: write a UDF for the "is purgeable?" logic, another to extract the invalidation tag, then filter for purgeable GET API requests and count the number of requests per tag?
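The second UDF mentioned here (actual Hive UDFs would be written in Java; this is a Python sketch of the logic only, and the title:/id: tag formats are assumptions) might amount to:

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def extract_invalidation_tag(url):
    """Map a purgeable API request URL to its invalidation tag,
    or None if the request falls outside the scheme."""
    query = parse_qs(urlsplit(url).query)
    if "pageids" in query and "|" not in query["pageids"][0]:
        return "id:" + query["pageids"][0]
    if "titles" in query and "|" not in query["titles"][0]:
        return "title:" + query["titles"][0]
    return None

def requests_per_tag(urls):
    """The final aggregation: count purgeable requests per tag."""
    return Counter(t for t in map(extract_invalidation_tag, urls) if t)
```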

bd808 added a comment. Jan 5 2016, 5:47 AM

> @bd808 any thoughts about API request stats? I assume the API usage stats are too well cleaned up to be of any help here, and we need to use the raw webrequests table - write an UDF for the "is purgeable?" logic, another one to extract the invalidation tag, and filter for purgeable GET API requests and count the number of requests per tag?

Are you looking for an estimate of how many requests would currently be purgeable under the scheme you describe? I think it should be possible to figure that out from the webrequests table, or actually even from the raw api.log data on fluorine.

Tgr added a comment. Jan 5 2016, 7:06 AM

I think the more important metric is how many cached requests would have to be deleted on a single purge.

BBlack added a comment. Jan 5 2016, 2:32 PM

I've just created the above tasks based on our existing goal for next quarter of getting Varnish 4 up and running here (it wasn't recorded in phab yet, AFAIK), which is a prerequisite to getting XKey going. Note we're also potentially wanting this for Thumbor purging ( T121391 ).

Keep in mind that even the Varnish 4 goal listed as a blocker here isn't enough to unblock this fully: the goal there is just to get Varnish 4 running at all on a single production cluster as a trial. There's still an additional step beyond that of moving all the clusters (critically, in this case, the "text" cluster) to Varnish 4.

Anomie added a subscriber: Anomie. Jan 5 2016, 4:13 PM

A little brainstorming on the API side of things:

An API query request generically could depend on a number of pages:

  • The pages listed in the 'titles', 'pageids', or 'revids' parameters, if any. This is up to 500 pages.
    • If automatic redirect resolution is used, then the targets of any redirects in there too.
  • If a generator is used, the pages output by the generator. This is up to 5000 pages.
    • If automatic redirect resolution is used, then the targets of any redirects in there too.
  • For 'list' modules, the page(s) specified in their parameters, if any. For example, the 'cmtitle' for list=categorymembers. Offhand, I think this is 0 or 1 per module.
  • For some 'list' modules and backlinks-style 'prop's, we might have to include the pages actually output too (up to 5000 pages per module). Others might already be purged by MediaWiki, e.g. when a template is edited all pages transcluding it get purged, or when a page is edited any categories that are added or removed get purged.
  • Some are probably just not practically cacheable, for example list=recentchanges on an active wiki would need to be purged on every edit (unless rcstart/rcend specify "older than timestamp T", anyway).
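Putting the dependency rules above together, collecting the tags for a query response might look like the following sketch. The function name, the 5000-key cap, and the id:<page_id> tag format are assumptions for illustration; returning None stands in for "too many dependencies, don't tag (or don't cache) this response":

```python
def collect_xkeys(request_page_ids, generator_page_ids=(),
                  redirect_target_ids=(), max_keys=5000):
    """Union of the page dependencies of a query request, as sorted
    Xkey tags: explicitly listed pages, generator output, and redirect
    targets. None means the response exceeds the assumed key cap."""
    deps = (set(request_page_ids)
            | set(generator_page_ids)
            | set(redirect_target_ids))
    if len(deps) > max_keys:
        return None
    return sorted("id:%d" % pid for pid in deps)
```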

For non-query modules, we'd likely have to look at it on a case-by-case basis. Parse and expandtemplates, for example, could probably be XKeyed, while many others wouldn't work.

Some of Gergő's proposed limitations would cut down on the total number of XKeys per request as well as reducing the number of URLs subject to having XKeys at all, at the cost of limiting the number of API requests that are usable with this scheme. We'd probably want to indicate to the client somehow that the request was XKeyable so developers don't have to guess.

Or another alternative would be to make a "pageinfo" action oriented towards getting information about a single page, with fewer fine-grained options as to what exactly can be queried (e.g. the equivalent of prop=revisions wouldn't have an "rvprop" parameter). That may need some thought to avoid code duplication between query and "pageinfo", though.
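To make the idea concrete, a request to such a hypothetical "pageinfo" action might be built like this (the action name, the "props" parameter, and its values are all invented for illustration; the point is that one title plus coarse props maps cleanly to a single invalidation tag):

```python
from urllib.parse import urlencode

def pageinfo_url(api, title, props):
    """Build a request URL for the hypothetical single-page 'pageinfo'
    action: one title, a sorted coarse prop list, no sub-options, so
    the whole response can be tagged with a single Xkey."""
    params = {"action": "pageinfo", "title": title,
              "props": "|".join(sorted(props)), "format": "json"}
    return api + "?" + urlencode(params)
```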

He7d3r added a subscriber: He7d3r. Apr 1 2016, 5:45 PM
Restricted Application added a project: Operations. May 4 2016, 9:13 AM
chasemp triaged this task as Normal priority. May 5 2016, 8:44 PM
ema moved this task from Triage to Caching on the Traffic board. Sep 30 2016, 2:51 PM