
Use varnish xkey to purge output of Special:EntityData when appropriate
Closed, Declined | Public

Description

Varnish 4.1 supports "tagging" cache entries using the xkey module. Backend responses can declare "tags" by setting the xkey header, and all cache entries with the same xkey tag can be purged at once.

This can be used to purge the various resources generated by Special:EntityData when the entity they are based on changes.

This can also be seen as an experiment for collecting experiences with the xkey mechanism before using it to implement T114662: RFC: Per-language URLs for multilingual wiki pages.
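To make the tagging idea concrete, here is a toy in-memory model of a cache that groups entries by tag and purges them together. It is only a sketch and not Varnish itself: the `TaggedCache` class, its method names, and the example URLs are made up for illustration, and it assumes tags arrive space-separated in a single `xkey` response header.

```python
from collections import defaultdict


class TaggedCache:
    """Toy stand-in for a cache with xkey-style tags (hypothetical, not Varnish)."""

    def __init__(self):
        self._entries = {}               # url -> (body, set of tags)
        self._by_tag = defaultdict(set)  # tag -> urls currently carrying it

    def _drop(self, url):
        # Remove one entry and all of its tag associations.
        entry = self._entries.pop(url, None)
        if entry is not None:
            for tag in entry[1]:
                self._by_tag[tag].discard(url)

    def store(self, url, body, headers):
        # Assumption: the backend lists its tags, space-separated, in "xkey".
        self._drop(url)
        tags = set(headers.get("xkey", "").split())
        self._entries[url] = (body, tags)
        for tag in tags:
            self._by_tag[tag].add(url)

    def lookup(self, url):
        entry = self._entries.get(url)
        return entry[0] if entry else None

    def purge_tag(self, tag):
        # Drop every entry that declared this tag; return how many were dropped.
        urls = list(self._by_tag.get(tag, ()))
        for url in urls:
            self._drop(url)
        return len(urls)


cache = TaggedCache()
for fmt in ("json", "rdf", "ttl"):
    url = f"/wiki/Special:EntityData/Q42.{fmt}"
    cache.store(url, f"<Q42 as {fmt}>", {"xkey": "Q42"})
cache.store("/wiki/Special:EntityData/Q64.json", "<Q64 as json>", {"xkey": "Q64"})

# One purge by tag removes all three renderings of Q42 and leaves Q64 alone.
purged = cache.purge_tag("Q42")
```

The point of the model is that the purging side never needs to enumerate the URLs; it only needs to know the tag.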

Event Timeline

Do we have/plan to have some structure/API for introducing xkey tags into the response? Also, do we have support from restbase/cache purging code in purgeWebCache() for xkeys?

I plan to introduce such infrastructure into core, modeled around this use case and the per-language-url use case. I have some ideas, but I'm not yet sure what exactly it will look like. What makes this a bit tricky is that the interface needs to also work if there is no xkey support. Since purging based on tags is quite different from purging based on URIs, it's not easy to invent a nice service interface for this. I'm thinking about it.

As to RESTbase: once MediaWiki sends out xkey purges, all RESTbase needs to do is to tag responses with the appropriate xkey headers. MW and RB need to agree on the keys (resource URIs), but that's it.
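As a rough sketch of what "agreeing on the keys" could mean: both sides derive the key from the same resource URI convention, one side to tag responses and the other to issue purges. Everything here is hypothetical, including the function names, the `CANONICAL_BASE` placeholder, and the choice of underscores for spaces.

```python
# Placeholder base URI; a real deployment would use its own canonical form.
CANONICAL_BASE = "https://wiki.example.org/wiki/"


def resource_key(title):
    """Hypothetical shared convention: the key is the canonical resource URI."""
    return CANONICAL_BASE + title.replace(" ", "_")


def tag_response(headers, titles):
    """What the serving side (e.g. RESTbase) would do before responding."""
    tagged = dict(headers)
    tagged["xkey"] = " ".join(resource_key(t) for t in titles)
    return tagged


def keys_to_purge(changed_titles):
    """What the purging side (e.g. MediaWiki) would send to the cache."""
    return [resource_key(t) for t in changed_titles]


response_headers = tag_response({"content-type": "text/html"}, ["Douglas Adams"])
purge_keys = keys_to_purge(["Douglas Adams"])
```

As long as both sides call the same `resource_key`, the purge key matches the tag on the cached response without either side knowing the other's URLs.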

@Smalyshev is it intentional that this is now assigned to you? Or was this copied from the parent task? I'd like to at least be involved in designing the service interface we need for this in core.


Hmm, I don't know why it was assigned to me :) I think it was inherited from the parent task. Feel free to reassign if you're working on it.

once MediaWiki sends out xkey purges, all RESTbase needs to do is to tag responses with the appropriate xkey headers

Not sure I get this part - right now as far as I can see purges are managed by CdnCacheUpdate which gets a bunch of URLs. The URLs don't have xkeys in the API - so do we change the URLs to have xkeys embedded (how?) or do we make RESTbase somehow discover xkeys from URLs (would like to avoid that as it means information about xkey/URL relationship needs to be in two places)? Or do we change the API for CdnCacheUpdate to accept xkeys too?

Also, I understand xkey purges are done differently: https://github.com/varnish/varnish-modules/blob/master/docs/vmod_xkey.rst
So I guess we'll need a RESTbase patch for this?

Not sure I get this part - right now as far as I can see purges are managed by CdnCacheUpdate which gets a bunch of URLs.

This will no longer be the case when we use xkey throughout the system. CdnCacheUpdate would then purge a key (or a list of keys), not URLs. Instead, when serving the content of the URLs, we have to include the key in the response, so we can later use it for purging.

So, whenever RESTbase serves a URL, it has to know which key(s) it should put into the header. That's conceptually the same as declaring which resources (keys) your response depends on.

I don't see a way to do any of this with the existing CdnCacheUpdate interface. That's why I said "Since purging based on tags is quite different from purging based on URIs, it's not easy to invent a nice service interface for this."

My current idea is that for now, we support only one kind of resource that things can depend on: page content, identified by a title. When purging, we tell the CDN manager what title we want to purge. It can then purge a key, or determine a list of URLs to purge - that would be encapsulated in the service that knows about the capabilities of the CDN. When serving a cacheable resource, we'd tell the same service which resources (titles) our response depends on, and give it the opportunity to add any information it needs to the HTTP response.
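One possible shape for such a service, purely as a sketch: callers only speak in titles, and each implementation decides whether that means purging a key or a list of URLs. The class and method names (`CdnManager`, `purge_title`, `declare_dependencies`) are invented here and do not exist in MediaWiki core; the sketch also assumes titles are already in a whitespace-free form.

```python
from abc import ABC, abstractmethod


class CdnManager(ABC):
    """Hypothetical service that hides whether the CDN supports xkey."""

    @abstractmethod
    def purge_title(self, title):
        """Invalidate everything that depends on this title."""

    @abstractmethod
    def declare_dependencies(self, titles, headers):
        """Let the service add whatever it needs to the response headers."""


class XkeyCdnManager(CdnManager):
    def __init__(self, send_key_purge):
        self._send_key_purge = send_key_purge  # callable taking one key

    def purge_title(self, title):
        self._send_key_purge(title)

    def declare_dependencies(self, titles, headers):
        # Assumes titles contain no whitespace (e.g. underscore form).
        headers["xkey"] = " ".join(titles)


class UrlCdnManager(CdnManager):
    def __init__(self, send_url_purge, urls_for_title):
        self._send_url_purge = send_url_purge  # callable taking one URL
        self._urls_for_title = urls_for_title  # title -> list of URLs

    def purge_title(self, title):
        for url in self._urls_for_title(title):
            self._send_url_purge(url)

    def declare_dependencies(self, titles, headers):
        pass  # Without tag support there is nothing to add to the response.


key_purges = []
xkey_cdn = XkeyCdnManager(key_purges.append)
xkey_headers = {}
xkey_cdn.declare_dependencies(["Q42"], xkey_headers)
xkey_cdn.purge_title("Q42")

url_purges = []
url_cdn = UrlCdnManager(
    url_purges.append,
    lambda t: [f"/wiki/Special:EntityData/{t}.json",
               f"/wiki/Special:EntityData/{t}.ttl"],
)
url_headers = {}
url_cdn.declare_dependencies(["Q42"], url_headers)
url_cdn.purge_title("Q42")
```

The calling code is identical in both cases, which is the property the interface needs in order to work with and without xkey support.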

So, whenever RESTbase serves a URL, it has to know which key(s) it should put into the header.

But restbase doesn't serve wikidata URLs, does it?

I don't see a way to do any of this with the existing CdnCacheUpdate interface.

Well, this interface and adjacent ones like getCacheableUrls() used in EntityDataRequestHandler.php are what we use now when we want to purge all of a resource's dependent pages. Now we'll do it via xkeys, so I wonder how. Will we drop support for getCacheableUrls()? Replace it with some other API?

we support only one kind of resource that things can depend on: page content, identified by a title

I.e. the key would be the page title? That works, but we'd need to define the encoding - URL encoding, space vs. _, etc. (and I wonder if we need to also consider language variants here).

Otherwise the design sounds good, but I think while we're using the title, we should not create dependencies on it being an actual title (if possible), so that later, if we want other kinds of keys, we could add them.
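For illustration, one way to pin down the encoding while keeping keys opaque is to prefix each key with its kind and normalize the identifier in a single place. This is only a sketch under assumptions: `make_key` and the `kind:identifier` layout are invented here, spaces are folded to underscores, everything else is percent-encoded, and language variants are not handled.

```python
from urllib.parse import quote


def make_key(kind, identifier):
    """Build an opaque cache key such as 'title:Douglas_Adams' (hypothetical)."""
    # Collapse runs of whitespace to single underscores, so "A b" and "A_b"
    # produce the same key and the key never contains whitespace.
    normalized = "_".join(identifier.split())
    # Percent-encode everything outside the unreserved set, including "/".
    return f"{kind}:{quote(normalized, safe='')}"


spaced = make_key("title", "Douglas Adams")
underscored = make_key("title", "Douglas_Adams")
encoded = make_key("title", "Café/Bar")
```

Because consumers only ever see the finished string, a different kind prefix could be introduced later without changing how keys are sent or purged.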

Closing this as declined for now as we covered the underlying issue with T128486.