We would like to discuss various caching approaches for the Wikidata Query Service.
Direct Wikipedia Usage
- The Graph extension allows direct querying of the service to produce these examples. In this case, queries are run from the Graphoid service, or directly from the browser when a user interacts with the graph.
- Graph design - when a graph is being designed in the graph sandbox, a query is made on each keystroke.
Wikidata tools & bots
Proposals and Ideas
- Invalidate the cache when a relevant item changes - this is the ideal scenario, but it is unlikely to be practical, as we would have to track all items that participated in the query, and also evaluate all new items to see whether they would match the original query.
- Cache all responses in Varnish for a reasonable duration depending on the server load - e.g. 1 hour or 1 day
- Invalidate the cache by repeating the same request with an extra URL parameter or an extra header, e.g. &refresh=1 or Refresh: 1.
  - To prevent denial-of-service (DoS) abuse, ignore the refresh parameter/header if the cached response is less than a minute old
  - The VCL will ignore the extra parameter when constructing the cache key
- Do not cache unless the request specifies that it is OK to cache
  - We need to be clear about why non-caching should be the default behavior
Please discuss and vote on what the default behavior should be.
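To make the refresh proposal easier to discuss, here is a minimal Python sketch of the intended behavior. It is a simulation, not VCL and not an existing implementation: the `QueryCache` class, the `cache_key` and `wants_refresh` helpers, and the constants are hypothetical names, and the 1 hour TTL is just one of the durations suggested above.

```python
import time
from urllib.parse import urlsplit, parse_qsl, urlencode

MIN_REFRESH_AGE = 60   # seconds; refresh requests for younger entries are ignored
DEFAULT_TTL = 3600     # seconds; assumes the "1 hour" option from the proposal


def cache_key(url):
    """Build the cache key from the URL, dropping the `refresh` parameter."""
    parts = urlsplit(url)
    params = [(k, v)
              for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "refresh"]
    return parts.path + "?" + urlencode(params)


def wants_refresh(url, headers):
    """True if the request asks for a refresh via URL parameter or header."""
    params = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    return params.get("refresh") == "1" or headers.get("Refresh") == "1"


class QueryCache:
    """Toy stand-in for the caching layer; `backend` plays the query service."""

    def __init__(self, backend, clock=time.time):
        self.backend = backend
        self.clock = clock
        self.entries = {}  # key -> (stored_at, body)

    def get(self, url, headers=None):
        headers = headers or {}
        key = cache_key(url)
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None:
            stored_at, body = entry
            age = now - stored_at
            expired = age >= DEFAULT_TTL
            # Honor a refresh only if the cached copy is at least a minute old.
            refresh = wants_refresh(url, headers) and age >= MIN_REFRESH_AGE
            if not expired and not refresh:
                return body
        body = self.backend(key)
        self.entries[key] = (now, body)
        return body
```

Because the key is built without the `refresh` parameter, a refreshed response replaces the entry that ordinary requests read, so one client's refresh benefits everyone who runs the same query afterwards.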
Necessary in any case for caching:
- Move from the misc cluster to one that is meant for a higher request rate
- Fix Varnish not caching responses because of chunked encoding
- Fix the response headers so that caching is allowed for whatever duration is decided
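As an illustration of the last point, the sketch below shows what the response headers might look like once a duration is agreed. The `caching_headers` helper is hypothetical. It assumes that standard `Cache-Control` directives are used, with `max-age` for browsers and `s-maxage` for shared caches such as Varnish, and that `no-store` is the right choice when caching is disabled.

```python
def caching_headers(ttl_seconds):
    """Return response headers allowing caching for `ttl_seconds` (0 disables it)."""
    if ttl_seconds <= 0:
        # Assumption: responses that must not be cached are marked no-store.
        return {"Cache-Control": "no-store"}
    return {
        "Cache-Control": f"public, max-age={ttl_seconds}, s-maxage={ttl_seconds}"
    }


# Example: the "1 hour" option from the proposals above.
one_hour = caching_headers(3600)
```

Whether browsers and the shared cache should get the same lifetime is part of what still needs to be decided. The two directives can carry different values.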