
REST API URL canonization
Open, Needs Triage, Public

Description

Cacheability is a major goal for the MediaWiki REST API, so we want to avoid GET requests which can be made in multiple ways and would thus split the cache. These include:

  1. requests with multiple query parameters (or other unordered parameter sets, e.g. a comma-separated list of filters) where ordering is arbitrary
  2. non-canonical query continuation (i.e. continuation parameters provided by the user which would not occur during a real continuation sequence)
  3. parameters which are in some non-canonical format (e.g. non-NFC Unicode strings, namespace aliases, nonstandard title capitalization, space vs. underscore)
  4. requests which refer to an article via a redirect

We should have a strategy for handling these (cached 301/302? reject? serve but do not cache?), preferably provided by the framework.
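
As a rough illustration of the redirect option for cases 1 and 3, here is a minimal Python sketch (not MediaWiki code). The names `canonical_url` and `respond` are hypothetical, and it only covers parameter ordering, percent-encoding and NFC normalization. Namespace aliases, title capitalization, space vs. underscore and redirects all need wiki-specific knowledge and are left out.

```python
import unicodedata
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with NFC-normalized, sorted query parameters."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    normalized = sorted(
        (unicodedata.normalize("NFC", key), unicodedata.normalize("NFC", value))
        for key, value in pairs
    )
    # urlencode re-escapes every value the same way, so equivalent
    # percent-encodings of one string collapse to a single spelling.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(normalized), "")
    )


def respond(url: str):
    """Hypothetical handler: redirect non-canonical URLs, serve canonical ones."""
    canonical = canonical_url(url)
    if canonical != url:
        return 301, {"Location": canonical}
    return 200, {}
```

With this approach only the canonical URL ever produces a cacheable 200 response; every other spelling costs one extra round trip for the (itself cacheable) redirect.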

See also:

Event Timeline

Tgr created this task. May 26 2019, 11:27 AM
Tgr added a comment. May 26 2019, 11:28 AM

> non-canonical query continuation

I believe RESTBase handles that by making continuation parameters opaque (signing them with an application secret).

Tgr updated the task description. May 26 2019, 1:19 PM
Anomie added a subscriber: Anomie. May 29 2019, 2:46 PM

> non-canonical query continuation
>
> I believe RESTBase handles that by making continuation parameters opaque (signing them with an application secret).

The Action API just defines them as opaque, without extra complexity to try to prevent developers from shooting themselves in the foot. That definition is mostly there so that we don't have to consider it a breaking change any time we need to change the format of the value.

Tgr added a comment. May 29 2019, 2:49 PM

Shooting yourself in the foot is fine, but as soon as caching is involved you'll shoot traffic ops in the foot as well, so IMO it's worth making continuation untamperable here.

Are you also going to prevent people from specifying unrecognized parameters, misusing Cache-Control headers, and so on? There are a lot of things you could do to screw up caching; it's unlikely to be worth the complexity of trying to preemptively prevent all of them.

Tgr added a comment. May 29 2019, 8:11 PM

I'd at least make URLs with unrecognized parameters use a short cache expiry. The problem with continuation is that you can't really tell whether the parameter is canonical without using some sort of cryptographic proof.
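
For the unrecognized-parameter part, a minimal Python sketch of the idea: pick the cache lifetime based on whether every query parameter is one the endpoint declares. The function name `cache_control` and both TTL values are assumptions chosen for illustration, not anything defined by the REST framework.

```python
def cache_control(query_params, recognized, long_ttl=86400, short_ttl=60):
    """Build a Cache-Control value, shortening the shared-cache lifetime
    when the request carries parameters the endpoint does not declare."""
    unknown = set(query_params) - set(recognized)
    ttl = short_ttl if unknown else long_ttl
    return f"public, s-maxage={ttl}"


# Usage: an endpoint that declares only "limit" and "offset".
header = cache_control({"limit": "10", "utm_source": "x"}, {"limit", "offset"})
```

This bounds how long junk variants of a URL can occupy the cache without rejecting the request outright.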