Page MenuHomePhabricator

siteinfo api calls should be cached for N minutes on the caching layer
Open, LowPublic

Description

Apparently about 5% of all requests to the api, and about 35% of all requests from the mobile apps are for siteinfo queries.

Various things depend on querying siteinfo (which is probably changed), notably wikibase, the mobile apps, various other services.

This was particularly in display during the incident following the passing of Kobe Bryant, when all such queries grind to a halt and thus slowed down everything else.

Currently we respond to siteinfo queries by setting

cache-control: private, must-revalidate, max-age=0

meaning that (for instance) everyone who opens the mobile application will fetch siteinfo data from the api backend.

We should at the very least cache the result for a few minutes.

An alternative option would be to generate siteinfo on the servers every N minutes and serve it statically, if possible.

Event Timeline

Joe triaged this task as High priority.Feb 4 2020, 8:07 AM
Joe created this task.

I see a "time": "<current timestamp>" in the output by siteinfo. I don't know if and how anyone uses that timestamp, but the current time is also available via the curtimestamp parameter with any query (and this is the way suggested by the documentation for action=edit), so after a deprecation period the timestamp in the siteinfo output can probably be safely removed. But it is something to keep in mind when caching the output.

Caching in the caching layer might have been helpful for these specific requests, but would that merely have deferred the problem to the next request to some other module? Caching inside MediaWiki wouldn't have had any effect, as far as I can tell. There doesn't seem to have been any slowdown at the PHP level, based on logged response times for action=query&meta=siteinfo queries.

Time (UTC)Hitsp50p75p95p99
2020-01-26 1186330121.024.034.055.0
2020-01-26 1287878621.024.030.047.0
2020-01-26 1389887821.024.030.046.0
2020-01-26 1478586021.024.030.047.0
2020-01-26 1584145021.024.030.050.0
2020-01-26 1680556621.024.030.047.0
2020-01-26 1789881121.024.030.049.0
2020-01-26 1897236621.024.031.057.0
2020-01-26 19105950221.024.031.057.0
2020-01-26 20140755920.023.029.041.0
2020-01-26 21109033221.024.032.064.0
2020-01-26 22100015420.023.029.045.0
2020-01-26 2394845921.024.030.066.0
2020-01-27 0088209621.024.030.051.0
2020-01-27 0186013521.024.029.046.0
2020-01-27 0287410121.024.030.048.0
2020-01-27 0381745121.024.030.047.0
2020-01-27 0480577821.024.030.049.0
2020-01-27 0584755621.024.030.050.0
2020-01-27 0686904921.024.030.047.0
2020-01-27 0785324121.024.029.045.0
2020-01-27 0884683221.024.030.046.0
2020-01-27 0981832321.024.030.047.0

(queried from hadoop/beeline with select day, hour, count(*) as ct, percentile( backend_time_ms, 0.5 ) as p50, percentile( backend_time_ms, 0.75 ) as p75, percentile( backend_time_ms, 0.95 ) as p95, percentile( backend_time_ms, 0.99 ) as p99 from event.mediawiki_api_request WHERE year=2020 AND month=1 AND (day=26 AND hour>10 or day=27 and hour<10) AND params['action'] = 'query' and params['meta'] = 'siteinfo' and isnull(params['list']) and isnull(params['prop']) group by day, hour order by day, hour limit 100;, then lightly reformatted)


As for actually implementing this: HTTP caching in the Action API is, for historical reasons, somewhat lackluster. Currently for action=query&meta=siteinfo, the client must specify maxage and/or smaxage parameters to request cached results.

We can work around that, but for query submodules in particular it'll be somewhat roundabout: we'll have to add something to ApiQueryBase to return a "default" for when maxage/smaxage aren't specified, then have ApiQuery combine these for all used modules much like it already does for the caching mode.

I see a "time": "<current timestamp>" in the output by siteinfo.

Meh. We might do well to keep it as an indication of when exactly the request was cached.

[cut]

As for actually implementing this: HTTP caching in the Action API is, for historical reasons, somewhat lackluster. Currently for action=query&meta=siteinfo, the client must specify maxage and/or smaxage parameters to request cached results.

This is great - I wasn't aware of those action api parameters, and in fact https://en.wikipedia.org/w/api.php?format=json&formatversion=2&errorformat=plaintext&action=query&meta=siteinfo&maxage=86400&smaxage=86400 gets correctly cached for one day.

We can then ask our mobile apps developers to change the calls in the application to set a sensible caching time. This will reduce the calls to the API layer and also speed up the applications for people in Europe and Asia.

To be clear, I think what @Anomie described is a perfectly valid way of handling caching in this case - it leaves it as a option to the client, although it does so in a very cache-inefficient way (but that's a common issue with the action API caching, as stated above).

I think we should check external users of siteinfo, and have those switch to a cachable URL, while leaving internal consumers untouched - they should cache the result themselves after having fetched it from the application layer.

As of now, the only outstanding case seem to be the mobile applications, as tracked in T245033.

@Pchelolo feel free to close this task if you see no other outstanding client.

The Api classes in MediaWiki also have a way to enable caching by default, via setCacheMaxAge(). This is used by ApiOpenSearch.

If the siteinfo data is safe to cache, should we enable that by default in MW core? (Doing so currently means any manual smaxage query parameter will be ignored, but that's probably fine?)

I just looked ApiQuerySiteinfo and ApiMain, and noticed something else:

  1. It looks like use of smaxage is ignored for API modules that return non-public in their getCacheMode() This makes sense but also means it might not work for this query.
  2. ApiQuerySiteinfo::getCacheMode is public by default (though without any maxage), but if you set prop=interwikimap it becomes private because "this depends on user language".

Should we set setCacheMaxAge() by default in ApiQuerySiteinfo?

Are the mobile apps are using this parameter, and do I understand correctly that those won't be CDN-cacheable? If so, is there a way around that?

Should we set setCacheMaxAge() by default in ApiQuerySiteinfo?

No, because that would incorrectly set the cache max age for queries that include siteinfo and other query submodules too. As I said earlier,

We can work around that, but for query submodules in particular it'll be somewhat roundabout: we'll have to add something to ApiQueryBase to return a "default" for when maxage/smaxage aren't specified, then have ApiQuery combine these for all used modules much like it already does for the caching mode.

I have verified that services running internally are caching siteInfo responses in-process for each domain. Given that we're not going via Varnish, but directly to the app servers, I don't think we can do better than that.

There is already T245033 to track setting the parameters to the query in the apps.

The only outstanding question left in this ticket is whether it's possible to set a non-zero default max-age in MW, which @Anomie declined in T244204#5894471

I will be bold and close this ticket as Declined, feel free to reopen if you think issue has not been addressed in full.

The only outstanding question left in this ticket is whether it's possible to set a non-zero default max-age in MW, which @Anomie declined in T244204#5894471

I declined calling setCacheMaxAge() from ApiQuerySiteinfo. I didn't decline setting a non-zero default max-age in the more roundabout manner I described in the quote at the end of that comment.

Anomie lowered the priority of this task from High to Low.EditedFeb 27 2020, 8:17 PM

It sounds like the original problem behind this task is being solved via T245033, so I'm going to lower the priority of this task and change the tags around. If anyone disagrees with the changes I made, feel free to undo them.

There's still the possibility of having meta=siteinfo apply a non-zero default max-age, which we may as well use this task to track instead of closing this and opening a new one. If someone wants to take that on, you'd need to do something like this:

  • Add a method (maybe "getCacheMaxAge") to ApiQueryBase, similar to the existing getCacheMode() method. The semantics of the new method would be that it returns an array (possibly associative) of two integers representing the default maxage and smaxage. The implementation in ApiQueryBase would return zeros.
  • Add logic to ApiQuery that handles the new method along the same lines as it currently handles getCacheMode(). Specifically, it would merge two modules' values by keeping the minimum of maxage and smaxage, and in the end call setCacheMaxAge() to apply the resulting ages.
    • You'd have to figure out how exactly to handle ApiPageSet. As with getCacheMode(), it should pass through from the generator if a generator was used. Otherwise, it should probably return zeros if it actually has any pages, but in the do-nothing case it should somehow not contribute to the merging.
    • If no modules at all were executed (and ApiPageSet hit the do-nothing case too), it should probably go ahead with zeros.
  • Implement the method for ApiQuerySiteinfo to return appropriate non-zeros.
    • But it should return 0s if siprop includes dbreplag, as that data goes stale quickly.
  • Add tests to ApiQueryTest for at least the following cases:
    • No modules or titles: Results in 0s for max-age and s-max-age.
    • Only meta=siteinfo, no titles: Results in siteinfo's values for max-age and s-max-age.
    • meta=siteinfo|userinfo, no titles: Results in 0s thanks to meta=userinfo.
    • meta=siteinfo and titles=Test: Results in 0s thanks to ApiPageSet.

(Maybe not really good first task, but we don't seem to have a "good tenth task" for things that are easy but not completely trivial...)

Right, hence removing good first task per its definition. :)

Aklapper added a subscriber: Pchelolo.

Removing inactive assignee (Platform Engineering: Please unassign tasks of previous team members.)