API requests are not cached for logged-in users unless uselang is set explicitly
Open, HighPublic

Description

The API defaults to using the user language, which means the response depends on the identity of the user and cannot be cached publicly for logged-in users (unless uselang=content or something similar is set). Given that most API modules don't have any language-dependent part, this is unintuitive and confusing.

ori created this task.Apr 24 2015, 12:12 AM
ori updated the task description. (Show Details)
ori raised the priority of this task from to High.
ori added projects: TextExtracts, Performance.
ori added subscribers: ori, faidon, Prtksxna.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 24 2015, 12:12 AM
MaxSem renamed this task from ApiQueryPageImages responses should be cacheable to uselang=user breaks caching.Apr 26 2015, 8:50 PM
MaxSem edited projects, added MediaWiki-API; removed TextExtracts.
MaxSem set Security to None.
MaxSem added a subscriber: Anomie.
MaxSem added a subscriber: MaxSem.

That's a general API regression: there's "ApiMain::setCacheMode: downgrading cache mode 'public' to 'anon-public-user-private' due to uselang=user" in debug log due to ApiMain::setCacheMode().

Anomie added a comment.EditedApr 26 2015, 10:57 PM

The PopUps extension should be specifying maxage and/or smaxage parameters and a language other than "user" if it wants cacheability.

I tried to have the default not be "user", but people complained. See https://lists.wikimedia.org/pipermail/wikitech-l/2014-November/079287.html and https://gerrit.wikimedia.org/r/#/c/171050/. Also an IRC discussion with Ori on 2014-11-04, which unfortunately is before <#mediawiki-core> had a bot. I can post it from my own logs if Ori doesn't mind.

Ricordisamoa updated the task description. (Show Details)May 1 2015, 11:34 PM
Ricordisamoa added a project: PageImages.
Ricordisamoa added a subscriber: Ricordisamoa.
Tgr renamed this task from uselang=user breaks caching to API requests are not cached for logged-in users unless uselang is set explicitly.Jun 6 2015, 2:26 AM
Tgr added a subscriber: Tgr.Jun 6 2015, 2:36 AM

I believe this should be revisited. That API requests are never cached for a module which has nothing whatsoever to do with languages, unless the caller sets a language, is extremely unintuitive and means that most API requests will have suboptimal performance. Also, returning the wrong language is obvious and will force the developer to fix the request parameters, but having suboptimal performance is non-obvious and chances are no one will notice. (This was caught after one year, and only when it contributed to a site outage.) With our recent focus on performance, this choice of default is IMO indefensible. At a mininum, API modules which are heavily involved with languages should explicitly set their default to user and everything else should default to content.

In T97096#1343001, @Tgr wrote:

At a mininum, API modules which are heavily involved with languages should explicitly set their default to user and everything else should default to content.

Having different defaults for different modules would be painful, particularly here where the ContextSource is set up well before the module being used is determined. And when you get into query submodules the situation gets even worse.

Tgr added a comment.Jun 29 2015, 10:11 PM

What would be the way to discuss defaulting to contentlang, then? Make an RfC about it?

Yeah, an RfC would probably be the way to go.

Change 222287 had a related patch set uploaded (by Prtksxna):
Add maxage, s-maxage and uselang parameters in API call

https://gerrit.wikimedia.org/r/222287

Prtksxna added a comment.EditedJul 2 2015, 12:50 PM

I am getting inconsistent results based on whether I test using Chrome or curl. I get private and must-revalidate, along with max-age on Chrome, but when I use curl I get maxage, s-maxage and public. Note that I am using uselang=content for these tests.

From the reading I did, I understand that having private and must-revalidate would essentially mean that maxage and s-maxage would be ignored. Is that right?


Chrome—


curl—
1​~
2​▶ curl 'http://localhost:8080/w/api.php?action=query&format=json&prop=extracts%7Cpageimages%7Crevisions&formatversion=2&redirects=true&exintro=true&exsentences=2&explaintext=true&piprop=thumbnail&pithumbsize=300&rvprop=timestamp&titles=Blueprint&smaxage=300&maxage=300&uselang=content' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Referer: http://localhost:8080/wiki/Main_Page' -H 'X-Requested-With: XMLHttpRequest' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36' --compressed -v
3​* Adding handle: conn: 0x7ff609804000
4​* Adding handle: send: 0
5​* Adding handle: recv: 0
6​* Curl_addHandleToPipeline: length: 1
7​* - Conn 0 (0x7ff609804000) send_pipe: 1, recv_pipe: 0
8​* About to connect() to localhost port 8080 (#0)
9​* Trying ::1...
10​* Trying 127.0.0.1...
11​* Connected to localhost (127.0.0.1) port 8080 (#0)
12​> GET /w/api.php?action=query&format=json&prop=extracts%7Cpageimages%7Crevisions&formatversion=2&redirects=true&exintro=true&exsentences=2&explaintext=true&piprop=thumbnail&pithumbsize=300&rvprop=timestamp&titles=Blueprint&smaxage=300&maxage=300&uselang=content HTTP/1.1
13​> Host: localhost:8080
14​> Accept-Encoding: deflate, gzip
15​> Accept: application/json, text/javascript, */*; q=0.01
16​> Referer: http://localhost:8080/wiki/Main_Page
17​> X-Requested-With: XMLHttpRequest
18​> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36
19​>
20​< HTTP/1.1 200 OK
21​< Date: Thu, 02 Jul 2015 12:35:37 GMT
22​* Server Apache/2.4.7 (Ubuntu) PHP/5.5.9-1ubuntu4.9 is not blacklisted
23​< Server: Apache/2.4.7 (Ubuntu) PHP/5.5.9-1ubuntu4.9
24​< X-Content-Type-Options: nosniff
25​< Cache-control: s-maxage=300, max-age=300, public
26​< Expires: Thu, 02 Jul 2015 12:40:37 GMT
27​< X-Powered-By: HHVM/3.6.1
28​< X-Frame-Options: DENY
29​< Vary: Accept-Encoding
30​< Transfer-Encoding: chunked
31​< Content-Type: application/json; charset=utf-8
32​<
33​* Connection #0 to host localhost left intact
34​{"batchcomplete":true,"query":{"pages":[{"pageid":2,"ns":0,"title":"Blueprint","extract":"This is content, now isn't it?","revisions":[{"timestamp":"2015-06-30T04:31:01Z"}]}]}}%

Anomie added a comment.Jul 2 2015, 2:12 PM

I am getting inconsistent results based on whether I test using Chrome or curl.

More likely logged-in versus logged-out, and when you're logged-in you're using a user account that has the rights necessary to see revision-deleted content (those being any of deletedhistory, deletedtext, suppressrevision, or viewsuppressed). prop=revisions sets 'private' mode in that case to avoid revision-deleted content being accidentally included in caches somehow.

More likely logged-in versus logged-out

Yep, that was it.

when you're logged-in you're using a user account that has the rights necessary to see revision-deleted content (those being any of deletedhistory, deletedtext, suppressrevision, or viewsuppressed). prop=revisions sets 'private' mode in that case to avoid revision-deleted content being accidentally included in caches somehow.

I see, and does private imply that there is effectively no caching for that request?

Anomie added a comment.Jul 2 2015, 7:16 PM

I see, and does private imply that there is effectively no caching for that request?

Short answer: no, some caching is allowed.

Long answer:

The header in question is

Cache-control: private, must-revalidate, max-age=300

"private" means that the response may not be cached in a shared cache, such as varnish. The user's browser may still cache it (for the individual user), subject to other conditions. Same goes for any other "private" (i.e. non-shared) cache.

"must-revalidate" means that the cache (the browser here) can't keep the response after it's considered stale, and "max-age=300" says it's considered stale after 5 minutes.

Understood. Thanks for taking the time to explain this @Anomie

So, the patch I submitted will have some positive effect! Mind reviewing it?

Would it be more performant to have to separate API calls, one that gets the text extract and image, and another that gets the revisions? That way at least the first call could get cached at varnish?

Anomie added a comment.Jul 2 2015, 8:06 PM

That depends on whether the increased cacheability outweighs the overhead of making two requests; I have no idea. And for users who don't have advanced rights, the prop=revisions bit should also be cacheable already.

Tgr added a comment.Jul 2 2015, 9:31 PM

I doubt one uncacheable call and two cacheable ones would be an improvement over a single uncacheable call. In general, I wouldn't worry about cache-control: private for admins (or even all logged-in users). They are a tiny part of all requests sent so the strain on the servers is negligible. They will see the cards slightly slower, but I doubt that's a huge deal.

In the longer term, maybe we can add a noprivate=1 flag to the API to disable all non-public fields in the output and use proper caching.

Change 222287 merged by jenkins-bot:
Add maxage, s-maxage and uselang parameters in API call

https://gerrit.wikimedia.org/r/222287

Prtksxna closed this task as Resolved.Jul 4 2015, 2:44 AM
Prtksxna claimed this task.
Tgr updated the task description. (Show Details)Jul 4 2015, 4:49 AM
Tgr removed a project: PageImages.
Tgr reopened this task as Open.EditedJul 4 2015, 4:51 AM

Not really resolved, Popups has just been hardcoded to work around it. (I might have made a mess by not separating the Popups issue and the general issue; sorry about that. The general issue is more important and still needs addressing, though.)

Prtksxna removed Prtksxna as the assignee of this task.Jul 6 2015, 9:21 AM
jeremyb added a subscriber: Pharos.
jayvdb added a subscriber: jayvdb.Aug 12 2015, 9:11 AM

Is this a regression? i.e. were API requests cached by default previously?

If so, when was it last working correctly?

Is this a regression? i.e. were API requests cached by default previously?

API requests in general aren't marked as publicly cacheable by default, but the maxage and smaxage parameters enable it. Individual actions can default to public caching.

Previously, some API modules flagged uncacheable because they know they're depending on the user language (and some even recognized the "uselang" parameter (everything else did too, these just explicitly did)). Others just ignored a user language dependency, allowing for bad caching. These dependencies can be hidden deep in MediaWiki code or come from hooks or extensions, so it's not obvious from just looking at the API module itself.

Ib14c00df made the whole API recognize "uselang", and made the default be the wiki's content language so public cacheability would be preserved by default, and made uselang=user force no public caching to avoid trouble from these hidden dependencies. But people complained about that default not matching index.php, so Idc24bfc switched the default to the user's language.

So it's a bugfix or no-change for modules that somehow depend on the user language and a minor break for modules that don't, with no simple way to comprehensively define those sets.

Gilles added a subscriber: Gilles.Oct 24 2016, 10:45 AM