
Action API should permit CDN cache when generator is used (MobileFrontend search is slow)
Open, Needs Triage, Public

Description

I'm noticing that numerous common API queries, including those used by search suggestions on MobileFrontend, do not benefit from any form of CDN caching.

  1. View https://en.m.wikipedia.org/wiki/Main_Page
  2. Open network devtools.
  3. Type letter "B".

https://en.m.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageprops%7Cpageprops%7Cpageimages%7Cdescription&generator=prefixsearch&ppprop=displaytitle&piprop=thumbnail&pithumbsize=80&pilimit=15&redirects=&gpssearch=B&gpsnamespace=0&gpslimit=15

Latency is 254ms (!)

Cache headers
cache-control: private, must-revalidate, max-age=0
x-cache: cp3050 miss, cp3058 pass
x-cache-status: pass

Compared to the equivalent query issued by Vector:
https://en.wikipedia.org/w/api.php?action=opensearch&format=json&formatversion=2&search=B&namespace=0&limit=10

Latency is just 11ms.

Cache headers
cache-control: max-age=10800, s-maxage=10800, public
x-cache: cp3058 miss, cp3058 hit/135
x-cache-status: hit-front
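
The two header sets above differ in exactly the directives a shared cache looks at. A minimal sketch of that check, assuming a simplified reading of Cache-Control; the helper names `parse_cache_control` and `is_shared_cacheable` are hypothetical, and a real CDN layer such as Varnish applies further rules (cookies, `Vary`, and so on) that are not modelled here:

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "public") map to None.
        directives[name.strip()] = arg.strip() or None
    return directives


def is_shared_cacheable(value: str) -> bool:
    """Rough check: may a shared cache (CDN) store and reuse this response?"""
    d = parse_cache_control(value)
    if "private" in d or "no-store" in d:
        return False
    # For shared caches, s-maxage takes precedence over max-age.
    ttl = d.get("s-maxage") or d.get("max-age")
    return ttl is not None and ttl.isdigit() and int(ttl) > 0


# The two header values observed above:
print(is_shared_cacheable("private, must-revalidate, max-age=0"))    # False
print(is_shared_cacheable("max-age=10800, s-maxage=10800, public"))  # True
```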

To narrow this down, I have confirmed that this is the case even when:

  • the client is logged out and has no session (e.g. private browsing). That said, CDN caching should work for search suggestions even when logged in, as it always has for OpenSearch as used by Vector, etc.
  • checking the appserver response directly, so it's not a bug in Varnish or other routing/traffic layers.
  • querying without any of the more complex extensions like PageImages. Even a plain prefixsearch generator query for basic page info reproduces the same bug locally:
 $ curl -i 'http://localhost:8080/w/api.php?action=query&format=json&formatversion=2&generator=prefixsearch&gpssearch=B' 
HTTP/1.1 200 OK
Server: Apache/2.4.38 (Debian)
Cache-Control: private, must-revalidate, max-age=0
..
{"batchcomplete":true,"query":{"pages":[{"pageid":8,"ns":0,"title":"Bar","index":1}]}}
 $ curl -i 'http://localhost:8080/w/api.php?action=opensearch&format=json&formatversion=2&search=B&namespace=0&limit=10'
HTTP/1.1 200 OK
Server: Apache/2.4.38 (Debian)
Cache-Control: max-age=1200, s-maxage=1200, public
..
["B",["Bar"],[""],["http://mw.localhost:8080/wiki/Bar"]]

Event Timeline

I added some rudimentary instrumentation to narrow down where it might be going wrong:

--- a/includes/api/ApiMain.php
+++ b/includes/api/ApiMain.php
@@ -793,6 +793,7 @@ class ApiMain extends ApiBase {
        }
 
        wfDebug( __METHOD__ . ": setting cache mode $mode" );
+       header( "X-Log-cachemode-4-set: " . $mode );
        $this->mCacheMode = $mode;
    }
 
diff --git a/includes/api/ApiQuery.php b/includes/api/ApiQuery.php
index dc5b4d8b8cd..8bc18b7c1ee 100644
--- a/includes/api/ApiQuery.php
+++ b/includes/api/ApiQuery.php
@@ -620,10 +620,12 @@ class ApiQuery extends ApiBase {
        }
 
        $cacheMode = $this->mPageSet->getCacheMode();
+       header( 'X-Log-cachemode-1-pageset: ' . $cacheMode );
 
        // Execute all unfinished modules
-       foreach ( $modules as $module ) {
+       foreach ( $modules as $k => $module ) {
            $params = $module->extractRequestParams();
+           header( "X-Log-cachemode-2-$k-" . get_class($module) . ': ' . $module->getCacheMode( $params ) );
            $cacheMode = $this->mergeCacheMode(
                $cacheMode, $module->getCacheMode( $params ) );
            $module->execute();
@@ -631,6 +633,7 @@ class ApiQuery extends ApiBase {
        }
 
        // Set the cache mode
+       header( "X-Log-cachemode-3-set: " . $cacheMode );
        $this->getMain()->setCacheMode( $cacheMode );
 
        // Write the continuation data into the result

This yields:

curl -i 'http://localhost:8080/w/api.php?action=opensearch&format=json&formatversion=2&search=B&namespace=0&limit=10'
X-Log-cachemode-4-set: anon-public-user-private
Cache-Control: max-age=1200, s-maxage=1200, public

curl -i 'http://localhost:8080/w/api.php?action=query&format=json&formatversion=2&generator=prefixsearch&gpssearch=B&gpsnamespace=0&gpslimit=15' 
X-Log-cachemode-1-pageset: public
X-Log-cachemode-3-set: public
X-Log-cachemode-4-set: anon-public-user-private
Cache-Control: private, must-revalidate, max-age=0

$ curl -i 'http://localhost:8080/w/api.php?action=query&format=json&formatversion=2&generator=prefixsearch&gpssearch=B' 
X-Log-cachemode-1-pageset: public
X-Log-cachemode-3-set: public
X-Log-cachemode-4-set: anon-public-user-private
Cache-Control: private, must-revalidate, max-age=0

That rules out pretty much all of the middle layers where I suspected the cause to be.
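
For readers following the instrumentation: the loop in `ApiQuery` folds each module's cache mode into a running value via `mergeCacheMode`. The sketch below illustrates that kind of fold under the assumption that the most restrictive mode wins. It is an illustration in Python, not MediaWiki's PHP implementation, and both the ranking table and the function names are assumptions made for this example:

```python
# Assumed ordering from least to most restrictive. The three mode names
# appear in the traces above; the ranking itself is an illustrative assumption.
RESTRICTIVENESS = {"public": 0, "anon-public-user-private": 1, "private": 2}


def merge_cache_mode(current: str, module_mode: str) -> str:
    """Return the more restrictive of two cache modes.

    Unknown modes are treated as 'private' so the fold fails safe.
    """
    if current not in RESTRICTIVENESS:
        current = "private"
    if module_mode not in RESTRICTIVENESS:
        module_mode = "private"
    return max(current, module_mode, key=RESTRICTIVENESS.__getitem__)


def fold_cache_modes(pageset_mode: str, module_modes: list) -> str:
    """Fold every module's mode into the page set's mode, one at a time."""
    mode = pageset_mode
    for module_mode in module_modes:
        mode = merge_cache_mode(mode, module_mode)
    return mode
```

With a `public` page set and no further modules, such a fold stays `public`, which is consistent with the `X-Log-cachemode-3-set: public` header in the traces. The value logged at step 4 is `anon-public-user-private` for both the generator query and the opensearch query, yet only the latter gets a public `Cache-Control` header, so the difference is not explained by the cache mode alone.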

Aklapper renamed this task from "Action API should permit CDN cachje when generator is used (MobileFrontend search is slow)" to "Action API should permit CDN cache when generator is used (MobileFrontend search is slow)". Jan 26 2022, 1:57 AM

@Jdlrobson The above may be impacting mobile UX. Feel free to radar/track away, this is just an FYI. Though if you happen to remember a time when this wasn't slow/uncached, that'd help narrow it down.

Jdlrobson added a subscriber: ovasileva.

Thanks for the ping. I am not aware of any changes that might have led to this slowdown. The frontend hasn't been touched since 2018, and the team doesn't work on the backend, so perhaps some changes have occurred there?

@ovasileva this might be good motivation to prioritize the work on mobile to switch it to the new search widget.

@Jdlrobson - do we know how the UX/frontend might be affected here?

If the API is slower, then searching will feel slower and it will take longer for results to display.

What does it mean to cache a generator query response? The next time you call it, chances are the generated page set will be different.

The action API mostly leaves the decision to cache or not cache with the client; when you use maxage/smaxage, the result will be cacheable:

curl -i '<wiki>/w/api.php?action=query&format=json&formatversion=2&generator=prefixsearch&gpssearch=B&maxage=100&smaxage=100' 
HTTP/1.1 200 OK
Cache-control: s-maxage=100, max-age=100, public
Expires: Tue, 04 Feb 2025 08:27:47 GMT
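
A client that wants this behaviour can add the two parameters to its request URL. A minimal sketch using Python's standard `urllib.parse`; the helper name `with_cache_params` and the `example.org` host are hypothetical, while `maxage` and `smaxage` are the parameters shown in the request above:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_params(url: str, maxage: int, smaxage: int) -> str:
    """Return url with maxage/smaxage set, replacing any existing values."""
    parts = urlsplit(url)
    # Keep blank values so flags such as "redirects=" survive the round trip.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("maxage", "smaxage")
    ]
    params += [("maxage", str(maxage)), ("smaxage", str(smaxage))]
    return urlunsplit(parts._replace(query=urlencode(params)))


url = with_cache_params(
    "https://example.org/w/api.php?action=query&format=json"
    "&generator=prefixsearch&gpssearch=B&redirects=",
    maxage=100,
    smaxage=100,
)
print(url)
```

Whether a given TTL is appropriate depends on how stale a suggestion list the client can tolerate; the value 100 is simply the one used in the request above.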

Regarding "although CDN cache should work for search suggestions even when logged-in": see T97096 (API requests are not cached for logged-in users unless uselang is set explicitly).