Inconsistent caching/staleness of mobile-html responses for certain articles
Closed, Resolved, Public

Description

The way mobile-html responses are cached seems inconsistent (unless I'm misunderstanding something).
When I make edits, they are not always reflected right away when I fetch the mobile-html content again, and I can't figure out which types of articles are affected.

Here are the steps that I'm taking:

  • In a console, make a request to fetch a mobile-html page, e.g.:

curl -I https://en.wikipedia.org/api/rest_v1/page/mobile-html/Steve_Harrington

  • Observe the age header returned in the response.
  • Make an edit to that page in a browser window.
  • In the console, make the curl request again.
  • Observe that it's still returning the old content, with a cache status of hit-front and the age increasing. It also seems to be served by the same restbaseXXX server every time, no matter how many times I repeat the request.
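The header checks in the steps above can be narrowed to just the cache-related fields. The following is a minimal sketch rather than an official tool: `cache_headers` is a hypothetical helper name, and `x-cache-status` and `server` are assumed to be the headers that carry the cache status and the restbase host (the ticket names neither header explicitly).

```shell
#!/bin/sh
# Print only the cache-related headers from an HTTP header dump on stdin.
# Header names are lowercased so the match is case-insensitive.
# ASSUMPTION: "x-cache-status" and "server" are the relevant header names;
# only "age" and "x-cache" are named in this ticket.
cache_headers() {
  tr -d '\r' | awk '
    {
      pos = index($0, ":")
      if (pos == 0) next                     # status line or blank line
      name = tolower(substr($0, 1, pos - 1))
      value = substr($0, pos + 1)
      sub(/^[ \t]+/, "", value)              # drop leading whitespace
      if (name == "age" || name == "x-cache" ||
          name == "x-cache-status" || name == "server")
        print name ": " value
    }'
}

# Usage (needs network access), before and after making an edit:
#   curl -sI https://en.wikipedia.org/api/rest_v1/page/mobile-html/Steve_Harrington | cache_headers
```

Running this before and after an edit makes it easier to see whether the age resets or keeps increasing.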

This is manifesting itself in the Android app as not being able to see recently-made edits, regardless of how many times we refresh.

What could be going on? Initially I thought it might be an issue with articles with a space/underscore in the name, but other similar articles work differently, such as Star_Trek, where the cache status is nearly always pass, and occasionally hit-local.

Event Timeline

JoeWalsh triaged this task as High priority. Apr 9 2020, 2:29 PM
JoeWalsh added a project: Platform Engineering.
JoeWalsh moved this task from Upcoming to Tracking on the Product-Infrastructure-Team-Backlog board.

This definitely needs investigation. I've started looking into it and something is clearly wrong, but I'm not sure what yet.

@Pchelolo Any update on this ticket? It (along with T250209) has been blocking Android from releasing for 3 weeks.

I have retested the issue with Steve_Harrington and it seems to be working correctly for me. I couldn't find a good-quality edit to make to the article, so I made a null edit, which follows a slightly different purging procedure, but it worked correctly. I've also poked around articles in my user space on multiple wikis and, again, they were purged and re-rendered correctly.

What's happening with the Star_Trek article is interesting, though: curl -I https://en.wikipedia.org/api/rest_v1/page/mobile-html/Star_Trek returns the usual cache-control headers and doesn't seem special in any way, but the response is indeed not cached by Varnish, at least according to the age and x-cache headers. Tagging Traffic to have a look.

The specific issues described in this ticket should have now been fixed (see T249325).

At the Varnish (that is to say, frontend) layer, the object does not get cached because it is too big. We currently skip the cache for objects larger than 256K, and Star_Trek is in the 500K ballpark. It should, however, get cached by ATS, and indeed that seems to be the case.

$ curl -v https://en.wikipedia.org/api/rest_v1/page/mobile-html/Star_Trek 2>&1 | grep x-cache:
< x-cache: cp3058 hit, cp3050 pass

If you don't see a hit, try curl'ing a few more times: pass traffic is not c-hashed to the same cache backend. The backend is chosen round-robin.
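For repeated checks, the x-cache value can be split into its per-layer entries. This is only a sketch: `xcache_layers` and `xcache_has_hit` are hypothetical helper names, and the reading that the `pass` entry in the sample output is the Varnish frontend while the `hit` entry is the ATS backend is an inference from the explanation above, not something the header states.

```shell
#!/bin/sh
# Split an x-cache value such as "cp3058 hit, cp3050 pass" into one
# "host status" entry per line, trimming surrounding whitespace.
xcache_layers() {
  printf '%s\n' "$1" | tr ',' '\n' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Exit 0 if any layer reports a status starting with "hit", else exit 1.
# ASSUMPTION: the status is the second whitespace-separated field of each entry.
xcache_has_hit() {
  xcache_layers "$1" | awk '$2 ~ /^hit/ { found = 1 } END { exit !found }'
}

# Example with the value shown in the output above:
#   xcache_layers 'cp3058 hit, cp3050 pass'
#   xcache_has_hit 'cp3058 hit, cp3050 pass' && echo "cached at some layer"
```

Because the backend for pass traffic varies between requests, it makes sense to call `xcache_has_hit` on the header from several consecutive requests rather than on a single one.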

Seems like all the mysteries here have been resolved. Additionally, I verified that we're purging mobile-html in the same cases in which we purge the old mobile-sections, so we should be good.