Page MenuHomePhabricator

Incorrect language variant returned for PCS endpoints
Closed, ResolvedPublicBUG REPORT

Description

Repro Steps

Run this command a number of times and observe the title of the page in the output. Using only the title here for simplicity, the whole page has the same problem.

curl -sv -H "Accept-Language: zh-hant" -H "Cache-Control: no-cache" https://zh.wikipedia.org/api/rest_v1/page/mobile-html/2019%E5%86%A0%E7%8A%B6%E7%97%85%E6%AF%92%E7%97%85%E7%96%AB%E6%83%85 | egrep -ohi '(?:<h1 data-id="0" class="pcs-edit-section-title">)[^<]+'

Expected results

The language is zh-hant every time

Actual results

Sometimes the language is zh-hans

Here's an example request with the correct variant (only including the relevant headers):

> GET /api/rest_v1/page/mobile-html/2019%E5%86%A0%E7%8A%B6%E7%97%85%E6%AF%92%E7%97%85%E7%96%AB%E6%83%85 HTTP/2
> Host: zh.wikipedia.org
> User-Agent: curl/7.64.1
> Accept: */*
> Accept-Language: zh-hant
> Cache-Control: no-cache
...
< content-type: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/Mobile-HTML/1.2.1"
< vary: Accept-Language, Accept-Encoding
< content-language: zh-hant
< date: Tue, 07 Apr 2020 11:48:05 GMT
< cache-control: s-maxage=1209600, max-age=0, must-revalidate
< access-control-allow-methods: GET,HEAD
< referrer-policy: origin-when-cross-origin
< content-location: https://zh.wikipedia.org/api/rest_v1/page/mobile-html/2019%E5%86%A0%E7%8A%B6%E7%97%85%E6%AF%92%E7%97%85%E7%96%AB%E6%83%85
< x-request-id: 73184458-76e5-47bc-bb1d-38167a6a2a30
< server: restbase1027
< etag: W/"59036991/a46ba570-78c5-11ea-8284-a649b8c658ff"
< age: 4
< x-cache: cp1081 miss, cp1087 pass
< x-cache-status: pass
< server-timing: cache;desc="pass"
< strict-transport-security: max-age=106384710; includeSubDomains; preload

And one with the incorrect variant:

> GET /api/rest_v1/page/mobile-html/2019%E5%86%A0%E7%8A%B6%E7%97%85%E6%AF%92%E7%97%85%E7%96%AB%E6%83%85 HTTP/2
> Host: zh.wikipedia.org
> User-Agent: curl/7.64.1
> Accept: */*
> Accept-Language: zh-hant
> Cache-Control: no-cache
...
< content-type: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/Mobile-HTML/1.2.1"
< vary: Accept-Language, Accept-Encoding
< content-language: zh
< date: Tue, 07 Apr 2020 11:45:34 GMT
< cache-control: s-maxage=1209600, max-age=0, must-revalidate
< access-control-allow-methods: GET,HEAD
< referrer-policy: origin-when-cross-origin
< content-location: https://zh.wikipedia.org/api/rest_v1/page/mobile-html/2019%E5%86%A0%E7%8A%B6%E7%97%85%E6%AF%92%E7%97%85%E7%96%AB%E6%83%85
< x-request-id: 889b90a3-29de-4f8b-9589-296275b69a99
< server: restbase1020
< etag: W/"59036991/4af063a0-78c5-11ea-a66a-a2018dda4efe"
< age: 7
< x-cache: cp1083 miss, cp1087 pass
< x-cache-status: pass
< server-timing: cache;desc="pass"
< strict-transport-security: max-age=106384710; includeSubDomains; preload

When running it against the mobileapps service labs instance, I see the correct title every time:

curl -sv -H "Accept-Language: zh-hant" -H "Cache-Control: no-cache" https://mobileapps.wmflabs.org/zh.wikipedia.org/v1/page/mobile-html/2019%E5%86%A0%E7%8A%B6%E7%97%85%E6%AF%92%E7%97%85%E7%96%AB%E6%83%85 | egrep -ohi '(?:<h1 data-id="0" class="pcs-edit-section-title">)[^<]+'

Additional info

I'm seeing the same problem on mobile-sections-lead, so it appears the issue is not limited to mobile-html:

curl -sv -H "Accept-Language: zh-hant" -H "Cache-Control: no-cache" https://zh.wikipedia.org/api/rest_v1/page/mobile-sections-lead/2019%E5%86%A0%E7%8A%B6%E7%97%85%E6%AF%92%E7%97%85%E7%96%AB%E6%83%85  | egrep -ohi '"displaytitle":"[^"]+"'

Original bug report:

This can be reproduced on the Wikipedia Android beta app.

Please note that this issue does not happen on every zh.wiki articles.

Steps to reproduce

  1. Open article: 嚴重特殊傳染性肺炎疫情 in Traditional Chinese
  2. Refresh the page after it's fully loaded.
  3. You will see the article's variant is changed which includes the article content, ToC data, and footer.

DEMO: https://www.youtube.com/watch?v=_gZ8lMg5O60

Event Timeline

Restricted Application changed the subtype of this task from "Deadline" to "Bug Report". · View Herald Transcript
Restricted Application added subscribers: Stang, Aklapper. · View Herald Transcript

@JoeWalsh
I use a REST client tool to keep sending the request multiple times to the https://zh.wikipedia.org/api/rest_v1/page/mobile-html/2019冠状病毒病疫情 with Accept-Language: zh-hant and Cache-Control: no-cache, the issue can be reproduced.

You can notice the article title changed one time:
嚴重特殊傳染性肺炎疫情 -> 2019冠状病毒病疫情

1585876562661.gif (564×800 px, 615 KB)

It looks like the page loses its Accept-Language and shows the default content, which is a mixed-variants content of the article.

@cooltey good catch, I can also reproduce that behavior locally with curl. I'll update the description and tag this task into Core Platform Team Clinic Duty for investigation.

JoeWalsh renamed this task from [BUG] Some article's variant will be changed after refreshing to Incorrect language variant returned for mobile-html.Apr 3 2020, 10:10 AM
JoeWalsh triaged this task as High priority.
JoeWalsh updated the task description. (Show Details)
JoeWalsh added a project: Platform Engineering.
JoeWalsh renamed this task from Incorrect language variant returned for mobile-html to Incorrect language variant returned for PCS endpoints.Apr 3 2020, 10:19 AM
JoeWalsh updated the task description. (Show Details)
Pchelolo subscribed.

RESTBase stack only supports requests with titles encoded with percent-encoding. Supplying unencoded titles in most cases don't break things, but in this particular case it makes nodejs go nuts - RESTBase receives some crazy gibberish instead of a title, and everything goes sideways from there.

This is a bit surprising to me that Android app is not encoding the titles - if that is the case, none of the pages with / in them will work in the app. Please check if that's the case.

Nothing we can really do here from RESTBase standpoint except updating the documentation which I am doing here: https://github.com/wikimedia/restbase/pull/1254

@Pchelolo thanks for investigating and thanks for the update @cooltey can you verify that the android app isn't percent-encoding the titles and if you percent-encode the title the issue is resolved? I'm unable to repro locally with the title percent-encoded using:

curl -sv -H "Accept-Language: zh-hant" https://zh.wikipedia.org/api/rest_v1/page/mobile-html/2019%E5%86%A0%E7%8A%B6%E7%97%85%E6%AF%92%E7%97%85%E7%96%AB%E6%83%85 | egrep -ohi '(?:<h1 data-id="0" class="pcs-edit-section-title">)[^<]+'

Thanks @Pchelolo and @JoeWalsh.

In the Android app, it does percent-encoding the titles and the following screenshot is the Logcat from the Android Studio.

1586213001765.jpg (322×1 px, 586 KB)

and here is the code:
https://github.com/wikimedia/apps-android-wikipedia/blob/c6939be39bbdea26ca9dba629429369d9e5c2fd1/app/src/main/java/org/wikipedia/bridge/CommunicationBridge.java#L76-L78

@Pchelolo It turns out I am able to repro locally with the title percent encoded when passing "Cache-Control: no-cache". I've updated the task description with the repro steps and example headers for correct and incorrect results. Here is the curl command that shows the problem locally:

curl -sv -H "Accept-Language: zh-hant" -H "Cache-Control: no-cache" https://zh.wikipedia.org/api/rest_v1/page/mobile-html/2019%E5%86%A0%E7%8A%B6%E7%97%85%E6%AF%92%E7%97%85%E7%96%AB%E6%83%85 | egrep -ohi '(?:<h1 data-id="0" class="pcs-edit-section-title">)[^<]+'

Why are you passing Cache-Control: no-cache exactly? These should are not forwarded to RESTBase but stripped on Varnish/ATS level. Theoretically, it shouldn't do anything, but maybe it messes up how CDN cache works.

I'll try to see why are no-cache headers make any difference while they should be ignored, but my immediate suggestion would be to stop adding this header.

It looks like we are indeed asking for no-cache when the user explicitly refreshes the article in the app. Not sure how it got there, but it probably dates back to when we were seeing change-propagation issues (edits not being visible immediately) and were trying (mistakenly?) to see if adding no-cache would help. We can probably remove it safely, but note that this has been part of the app for a long time, with other endpoints such as summary and page-lead.

...or rather, when the user explicitly refreshes the article, we want to force our network stack to read from the network instead of from local cache, and setting no-cache is the recommended guidance to do that.

Tagging Traffic - seems like Varnish/ATS layer no longer unsets 'cache-control: no-cache' header on requests to RESTBase. It used to be the case before. RESTBase stack heavily depends on the caching layer not passing through the cache-control header. It's possible for us to unset those in RESTBase depending on the client IP, but that doesn't seem like a preferred solution. Why has Varnish/ATS stopped unsetting the cache-control header for incoming requests?

Tagging Traffic - seems like Varnish/ATS layer no longer unsets 'cache-control: no-cache' header on requests to RESTBase. It used to be the case before. RESTBase stack heavily depends on the caching layer not passing through the cache-control header. It's possible for us to unset those in RESTBase depending on the client IP, but that doesn't seem like a preferred solution. Why has Varnish/ATS stopped unsetting the cache-control header for incoming requests?

There's no mention of Cache-Control as a request header in our VCL, and the last somehow related change I see was made in 2016 (and it's about parsoid). Unless I'm missing something I'd be thus inclined to say that we haven't been unsetting Cache-Control from requests at the Varnish/ATS layer for a while, possibly ever. We can do that though if needed.

You've just challenged my core belief - I was 100% sure Varnish layer was not passing through the 'cache-control' request header for us, and a lot of restbase updates design was based on that "fact". Do we have some whitelist of request headers that are allowed through? Was there maybe some other software that used to chop off the header?

In any case, there are 2 options to mitigate that:

  1. We can unset client cache-control in varnish layer
  2. We can unset cache-control for external IPs in RESTBase.

I am inclined to say, that maybe unsetting it in RESTBase actually makes more sense - after all RESTBase depends on it and needs to do it's own cleanup. Depending on the specifics of the frontend caching behavior is too fragile, as we are witnessing here. I will do that.

I am inclined to say, that maybe unsetting it in RESTBase actually makes more sense - after all RESTBase depends on it

It sounds reasonable to me that this assumption be enforced by the entity making it, in this case RESTBase.

RESTBase stack only supports requests with titles encoded with percent-encoding. Supplying unencoded titles in most cases don't break things, but in this particular case it makes nodejs go nuts - RESTBase receives some crazy gibberish instead of a title, and everything goes sideways from there.

This is a bit surprising to me that Android app is not encoding the titles - if that is the case, none of the pages with / in them will work in the app. Please check if that's the case.

Nothing we can really do here from RESTBase standpoint except updating the documentation which I am doing here: https://github.com/wikimedia/restbase/pull/1254

@Pchelolo thanks for updating this documentation - are there specifics about which characters are allowed and which characters need to be encoded?

@Pchelolo thanks for updating this documentation - are there specifics about which characters are allowed and which characters need to be encoded?

In JS clients using encodeURIComponent on the title component should work. In other languages - use any built-in percent-encoding methods. Ideally, need to be compliant with https://tools.ietf.org/html/rfc3986

JoeWalsh claimed this task.

@holger.knust this bug is separate from T256491. It looks like the root cause had to do with assumptions about the cache-control header as per this comment. Marking this as resolved as I can no longer repro with the steps in the description.