
/page/html endpoint broken when requesting language variants affecting /page/summary
Closed, Resolved (Public)

Description

When the page/summary endpoint is requested with a Chinese variant in the Accept-Language header, it returns a 400:

curl -i -H "Accept-Language: zh-hant" https://zh.wikipedia.org/api/rest_v1/page/summary/%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC

When Accept-Language is not a variant, it works:

curl -i -H "Accept-Language: zh" https://zh.wikipedia.org/api/rest_v1/page/summary/%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC
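
For a quick side-by-side check, the same two requests can be reduced to status codes (a diagnostic sketch using standard curl options; -s/-o/-w suppress the body and print only the response code):

for lang in zh zh-hant; do
  curl -s -o /dev/null -w "$lang: %{http_code}\n" -H "Accept-Language: $lang" https://zh.wikipedia.org/api/rest_v1/page/summary/%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC
done

As described above, this prints zh: 200 and zh-hant: 400.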

Debugging

Beginning almost immediately after the wmf.20 rollout, we began seeing many error responses to mobileapps from zhwiki /page/html with the message HTTPError: LanguageConversion is not enabled on this article.

https://logstash.wikimedia.org/goto/72e70761acc959574da0c9fd1e19e905

It seems that this patch fixes the issue, but it didn't make the train: https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/646794

Event Timeline

Dbrant triaged this task as Unbreak Now! priority. Dec 8 2020, 3:18 PM
Dbrant updated the task description.
Mholloway added a subscriber: Pchelolo.

This error is coming from the upstream /page/html endpoint.

mholloway@mholloway:~/code/wikimedia/mediawiki-services/mobileapps$ curl -i -H "Accept-Language: zh-hant" https://zh.wikipedia.org/api/rest_v1/page/html/%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC
HTTP/2 400 
content-type: application/problem+json
date: Tue, 08 Dec 2020 16:12:44 GMT
server: restbase1023
x-content-type-options: nosniff
cache-control: no-cache
p3p: CP="See https://zh.wikipedia.org/wiki/Special:CentralAutoLogin/P3P for more info."
access-control-allow-origin: *
vary: Accept-Encoding
x-request-id: 0008f107-4be7-48e0-b2bc-9e1df400d3bf
access-control-allow-methods: GET,HEAD
access-control-allow-headers: accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding
access-control-expose-headers: etag
x-frame-options: SAMEORIGIN
referrer-policy: origin-when-cross-origin
x-xss-protection: 1; mode=block
content-security-policy: default-src 'none'; frame-ancestors 'none'
x-content-security-policy: default-src 'none'; frame-ancestors 'none'
x-webkit-csp: default-src 'none'; frame-ancestors 'none'
content-location: https://zh.wikipedia.org/api/rest_v1/page/html/%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC
content-length: 185
age: 2
x-cache: cp1083 miss, cp1077 pass
x-cache-status: pass
server-timing: cache;desc="pass"
strict-transport-security: max-age=106384710; includeSubDomains; preload
report-to: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
set-cookie: WMF-Last-Access=08-Dec-2020;Path=/;HttpOnly;secure;Expires=Sat, 09 Jan 2021 12:00:00 GMT
set-cookie: WMF-Last-Access-Global=08-Dec-2020;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Sat, 09 Jan 2021 12:00:00 GMT
x-client-ip: 98.115.47.4
set-cookie: GeoIP=US:PA:Philadelphia:40.07:-75.21:v4; Path=/; secure; Domain=.wikipedia.org

{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/unknown_error","method":"get","uri":"/zh.wikipedia.org/v1/page/html/%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC"}

It looks like the header is being ignored in this case. Compare, e.g., with curl -H "Accept-Language: sr_el" https://sr.wikipedia.org/api/rest_v1/page/html/Сузи_Кватро, which still responds with content in Cyrillic characters, not Latin as requested.

I keep getting the error response from Postman and from the curl command it generates:

curl --location --request GET 'https://zh.wikipedia.org/api/rest_v1/page/html/%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC' --header 'Accept-Language: zh-hant'

MSantos renamed this task from "Page summary endpoint broken when requesting language variants" to "/page/html endpoint broken when requesting language variants affecting /page/summary". Dec 8 2020, 5:56 PM

It looks like the error is only triggered when the Accept-Language value is a valid language variant code: zh-hant but not zh_hant, for example. (The latter is ignored, as if we'd passed foo.)
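
A quick probe over a few codes bears this out (a sketch; foo stands in for any unrecognized value, and per the observation above the valid variant codes should return 400 while the rest are ignored and return 200):

for lang in zh-hant zh-hans zh_hant foo; do
  curl -s -o /dev/null -w "$lang: %{http_code}\n" -H "Accept-Language: $lang" https://zh.wikipedia.org/api/rest_v1/page/html/%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC
done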

According to @schoenbaechler, this began a little over 24 hours ago. Per https://wikitech.wikimedia.org/wiki/Server_Admin_Log, that corresponds with 1.36.0-wmf.20 being rolled out to all wikis.

For iOS, this is manifesting in the app mostly in the previews we show. Steps to reproduce:

  1. Add Chinese as a wiki language
  2. Have any variant of Chinese as one of your system languages
  3. Force press or long press an article in the feed or a blue link

Expected:
Content shows

Actual:
Preview is blank

Obviously this doesn't help with diagnostics, but for those following along, this is where the bug is breaking part of the user experience.

Beginning almost immediately after the wmf.20 rollout, we began seeing many error responses to mobileapps from zhwiki /page/html with the message HTTPError: LanguageConversion is not enabled on this article.

https://logstash.wikimedia.org/goto/72e70761acc959574da0c9fd1e19e905

I haven't nailed down a specific cause yet, but I suspect that a bug was introduced with this change: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/643394

MSantos added a subscriber: SubrahamanyamVarma.

It seems that this patch fixes the issue, but it didn't make the train: https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/646794

Change 647046 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/vendor@wmf/1.36.0-wmf.20] Bump wikimedia/parsoid to v0.13.0-a19

https://gerrit.wikimedia.org/r/647046

OK, adding a patch to tonight's backport window, which should resolve the issue (by early-deploying Parsoid -a19).

So we'll deploy -a19 to group0 with the usual train at 2000 UTC and verify that Parsoid -a19 at least doesn't crash and burn and break group0 before we backport -a19 early to group1 and group2 in the backport window two hours later, at 0000 UTC. Does that timing work? If not, we can do the backport immediately after the train deploy, but we would like to see -a19 live on group0 at least for smoke testing before we push it to all production machines.

I'm not opposed to it, but I believe @Dbrant or @JMinor will have a better answer.

Two hours of delay to reduce risk on the backport is OK from the iOS side.

Yep, that should be OK for us, too.

Change 647046 merged by jenkins-bot:
[mediawiki/vendor@wmf/1.36.0-wmf.20] Bump wikimedia/parsoid to v0.13.0-a19

https://gerrit.wikimedia.org/r/647046

Mentioned in SAL (#wikimedia-operations) [2020-12-09T00:53:18Z] <urbanecm@deploy1001> Synchronized php-1.36.0-wmf.20/vendor/: rMWVD3278ffd10788: Bump wikimedia/parsoid to v0.13.0-a19 (T269685) (duration: 01m 16s)

The fixes are now deployed, and the URLs in the description now return an HTTP 200 for me. Can someone from the apps / PI team verify as well?
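
One way to verify, reduced to the status code (expecting 200 now):

curl -s -o /dev/null -w "%{http_code}\n" -H "Accept-Language: zh-hant" https://zh.wikipedia.org/api/rest_v1/page/summary/%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC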

ssastry claimed this task.

Looks good to me.