Page Content Service summary endpoint "extract" value does not vary per Accept-Language, and returns different output from expected LanguageConverter results
Open, High, Public

Description

See here:
https://zh.wikipedia.org/api/rest_v1/page/summary/%E7%A7%8D%E4%B8%96%E8%A1%A1 (This is automatically converted to Simplified Chinese)

种世衡(985年-1045年),字仲平,工部侍郎种放之侄子,父种昭衍,因为叔父种放恩荫而补任将作监主簿入仕。为北宋一朝种家將的开山人。著名军事家、书画家。为时总领西北军务的范仲淹一手提拔。招抚羌人,康定元年在延州东北200里处筑青涧城,迁内殿崇班,知城事(差遣)。以固延州之势,抚城安羌民,“在边数年,积谷通货,所至不烦县官益兵增馈。善抚养士卒,病者遣一子专视其食饮汤剂,以故得人死力。”。庆历三年,迁洛苑副使,调知环州事,到任后,即巡境抚慰羌民相继来归,同年西夏军进攻渭州(今甘肃平凉),率军出援有功,晋升为东染院使 、环庆路兵马钤辖。庆历五年正月七日(1月27日)卒,赠成州团练使。

https://zh.wikipedia.org/api/rest_v1/page/summary/%E7%A7%8D%E5%B8%88%E4%B8%AD (This is automatically converted to Traditional Chinese)

種師中(1059年-1126年),字端孺,宋代名將種世衡第七子種記的次子,北宋名將種師道之弟,種氏三代為西北名將,人稱小種經略相公。歷任知環州、知濱州、知邠州、知慶陽府、知秦州事,侍衛親軍馬軍副都指揮使 、房州觀察使,奉寧軍承宣使。

Both extracts are converted using LanguageConverter, but:

  • They should either both be converted to the same variant or not be converted at all (see the next point).
  • The conversion does not apply any page-local conversion rules. Compare:

https://zh.wikipedia.org/zh-hk/%E7%A7%8D%E5%B8%88%E4%B8%AD (the correct conversion)

師中(1059年-1126年),字端孺,宋代名將世衡第七子記的次子,北宋名將師道之弟,氏三代為西北名將,人稱小經略相公。歷任知環州、知濱州、知邠州、知慶陽府、知秦州事,侍衛親軍馬軍副都指揮使(禁軍軍職,正五品)、房州觀察使(階官,正任觀察使,正五品),奉寧軍承宣使(階官,正任承宣使,正四品)

https://zh.wikipedia.org/zh-hant/%E7%A7%8D%E5%B8%88%E4%B8%AD (the conversion is incorrect, but also different from the result of Page Content Service)

師中(1059年-1126年),字端孺,宋代名將世衡第七子記的次子,北宋名將師道之弟,氏三代為西北名將,人稱小經略相公。歷任知環州、知濱州、知邠州、知慶陽府、知秦州事,侍衛親軍馬軍副都指揮使(禁軍軍職,正五品)、房州觀察使(階官,正任觀察使,正五品),奉寧軍承宣使(階官,正任承宣使,正四品)
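To make the "local rule" point concrete, here is a toy model of the precedence involved. It is not MediaWiki's LanguageConverter: the two-character table and the `convert` function are hypothetical, and it assumes (this task does not say so explicitly) that the article protects the surname 种 with a page-local rule so that it is not turned into 種 in Traditional variants. The only point is that page-local rules have to be consulted before the global table; a converter that skips them yields 種師中 rather than 种師中.

```python
# Toy model only: a hypothetical global zh-hans -> zh-hant table covering
# just the characters used in the examples below.
GLOBAL_TO_HANT = {"种": "種", "师": "師"}


def convert(text: str, global_table: dict[str, str], local_rules: dict[str, str]) -> str:
    """Convert `text`, letting page-local word rules override the global table.

    At each position the longest matching local rule wins; otherwise the
    character is mapped through the global table (or left unchanged).
    """
    # Try longer local rules first so the most specific match wins.
    rules = sorted(local_rules.items(), key=lambda kv: len(kv[0]), reverse=True)
    out = []
    i = 0
    while i < len(text):
        for source, target in rules:
            if source and text.startswith(source, i):
                out.append(target)
                i += len(source)
                break
        else:
            # No local rule matched: fall back to the global character table.
            out.append(global_table.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

With no local rules, `convert("种师中", GLOBAL_TO_HANT, {})` gives 種師中; with the hypothetical rule `{"种": "种"}` it gives 种師中.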

Event Timeline

Restricted Application added subscribers: Cosine02, Aklapper.
Jdforrester-WMF renamed this task from Page Content Service does not work well with LanguageConverter to Page Content Service summary endpoint "extract" value does not vary per Accept-Language, and returns different output from expected LanguageConverter results. Dec 7 2019, 10:24 PM

From quick local testing, the following four requests return different results:

  • curl -s -H 'Accept-Language: zh-Hans' -XGET https://zh.wikipedia.org/api/rest_v1/page/summary/%E7%A7%8D%E4%B8%96%E8%A1%A1
  • curl -s -H 'Accept-Language: zh-Hans-CN' -XGET https://zh.wikipedia.org/api/rest_v1/page/summary/%E7%A7%8D%E4%B8%96%E8%A1%A1
  • curl -s -H 'Accept-Language: zh-Hant' -XGET https://zh.wikipedia.org/api/rest_v1/page/summary/%E7%A7%8D%E4%B8%96%E8%A1%A1
  • curl -s -H 'Accept-Language: zh-Hant-HK' -XGET https://zh.wikipedia.org/api/rest_v1/page/summary/%E7%A7%8D%E4%B8%96%E8%A1%A1

However, they differ only in the displaytitle value, the titles.display value (which… is the same?) and tid (as expected). In particular, lang is wrongly set to the unvarianted zh, and the extract and extract_html values appear to be in either zh-Hant or zh-Hans without varying per request; I'm not sure how the variant is picked.
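The manual curl comparison above can be scripted. This is a minimal Python sketch, assuming the summary endpoint returns a flat JSON object as in the requests above; `fetch_summary` and `varying_fields` are hypothetical helper names, not part of any Wikimedia tooling. It fetches the same title once per Accept-Language value and reports which top-level fields differ. Once the bug is fixed, one would expect `extract` and `extract_html` to show up in that set.

```python
import json
import urllib.parse
import urllib.request

SUMMARY_URL = "https://zh.wikipedia.org/api/rest_v1/page/summary/"


def fetch_summary(title: str, accept_language: str) -> dict:
    """Fetch the PCS summary for `title`, sending the given Accept-Language header."""
    url = SUMMARY_URL + urllib.parse.quote(title, safe="")
    request = urllib.request.Request(url, headers={"Accept-Language": accept_language})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def varying_fields(responses: dict[str, dict]) -> set[str]:
    """Return the top-level keys whose values are not identical across all responses."""
    keys = set().union(*(r.keys() for r in responses.values()))
    varying = set()
    for key in keys:
        # Serialize each value so nested objects (e.g. titles) compare reliably.
        values = {
            json.dumps(r.get(key), sort_keys=True, ensure_ascii=False)
            for r in responses.values()
        }
        if len(values) > 1:
            varying.add(key)
    return varying
```

Usage (requires network access): `varying_fields({v: fetch_summary("种世衡", v) for v in ["zh-Hans", "zh-Hans-CN", "zh-Hant", "zh-Hant-HK"]})`.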

Didn't this use to work?

Yes, I tested and confirmed it worked for T227825.

LGoto triaged this task as Medium priority. Dec 11 2019, 4:39 PM
Jhernandez raised the priority of this task from Medium to High. Dec 11 2019, 4:40 PM
MSantos claimed this task. Wed, Jan 8, 3:27 PM
MSantos moved this task from To Do to Doing on the Product-Infrastructure-Team-Backlog (Kanban) board.

We have two different problems going on here:

  1. Wrong conversion when requesting a variant from the /page/html endpoint.
  2. The words inside the parentheses are being stripped out independently of the language conversion; see T226323: [Bug] Page summaries should not strip the normalized title from the extract?
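The parenthesis problem can be checked independently of any variant handling: compare the extract against the rendered page text and list the parenthetical segments that did not survive. A minimal Python sketch, assuming both strings are already available as plain text; `missing_parentheticals` is a hypothetical helper, and its regex handles ASCII and full-width parentheses but not nested ones.

```python
import re

# Matches one non-nested parenthetical in ASCII "(...)" or full-width "（...）" form.
PAREN_RE = re.compile(r"[(（][^()（）]*[)）]")


def missing_parentheticals(reference: str, extract: str) -> list[str]:
    """Return parenthetical segments present in `reference` but absent from `extract`."""
    return [segment for segment in PAREN_RE.findall(reference) if segment not in extract]
```

Run against the zh-hk page text and the PCS extract quoted in the description, it should list segments such as (禁軍軍職,正五品) while keeping (1059年-1126年), which appears in both.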

That said, PCS is correctly forwarding the Accept-Language headers.

@Pchelolo and @Clarakosi, I wonder whether this issue is related to the Parsoid changes, and whether the recent RESTBase deploy could fix the language conversion problem.

That is quite possible. We need to retest this, since we completed and cleaned up the Parsoid-PHP transition yesterday. I'll have a look.

MSantos removed MSantos as the assignee of this task. Wed, Jan 15, 6:22 PM
MSantos added a subscriber: MSantos.