prop=description does not respect language variants properly
Open, Low / Public / BUG REPORT

Description

Context: T352801#9388702

Steps to replicate the issue (include links if applicable):

  • On zhwiki, call the following API to get a list of articles together with their descriptions (from Wikidata) and extracts (from the article content).

https://zh.wikipedia.org/w/api.php?action=query&exchars=100&exintro=1&explaintext=1&formatversion=2&generator=search&gsrlimit=3&gsrnamespace=0&gsrqiprofile=classic_noboostlinks&gsrsearch=morelike%3A%E8%8F%B2%E5%BE%8B%E8%B3%93&origin=*&pilimit=3&piprop=thumbnail&pithumbsize=160&prop=pageimages|description|extracts|info&format=json&inprop=varianttitles&variant=zh-tw
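To reproduce the request from a script, the same query can be assembled from its parameters. This is a minimal sketch using only the Python standard library. The parameter values are copied from the URL above; the helper name build_query_url is hypothetical. It only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def build_query_url(variant: str) -> str:
    """Build the morelike search query from this report for one variant."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "origin": "*",
        # Search generator: articles similar to the given title.
        "generator": "search",
        "gsrsearch": "morelike:菲律賓",
        "gsrnamespace": "0",
        "gsrlimit": "3",
        "gsrqiprofile": "classic_noboostlinks",
        # Properties requested for each result.
        "prop": "pageimages|description|extracts|info",
        "piprop": "thumbnail",
        "pithumbsize": "160",
        "pilimit": "3",
        "exchars": "100",
        "exintro": "1",
        "explaintext": "1",
        "inprop": "varianttitles",
        # The language variant under test, e.g. "zh-tw".
        "variant": variant,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_query_url("zh-tw"))
```

Changing only the variant argument (for example "zh-hk" or "zh-sg") gives comparable requests for each variant.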

What happens?:
Only a single description is returned, rather than descriptions for the different language variants.

What should have happened instead?:
The API should do one of the following:

  1. A list of results for different language variants, similar to varianttitles.

or

  2. Output the correct result based on the request's Accept-Language header or a possible language variant parameter.
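
Option 2 implies some precedence between an explicit variant parameter and the Accept-Language header. The following is a minimal sketch of one possible precedence, assuming the explicit parameter wins and the header is only a fallback. The function name, the list of supported variants, and the simplified header parsing are all assumptions for illustration and do not describe existing MediaWiki behaviour.

```python
from typing import Optional, Sequence

# Hypothetical list; zh-tw, zh-hk and zh-sg are the variants named in this task.
SUPPORTED_VARIANTS = ("zh-tw", "zh-hk", "zh-sg")


def choose_variant(
    variant_param: Optional[str],
    accept_language: Optional[str],
    supported: Sequence[str] = SUPPORTED_VARIANTS,
) -> Optional[str]:
    """Return the variant to use, or None if nothing usable was requested."""
    # 1. An explicit variant parameter takes precedence (assumption).
    if variant_param and variant_param.lower() in supported:
        return variant_param.lower()

    # 2. Otherwise use the first supported tag from Accept-Language,
    #    ordered by q-value (highest first; a missing q counts as 1.0).
    candidates = []
    for position, item in enumerate((accept_language or "").split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # skip malformed entries
        candidates.append((-quality, position, tag))

    for _, _, tag in sorted(candidates):
        if tag in supported:
            return tag
    return None
```

For example, choose_variant(None, "en;q=0.9, zh-SG;q=0.8") skips the unsupported "en" tag and selects "zh-sg", while choose_variant("zh-HK", "zh-SG") returns "zh-hk" because the parameter is checked first.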

Event Timeline

Extracts seem to work fine to me.

extracts&exchars=100&exintro=1&explaintext=1&variant=zh-hk (https://zh.wikipedia.org/w/api.php?...&variant=zh-hk)
extracts&exchars=100&exintro=1&explaintext=1&variant=zh-sg (https://zh.wikipedia.org/w/api.php?...&variant=zh-sg)

"description": "行星、恆星、星系、所有物質和能量的總體" // hk; should be "一切空間、時間、物質和能量構成的總體"
"description": "行星、恆星、星系、所有物質和能量的總體" // sg; should be "一切空间、时间、物质和能量构成的总体"
"extract": "宇宙(英語:universe,拉丁語:universus)是所有的時間、空間與其包含的內容物所構成的統一體,宇是指空間,而宙是指時間;它包含了行星、恆星、星系、星系際空間、亞原子粒子以及所有的物質與能量..." // hk
"extract": "宇宙(英语:universe,拉丁语:universus)是所有的时间、空间与其包含的内容物所构成的统一体,宇是指空间,而宙是指时间;它包含了行星、恒星、星系、星系际空间、亚原子粒子以及所有的物质与能量..." // sg

(Although FWIW, prop=extracts is deprecated because Page Content Service provides better quality.)
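
The difference between the two props can be checked mechanically: for a given field, collect the value returned under each variant and see whether the values differ. This is a minimal sketch using the strings quoted above as sample data, with the extracts shortened to their opening words. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict


def respects_variants(values_by_variant: Dict[str, str]) -> bool:
    """Return True if at least two variants produced different text."""
    return len(set(values_by_variant.values())) > 1


# Sample data copied from the responses quoted above.
descriptions = {
    "zh-hk": "行星、恆星、星系、所有物質和能量的總體",
    "zh-sg": "行星、恆星、星系、所有物質和能量的總體",
}
extracts = {
    "zh-hk": "宇宙(英語:universe,拉丁語:universus)",
    "zh-sg": "宇宙(英语:universe,拉丁语:universus)",
}

print(respects_variants(descriptions))  # False: same text for both variants
print(respects_variants(extracts))      # True: traditional vs. simplified text
```

Identical text is only a hint, since two variants could legitimately share the same wording, but here the expected zh-hk and zh-sg descriptions are known to differ.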

cooltey renamed this task from "prop=description and prop=extract do not respect language variants properly" to "prop=description does not respect language variants properly". Dec 11 2023, 11:09 PM
cooltey updated the task description.

Thanks @Tgr! I updated the ticket description to cover only prop=description.

There are two implementations of prop=description:

  • Fetching the description from the ParserOutput, used on enwiki only. In theory we just need to make sure we use the ParserOutput in the right language variant, but enwiki doesn't have variants, so there is probably no point in worrying about this.
  • Loading the description from Wikidata, used on every other wiki. Wikidata doesn't have a separate concept of variants, so you just have to make sure you use the variant's language code when requesting the Wikidata description (for an example, check how wgUserVariant is set in OutputPage).
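
The second bullet amounts to looking up the Wikidata description under the variant's language code rather than the wiki's base language code. The following is a minimal sketch of that lookup in Python (the real code is PHP and is not shown in this task). The function name, the fallback to "zh", and the assumption that the currently returned text is stored under "zh" are illustrative guesses; the zh-hk and zh-sg strings are the expected values quoted earlier in the thread.

```python
from typing import Mapping, Optional


def description_for_variant(
    descriptions: Mapping[str, str],
    variant: Optional[str],
    content_language: str = "zh",
) -> Optional[str]:
    """Prefer the description stored under the variant's language code.

    Falls back to the wiki's content language when no variant was requested
    or when no description exists for that variant (assumed behaviour).
    """
    if variant and variant in descriptions:
        return descriptions[variant]
    return descriptions.get(content_language)


# Descriptions keyed by language code. Which code holds the first string
# is an assumption made for this example.
universe = {
    "zh": "行星、恆星、星系、所有物質和能量的總體",
    "zh-hk": "一切空間、時間、物質和能量構成的總體",
    "zh-sg": "一切空间、时间、物质和能量构成的总体",
}

print(description_for_variant(universe, "zh-hk"))  # the zh-hk text
print(description_for_variant(universe, "zh-tw"))  # falls back to the "zh" text
```

Whether a missing variant should fall back directly to the content language or walk a longer fallback chain is a design choice this sketch does not settle.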

JTannerWMF added subscribers: ABorbaWMF, JTannerWMF.

Is there someone on this team who will QA this, @Tgr and @matmarex? If not, we will have @ABorbaWMF do it.

Not sure which team that is; Wikibase is owned by WMDE, and the past prop=description projects were one-offs. (It seems fairly easy, though. If someone wants to try their hand at fixing it, they probably just need to add some variant handling to the language logic here.)