
Unexpected response from the Chinese extracts API
Closed, Duplicate (Public)

Description

Hello folks at Wikimedia,
I'm sorry to report that your extracts API seems to return an unexpected response for one particular page.

Keep up the great work 💪
Enrico

Unexpected response:

{"error":{"code":"nosuchsection","info":"Sections are not supported by Wikipedia:知识问答.","docref":"See https://zh.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."},"servedby":"mw1313"}

Expected response for missing page:

{"batchcomplete":true,"query":{"pages":[{"pageid":0,"missing":true}]}}

Expected response for existing page:

{"batchcomplete":true,"query":{"pages":[{"pageid":13,"ns":0,"title":"数学","extract":"数学是利用符号语言研究數量、结构、变化以及空间等概念的一門学科,从某种角度看屬於形式科學的一種。數學透過抽象化和邏輯推理的使用,由計數、計算、量度和對物體形狀及運動的觀察而產生。數學家們拓展這些概念,為了公式化新的猜想以及從選定的公理及定義中建立起嚴謹推導出的定理。基礎數學的知識與運用總是個人與團體生活中不可或缺的一環。對數學基本概念的完善,早在古埃及、美索不達米亞及古印度內的古代數學文本便可觀見,而在古希臘那裡有更為嚴謹的處理。從那時開始,數學的發展便持續不斷地小幅進展,至16世紀的文藝復興時期,因为新的科學發現和數學革新兩者的交互,致使數學的加速发展,直至今日。数学并成为許多國家及地區的教育範疇中的一部分。\n今日,數學使用在不同的領域中,包括科學、工程、醫學、經濟學和金融學等。數學對這些領域的應用通常被稱為應用數學,有時亦會激起新的數學發現,並導致全新學科的發展,例如物理学的实质性发展中建立的某些理论激发数学家对于某些问题的不同角度的思考。數學家也研究純數學,就是數學本身的实质性內容,而不以任何實際應用為目標。雖然許多研究以純數學開始,但其过程中也發現許多應用之处。…"}]}}
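The three response shapes above can be told apart mechanically. The following is a minimal sketch, assuming `formatversion=2` output with one page per response; the function name `classify_extract_response` and its return labels are hypothetical, chosen only for illustration.

```python
import json


def classify_extract_response(body: str) -> str:
    """Label a formatversion=2 extracts response.

    Returns 'error:<code>', 'missing', 'extract', or 'unknown'.
    The labels are illustrative, not part of the MediaWiki API.
    """
    data = json.loads(body)

    # A top-level "error" object means the whole request failed.
    if "error" in data:
        return "error:" + data["error"].get("code", "unknown")

    pages = data.get("query", {}).get("pages", [])
    if not pages:
        return "unknown"

    # Only the first page is inspected; the reports above query one page ID.
    page = pages[0]
    if page.get("missing"):
        return "missing"
    if "extract" in page:
        return "extract"
    return "unknown"
```

With this, the reported response would be labelled `error:nosuchsection`, whereas a client would only expect `missing` or `extract` for a request by page ID.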

Event Timeline

@Ebonetti90 Thanks for the report. Do you get:

{"error":{"code":"nosuchsection","info":"Sections are not supported by Wikipedia:知识问答.","docref":"See https://zh.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."},"servedby":"mw1313"}

For the query:

https://zh.wikipedia.org/w/api.php?action=query&prop=extracts&exlimit=20&exintro=&explaintext=&exchars=512&format=json&formatversion=2&pageids=4488064

This is what I get:

curl 'https://zh.wikipedia.org/w/api.php?action=query&prop=extracts&exlimit=20&exintro=&explaintext=&exchars=512&format=json&formatversion=2&pageids=4488064'
{"batchcomplete":true,"query":{"pages":[{"pageid":4488064,"ns":0,"title":"Wikipedia:知识问答","extract":"…"}]}}

?

Could you share your request headers (from curl -v)? They may be the difference needed to reproduce the problem. Thank you!
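For anyone scripting the same request against several wikis, here is a small sketch that rebuilds the query URL used above from its parameters. The helper name `build_extracts_url` is hypothetical; the parameter names and values are copied from the URL in this comment.

```python
from urllib.parse import urlencode


def build_extracts_url(host: str, page_id: int) -> str:
    """Rebuild the extracts query from this task for a given wiki host and page ID."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exlimit": 20,
        "exintro": "",       # valueless flags are sent as empty strings
        "explaintext": "",
        "exchars": 512,
        "format": "json",
        "formatversion": 2,
        "pageids": page_id,
    }
    return f"https://{host}/w/api.php?" + urlencode(params)


url = build_extracts_url("zh.wikipedia.org", 4488064)
```

The sketch only builds the URL. Request headers and cookies are left to whatever HTTP client sends it.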

Odd, I get the nosuchsection error while logged in to my usual account (which has no special privileges on zhwiki, as far as I know) but not when accessing the URL while logged out. The page's content model is reported as wikitext, so it should support sections. I don't have time to debug further now, but I will look at it later on.

I had been aware of this issue for some weeks. Whatever magic you did, it now works for me too, thanks!

I'll let you know if anything else comes up with public API use 👍

Enrico

I can, however, reproduce T215028:

curl 'https://el.wikipedia.org/w/api.php?action=query&prop=extracts&exlimit=20&exintro=&explaintext=&exchars=512&format=json&formatversion=2&pageids=298785'
{"error":{"code":"missingtitle","info":"The page you specified doesn't exist.","docref":"See https://el.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."},"servedby":"mw1277"}

That page is weird, because I can see its revision:

https://el.wikipedia.org/w/index.php?oldid=3925388

But I cannot see the page?

https://el.wikipedia.org/wiki/%CE%92%CE%A0:WDATA
https://el.wikipedia.org/w/index.php?title=%CE%92%CE%A0:WDATA&redirect=no

Yep yep, this is a different kind of error ;)

Maybe the global page prefix was changed from "ΒΠ:" to "Βικιπαίδεια:", leaving a trace of the old page in the latest dump and in the page API, so that the extracts API runs into an inconsistent state when it is called.

Enrico

https://el.wikipedia.org/w/api.php?action=query&revids=3925388 provides the answer. That revision is associated with the page "ΒΠ:WDATA" in namespace 0, but since "ΒΠ" is an alias for the "Βικιπαίδεια" namespace there, the title parses as page "WDATA" in namespace 4, making revision 3925388's page inaccessible by title. More details at T215028#4920713.
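To illustrate the collision, here is a deliberately simplified sketch of prefix-based title parsing. It is not MediaWiki's actual title parser (which also normalizes case, underscores, and interwiki prefixes); the table `NAMESPACE_IDS` and the function `parse_title` are hypothetical and contain only the two names mentioned above.

```python
from typing import Tuple

# Hypothetical, minimal namespace table: canonical name plus the "ΒΠ" alias,
# both mapping to namespace 4 as described in the comment above.
NAMESPACE_IDS = {"Βικιπαίδεια": 4, "ΒΠ": 4}


def parse_title(text: str) -> Tuple[int, str]:
    """Split 'Prefix:Name' into (namespace id, name); default to namespace 0."""
    prefix, sep, rest = text.partition(":")
    if sep and prefix in NAMESPACE_IDS:
        return NAMESPACE_IDS[prefix], rest
    return 0, text


# The row as stored: the full string "ΒΠ:WDATA" sitting in namespace 0.
stored = (0, "ΒΠ:WDATA")

# Parsing that same string yields namespace 4 and "WDATA", so a lookup
# by title never lands on the stored row.
parsed = parse_title(stored[1])
reachable_by_title = parsed == stored
```

Here `parsed` is `(4, "WDATA")` while the stored row is `(0, "ΒΠ:WDATA")`, so `reachable_by_title` is `False`: the page can still be found by ID or revision ID, but not by its title.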

Anomie added a project: TextExtracts.

It turns out that page ID 4488064 on zhwiki is in the same situation: that page's title is "Wikipedia:知识问答" in namespace 0, but that title actually corresponds to "知识问答" in namespace 4 (page ID 5867468), which is a Flow board, which indeed doesn't support sections. Apparently TextExtracts is trying to fetch the page by title-string even though it was requested by page ID.
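One way to spot pages in this state from the outside is to look a page up by ID, then look up the title it reports, and compare the two results. The sketch below works on already-fetched page objects and makes no network calls; `is_shadowed` is a hypothetical helper, and the sample dictionaries mirror the IDs and namespaces quoted in this comment (the `title` field of the by-title result is an assumption about how the API would display it).

```python
def is_shadowed(page_by_id: dict, page_by_title: dict) -> bool:
    """True when re-resolving a page's own title string leads to a different page (or none)."""
    return page_by_id["pageid"] != page_by_title.get("pageid")


# Result of querying pageids=4488064, as shown earlier in this task.
by_id = {"pageid": 4488064, "ns": 0, "title": "Wikipedia:知识问答"}

# Result of querying that title string instead: the Flow board in namespace 4.
# The title value here is assumed for illustration.
by_title = {"pageid": 5867468, "ns": 4, "title": "Wikipedia:知识问答"}

zhwiki_shadowed = is_shadowed(by_id, by_title)

# The elwiki case from T215028: the title lookup reports the page as missing.
elwiki_shadowed = is_shadowed({"pageid": 298785}, {"missing": True})
```

Both cases come out as shadowed, which matches the two symptoms seen here: a `nosuchsection` error when the title resolves to another page, and `missingtitle` when it resolves to nothing.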

As with T215028, namespaceDupes.php would clean up the inaccessible page. But probably someone should also look into TextExtracts's behavior (although I suspect the nominal maintainers' response will be along the lines of "We don't actually support the extension because we moved our specific use case to a microservice behind restbase. Feel free to submit a patch yourself.").

Closing as a duplicate, as these should be solved together.