Wikipedia API returning an empty string for text extract on some articles
Closed, Resolved (Public)

Description

In the last week or so I've noticed that the Wikipedia API has started returning an empty string when querying for text extracts from certain articles, such as this query about the Cosmic Anisotropy Telescope: https://en.wikipedia.org/w/api.php?action=query&format=json&prop=extracts&titles=Cosmic%20Anisotropy%20Telescope&exsentences=10&explaintext=1&exlimit=1

Casting around for a few others - 1, 2, 3, 4 - I can't see any obvious pattern in why these particular ones return blank. (The same query works fine on most articles, e.g. this.)

Is this a bug, or has there been an update to the API that's caused the query I normally use to behave differently? I've been using this kind of query for months and it's only recently started returning blank strings.
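
For anyone wanting to check their own list of articles, here is a minimal Python sketch that rebuilds the query from the description and flags pages whose extract comes back blank. The function names, the User-Agent string, and the sample response in the usage note are assumptions made for illustration; only the query parameters come from the URL above. It assumes the default JSON layout, where query.pages is an object keyed by page ID.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, sentences=10):
    """Build the same prop=extracts query used in the task description."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
        "exsentences": sentences,
        "explaintext": 1,
        "exlimit": 1,
    }
    return API_URL + "?" + urlencode(params)


def empty_extract_titles(response):
    """Return titles of pages whose 'extract' is missing or blank.

    Note: a page that does not exist also has no extract, so it is
    reported here as well.
    """
    pages = response.get("query", {}).get("pages", {})
    return [
        page.get("title", "")
        for page in pages.values()
        if not page.get("extract", "").strip()
    ]


def fetch_extract(title):
    """Fetch and decode the API response (needs network access)."""
    # Hypothetical User-Agent; replace with one identifying your tool.
    request = Request(
        build_extract_url(title),
        headers={"User-Agent": "extract-check-example/0.1"},
    )
    with urlopen(request) as resp:
        return json.load(resp)
```

Usage would be something like empty_extract_titles(fetch_extract("Cosmic Anisotropy Telescope")), which returns a list containing that title if the extract is blank and an empty list otherwise.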

Event Timeline

Try purging the pages with the problem. It may be due to a recent change to how parser output is handled.

Thanks, that seems to be it. I tried a purge on the first two listed above and it fixed the problem.

Will all of the (presumably thousands of) other affected articles purge themselves automatically, given time?

The parser cache entries will expire over the next month. The change in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/458732 would fix the interpretation of those old entries before they expire.

How can I purge a page, please?

Jdlrobson subscribed.

Append action=purge to the page URL,
e.g. https://en.wikipedia.org/wiki/London?action=purge
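
If there are many pages to purge, building the URLs by hand gets tedious. Below is a rough Python sketch that constructs the purge URL in the form suggested above, plus the equivalent request for the API's action=purge module, which has to be sent as a POST. The helper names and the default site are assumptions for illustration, and opening the ?action=purge URL in a browser may show a confirmation prompt first.

```python
from urllib.parse import quote, urlencode

DEFAULT_SITE = "https://en.wikipedia.org"


def build_purge_url(title, site=DEFAULT_SITE):
    """Return the page URL with ?action=purge appended."""
    # Page URLs use underscores in place of spaces.
    page = quote(title.replace(" ", "_"))
    return f"{site}/wiki/{page}?action=purge"


def build_api_purge_request(titles, site=DEFAULT_SITE):
    """Return (url, body) for a POST to the API's action=purge module.

    Multiple titles are joined with '|', as in other API queries.
    """
    body = urlencode(
        {"action": "purge", "format": "json", "titles": "|".join(titles)}
    ).encode("ascii")
    return f"{site}/w/api.php", body
```

The pair returned by build_api_purge_request can be passed to urllib.request.Request(url, data=body), which sends a POST because data is set. Keeping the builders free of network calls makes them easy to check before purging anything.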

Anomie assigned this task to Tgr.

This should be fixed now with the backport of rMW392446738756: Unwrap HTML loaded from parser cache.