
Some articles return empty extracts until touched
Closed, Duplicate · Public

Description

Please do NOT touch the referenced article before investigating the issue.


Reproduction steps: Get https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info|extracts|pageimages|revisions&formatversion=2&redirects=true&exintro=true&exsentences=5&explaintext=true&piprop=thumbnail&pithumbsize=300&rvprop=timestamp&titles=Hydrogen_damage&smaxage=300&maxage=300&uselang=content
Expected result: Parameter extract contains a meaningful plaintext extract.
Actual result: Parameter extract is an empty string:

{"batchcomplete":true,"query":{"normalized":[{"from":"Hydrogen_damage","to":"Hydrogen damage"}],"pages":[{"pageid":3605505,"ns":0,"title":"Hydrogen damage","contentmodel":"wikitext","pagelanguage":"en","pagelanguagehtmlcode":"en","pagelanguagedir":"ltr","touched":"2016-02-12T01:36:19Z","lastrevid":695808952,"length":8508,"extract":"","revisions":[{"timestamp":"2015-12-18T19:21:17Z"}]}]}}

Reproduction steps: Get https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info|extracts|pageimages|revisions&formatversion=2&redirects=true&exintro=true&exsentences=5&piprop=thumbnail&pithumbsize=300&rvprop=timestamp&titles=Hydrogen_damage&smaxage=300&maxage=300&uselang=content (it is the previous URL with the explaintext parameter removed)
Expected & actual result: Parameter extract contains a meaningful HTML extract:

{"batchcomplete":true,"query":{"normalized":[{"from":"Hydrogen_damage","to":"Hydrogen damage"}],"pages":[{"pageid":3605505,"ns":0,"title":"Hydrogen damage","contentmodel":"wikitext","pagelanguage":"en","pagelanguagehtmlcode":"en","pagelanguagedir":"ltr","touched":"2016-02-12T01:36:19Z","lastrevid":695808952,"length":8508,"extract":"<p><b>Hydrogen damage</b> is the generic name given to a large number of metal degradation processes due to interaction with hydrogen.</p>","revisions":[{"timestamp":"2015-12-18T19:21:17Z"}]}]}}

Reproduction steps: Get https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info|extracts|pageimages|revisions&formatversion=2&redirects=true&exsentences=5&explaintext=true&piprop=thumbnail&pithumbsize=300&rvprop=timestamp&titles=Hydrogen_damage&smaxage=300&maxage=300&uselang=content (it is the first URL with the exintro parameter removed)
Expected result: Parameter extract contains a meaningful plaintext extract.
Actual result: Parameter extract equals \n== ClassificationsEdit == (observation: "Classifications" is the first section heading of the article):

{"batchcomplete":true,"query":{"normalized":[{"from":"Hydrogen_damage","to":"Hydrogen damage"}],"pages":[{"pageid":3605505,"ns":0,"title":"Hydrogen damage","contentmodel":"wikitext","pagelanguage":"en","pagelanguagehtmlcode":"en","pagelanguagedir":"ltr","touched":"2016-02-12T01:36:19Z","lastrevid":695808952,"length":8508,"extract":"\n== ClassificationsEdit ==","revisions":[{"timestamp":"2015-12-18T19:21:17Z"}]}]}}

I have noticed this before in another article, and a null edit (do NOT do that before investigating) fixed it. It seems that some secondary information (i.e. data generated automatically from primary information) is broken, but I would expect all secondary information to be regenerated periodically.
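For anyone who wants to compare the three request variants side by side, here is a minimal Python sketch. It only rebuilds the URLs from the reproduction steps and classifies a decoded response. The helper names `build_url` and `extract_looks_broken` are made up for illustration, and the "broken" heuristic (empty extract, or extract starting with a section heading) is an assumption based solely on the two bad results shown above.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def build_url(title, exintro=True, explaintext=True):
    """Rebuild the query URL from the report, optionally dropping
    the exintro and/or explaintext parameters."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "info|extracts|pageimages|revisions",
        "formatversion": "2",
        "redirects": "true",
        "exsentences": "5",
        "piprop": "thumbnail",
        "pithumbsize": "300",
        "rvprop": "timestamp",
        "titles": title,
        "smaxage": "300",
        "maxage": "300",
        "uselang": "content",
    }
    if exintro:
        params["exintro"] = "true"
    if explaintext:
        params["explaintext"] = "true"
    # urlencode percent-encodes "|" as %7C, which the API accepts.
    return API + "?" + urlencode(params)


def extract_looks_broken(response):
    """Heuristic (assumption): the extract of the first page is suspicious
    if it is empty or starts with a section heading such as '== ... =='."""
    extract = response["query"]["pages"][0].get("extract", "")
    stripped = extract.strip()
    return stripped == "" or stripped.startswith("==")
```

Fetching a URL built this way is a read-only request and should not touch the article. The decoded JSON body can then be passed to `extract_looks_broken`.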

Event Timeline

Restricted Application added a subscriber: Aklapper. Mar 12 2016, 9:49 AM
petr.matas triaged this task as Unbreak Now! priority. Mar 12 2016, 9:52 AM

Setting the highest priority to hasten the investigation, which is probably possible only until the article is touched.

petr.matas updated the task description. Mar 12 2016, 9:55 AM
petr.matas updated the task description. Mar 12 2016, 11:37 PM
petr.matas updated the task description. Mar 12 2016, 11:56 PM
petr.matas updated the task description. Mar 13 2016, 10:33 AM
Jdlrobson added a subscriber: Jdlrobson.

For anons, the table of contents is visible in the mobile view of this page, and the mobile edit link appears in the desktop view. This points to ParserCache corruption rather than an issue in TextExtracts, which is why touching the article fixes the issue.

Note: fixing T125841 will make this issue go away.