Initial transcluded <p> elements on dewiki are fooling extractLeadIntroduction
Closed, Resolved (Public)

Description

Initial transcluded <p> elements on some pages are fooling extractLeadIntroduction into outputting gibberish lead intros. This issue was first discovered on dewiki but may affect other wikis.

Example: http://localhost:6927/de.wikipedia.org/v1/page/formatted-lead/Berliner_Mauer

intro: "<p><span style=\"display:none\"><span class=\"new\">p2</span></span></p><span>\n</span>"

This of course affects summary output for the affected pages as well, since the summary is derived from the lead intro.
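
To make the failure concrete, here is a minimal Python sketch (not the mobileapps code, which is a JavaScript service) that checks whether a candidate intro has any text a reader would actually see. The helper names `VisibleTextParser` and `visible_text` are hypothetical, and treating only inline `display:none` styles as hidden is an assumption made for illustration.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "wbr", "input", "meta", "link"}


class VisibleTextParser(HTMLParser):
    """Collects text that is not inside an element styled display:none."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one bool per open element: is it, or an ancestor, hidden?
        self.chunks = []  # visible text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        parent_hidden = self.stack[-1] if self.stack else False
        self.stack.append(parent_hidden or "display:none" in style)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not (self.stack and self.stack[-1]):
            self.chunks.append(data)


def visible_text(html):
    """Return the reader-visible text of an HTML fragment, stripped."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# The intro string reported above, with the JSON escapes decoded.
BROKEN_INTRO = (
    '<p><span style="display:none"><span class="new">p2</span></span></p>'
    '<span>\n</span>'
)
print(repr(visible_text(BROKEN_INTRO)))
```

Run against the intro quoted above, the visible text is empty: the only text node is hidden and the trailing span holds just a newline, which is why the derived summary is unusable.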

The 02 Feb 2018 page summary switchover was delayed following the eleventh-hour discovery of this issue.

There's currently a workaround strictly for page summaries[1] on master, but the underlying problem needs to be fixed in extractLeadIntroduction itself.

[1] https://gerrit.wikimedia.org/r/#/c/409176/

Event Timeline

Mholloway created this task. Feb 9 2018, 2:42 AM
Mholloway triaged this task as High priority.

Proposed fix: https://gerrit.wikimedia.org/r/#/c/409197/

(I guess Gerritbot's out to lunch.)

Mholloway claimed this task. Feb 9 2018, 2:46 AM

Change 409197 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Skip transcluded leading non-content paragraphs when extracting lead intro

https://gerrit.wikimedia.org/r/409197

Mholloway updated the task description. Feb 9 2018, 3:03 AM
Mholloway updated the task description.

Change 409197 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Skip transcluded leading non-content paragraphs when extracting lead intro

https://gerrit.wikimedia.org/r/409197

Moving this back to Doing since we are still refining our selection heuristic. Thanks to Bernd's script-based testing, we've found that the current code does not perform well at all on itwiki.

Change 410073 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Use <b> elements to identify probable good first paragraphs

https://gerrit.wikimedia.org/r/410073

Change 410073 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Use <b> elements to identify probable good first paragraphs

https://gerrit.wikimedia.org/r/410073
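
The idea named in the change title above can be sketched roughly as follows. This is an illustrative Python approximation, not the merged mobileapps patch: the names `ParagraphScanner` and `pick_lead_paragraph` are hypothetical, and returning `None` when no paragraph contains a <b> element is an assumption, since the task does not describe the fallback behaviour.

```python
from html.parser import HTMLParser


class ParagraphScanner(HTMLParser):
    """Records, for each <p>, its text and whether it contains a <b>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # list of {"text": str, "has_bold": bool}
        self.current = None   # the paragraph being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.current = {"text": "", "has_bold": False}
        elif tag == "b" and self.current is not None:
            self.current["has_bold"] = True

    def handle_endtag(self, tag):
        if tag == "p" and self.current is not None:
            self.paragraphs.append(self.current)
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current["text"] += data


def pick_lead_paragraph(html):
    """Return the text of the first non-empty <p> that contains a <b>."""
    scanner = ParagraphScanner()
    scanner.feed(html)
    scanner.close()
    for paragraph in scanner.paragraphs:
        text = paragraph["text"].strip()
        if paragraph["has_bold"] and text:
            return text
    return None  # assumption: no bolded paragraph means no confident pick


# Made-up sample: a hidden leading paragraph followed by a real first sentence.
SAMPLE = (
    '<p><span style="display:none"><span class="new">p2</span></span></p>'
    '<p>The <b>Berlin Wall</b> was a barrier.</p>'
)
print(pick_lead_paragraph(SAMPLE))
```

The sketch relies on the convention that an article's opening sentence bolds its subject, so a leading paragraph that carries only template residue is passed over in favour of the first paragraph with bolded text.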

Change 410216 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Add tests for lead paragraph identification updates

https://gerrit.wikimedia.org/r/410216

Change 410216 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Add tests for lead paragraph identification updates

https://gerrit.wikimedia.org/r/410216

Mholloway closed this task as Resolved. Feb 15 2018, 1:17 AM