[BUG] Section parsing bug on https://en.wikipedia.org/wiki/Chiang Kai-shek
Closed, Resolved · Public · BUG REPORT

Description

First reported by a user on OTRS https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom;TicketID=10725677

Steps to Reproduce:

  1. Open the article for Chiang Kai-shek
  2. Open the table of contents

Actual Results:
None of the article headings are rendering in the ToC

Expected Results:
The ToC lists every section heading in the article.
Note that the ToC shows fine in the iOS app.

Occurring on
Wikipedia v2.7.234-alpha-2018-06-15

Event Timeline

RHo created this task. Jun 15 2018, 3:52 PM
Restricted Application added a subscriber: Aklapper. Jun 15 2018, 3:52 PM
RHo updated the task description. Jun 15 2018, 3:57 PM

I checked https://en.wikipedia.org/api/rest_v1/page/mobile-sections-remaining/Chiang_Kai-shek and found that it returns only one section, which is not correct.
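
The check described above can be sketched roughly as follows. This is only an illustration: the helper names are made up, the payload is a hand-written stand-in rather than a real response, and the top-level "sections" key is an assumption about the response shape.

```python
import json
from urllib.parse import quote

# Endpoint prefix taken from the URL quoted in the comment above.
REST_BASE = "https://en.wikipedia.org/api/rest_v1/page/mobile-sections-remaining/"


def endpoint_url(title: str) -> str:
    """Build the endpoint URL for a page title (spaces become underscores)."""
    return REST_BASE + quote(title.replace(" ", "_"), safe="")


def count_sections(payload: dict) -> int:
    """Count entries under the assumed top-level "sections" key."""
    return len(payload.get("sections", []))


# Hypothetical payload standing in for the single-section response.
sample = json.loads('{"sections": [{"id": 1, "text": "<p>...</p>"}]}')
print(endpoint_url("Chiang Kai-shek"), count_sections(sample))
```

A count of 1 for a long article with many headings would be the symptom reported here.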

I'm not sure how the iOS app gets its section list. The bug may be related to incompletely formatted markup in the page content.

@Mholloway @bearND do you have any idea?

The sectioning for this endpoint is based on the Parsoid output for the same page: https://en.wikipedia.org/api/rest_v1/page/html/Chiang_Kai-shek. It looks like all the sections in this document have a data-mw-section-id of -1. I'm not sure why that is for this article. The sections are not transcluded. Is that a Parsoid bug?
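
One way to confirm the observation above is to tally the data-mw-section-id values in the Parsoid HTML. A minimal sketch using only the standard library; the class and function names are illustrative, and the sample markup is invented for the example, not copied from the real page.

```python
from collections import Counter
from html.parser import HTMLParser


class SectionIdTally(HTMLParser):
    """Collect data-mw-section-id values from <section> start tags."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return
        for name, value in attrs:
            if name == "data-mw-section-id":
                self.ids[value] += 1


def tally_section_ids(html: str) -> Counter:
    """Return how often each section id (as a string) appears."""
    parser = SectionIdTally()
    parser.feed(html)
    parser.close()
    return parser.ids


# Invented sample: one -2 pseudo-section wrapping two -1 sections.
sample = (
    '<section data-mw-section-id="-2"><div>'
    '<section data-mw-section-id="-1"><h2>Early life</h2></section>'
    '<section data-mw-section-id="-1"><h2>Legacy</h2></section>'
    '</section>'
)
print(tally_section_ids(sample))
```

If every key in the tally is negative, a consumer that relies on non-negative section ids would plausibly end up with no usable sections.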

The wikitext for this page has a {{stack begin}} at the top of the page, which leaves an open <div> tag that isn't closed anywhere. That triggers the -2 pseudo-section there. I still need to check why the rest are marked -1; I'll look later today or tomorrow.
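
A rough way to spot this kind of unbalanced output is to count <div> tags that are opened but never closed in an expanded fragment. The sketch below is a simple tag counter with made-up names and made-up fragments; it is not how Parsoid does its template-wrapping analysis.

```python
from html.parser import HTMLParser


class DivBalance(HTMLParser):
    """Track how many <div> tags are still open at the end of the input."""

    def __init__(self):
        super().__init__()
        self.open_divs = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.open_divs += 1

    def handle_endtag(self, tag):
        # Ignore stray closing tags so the count never goes negative.
        if tag == "div" and self.open_divs > 0:
            self.open_divs -= 1


def unclosed_divs(fragment: str) -> int:
    """Return the number of <div> tags left open in the fragment."""
    parser = DivBalance()
    parser.feed(fragment)
    parser.close()
    return parser.open_divs


# Made-up fragments standing in for template output.
print(unclosed_divs('<div style="float:right">'))
print(unclosed_divs("<div><p>balanced</p></div>"))
```

A non-zero result for a template's output suggests the template relies on a matching closing template elsewhere in the page.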

Oh yes, of course ... this is because the entire page is marked as template generated because the unclosed <div> tag comes from a template. Right now, pages like these aren't easily editable in VE either.

I don't have any good solution for this here right now. Parsoid's template-wrapping and section-wrapping analysis would have to get smarter about exposing more top-level content that gets pulled into template-wrapped content because of unbalanced HTML tag scenarios like this, but it is tricky to get right in the general case.

Mholloway added a comment. Edited · Jun 28 2018, 4:13 PM

As described by @ssastry above, this is a template problem on this specific page. This comes up from time to time (see T182349: Section parsing bug on :en:Wikimedia Foundation for a previous example). It is not caused by Parsoid emitting -1 and -2 IDs for non-editable sections. Sections are being parsed incorrectly here because of the underlying content problem. It can probably be fixed just by editing a template invocation. We'll take a look at the page and get it fixed.

Mholloway renamed this task from [BUG] Certain articles are not rendering Table of Contents correctly in the app to [BUG] Section parsing bug on https://en.wikipedia.org/wiki/Chiang Kai-shek. Jun 28 2018, 4:17 PM
Vvjjkkii renamed this task from [BUG] Section parsing bug on https://en.wikipedia.org/wiki/Chiang Kai-shek to staaaaaaaa. Jul 1 2018, 1:03 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Mholloway as the assignee of this task.
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
Sharvaniharan renamed this task from staaaaaaaa to [BUG] Section parsing bug on https://en.wikipedia.org/wiki/Chiang Kai-shek. Jul 1 2018, 6:00 AM
Sharvaniharan closed this task as Resolved.
Sharvaniharan assigned this task to Mholloway.
Sharvaniharan lowered the priority of this task from High to Medium.
Sharvaniharan updated the task description.
Sharvaniharan added a subscriber: Aklapper.