mobile-sections: Cannot read property 'indexOf' of undefined in markReferenceSections
Open, HighPublic

Description

From log file in production.

500: internal_error (message="500: internal_error", status=500, type=internal_error, detail="Cannot read property 'indexOf' of undefined", levelPath=error/500, request_id=a3d71ed3-9406-11e7-8e78-11414b1d2af5)
    stack: TypeError: Cannot read property 'indexOf' of undefined
        at sections.forEach (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/src/lib/transformations/markReferenceSections.js:34:17)
        at Array.forEach (native)
        at Object.markReferenceSections (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/src/lib/transformations/markReferenceSections.js:26:14)
        at buildRemaining (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/src/routes/mobile-sections.js:161:20)
        at buildAll (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/src/routes/mobile-sections.js:194:20)
        at _collectRawPageData.then (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/src/routes/mobile-sections.js:323:20)
        at tryCatcher (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/util.js:16:23)
        at Promise._settlePromiseFromHandler (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:512:31)
        at Promise._settlePromise (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:569:18)
        at Promise._settlePromise0 (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:614:10)
        at Promise._settlePromises (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:693:18)
        at Promise._fulfill (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:638:18)
        at Promise._resolveCallback (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:432:57)
        at Promise._settlePromiseFromHandler (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:524:17)
        at Promise._settlePromise (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:569:18)
        at Promise._settlePromise0 (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:614:10)
        at Promise._settlePromises (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:693:18)
        at Promise._fulfill (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:638:18)
        at PropertiesPromiseArray.PromiseArray._resolve (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise_array.js:126:19)
    --
    request: {
      "url": "/fr.wikipedia.org/v1/page/mobile-sections/Portail%3ABaroque",
...
      "method": "GET",
      "params": {
        "0": "/fr.wikipedia.org/v1/page/mobile-sections/Portail:Baroque"
      },
...
    }

So, to repro run this on one of the production boxes:

curl http://localhost:8888/fr.wikipedia.org/v1/page/mobile-sections/Portail:Baroque
{"status":500,"type":"internal_error","title":"TypeError","detail":"Cannot read property 'indexOf' of undefined","method":"GET","uri":"/fr.wikipedia.org/v1/page/mobile-sections/Portail:Baroque"}

Another example: http://localhost:8888/fy.wikipedia.org/v1/page/mobile-sections/De_Kanto%27s

To repro locally:
http://localhost:6927/fy.wikipedia.org/v1/page/mobile-sections/De_Kanto%27s

More cases are listed in P5992.

bearND created this task.Sep 7 2017, 8:11 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 7 2017, 8:11 PM
bearND triaged this task as High priority.Sep 7 2017, 8:11 PM
bearND moved this task from To Do to Doing on the Reading-Infrastructure-Team-Backlog (Kanban) board.
bearND updated the task description. (Show Details)Sep 7 2017, 8:19 PM
bearND added a comment.Sep 7 2017, 8:25 PM

The issue is that for some section text is not set. After avoiding this issue there is a follow-up needed to invetigate why there is an extra section at the end with just and id but no text or line.

http://localhost:6927/fy.wikipedia.org/v1/page/mobile-sections/De_Kanto%27s

[...]
},
{
  "id": 2,
  "text": "\n<p>Yn <a href=\"/wiki/1972\" title=\"1972\">1972</a> fertaalde <a href=\"/wiki/Oersetter\" title=\"Oersetter\" class=\"mw-redirect\">oersetter</a> <a href=\"/wiki/Haring_Tjittes_Piebenga\" title=\"Haring Tjittes Piebenga\">Haring Tjittes Piebenga</a> ien gedicht út <i>The Cantos</i> yn it <a href=\"/wiki/Frysk\" title=\"Frysk\">Frysk</a>, ûnder de titel <i>Canto XLIX fan Ezra Pound</i>. Dy fertaling waard dat jiers publisearre yn it tydskrift <i><a href=\"/wiki/Alternatyf_(tydskrift)\" title=\"Alternatyf (tydskrift)\">Alternatyf</a></i>.</p>\n\n",
  "toclevel": 1,
  "line": "Fryske oersetting",
  "anchor": "Fryske_oersetting"
},
{
  "id": 3
}

Ok, I think the real issue is with sectioning the code. The sectioning code behaves inconsistently. If the current section is at the level of an <h2> and the text contains an <h2> inside a <div> it starts a new section. OTOH if the text contains an <h3> inside a <div> instead it continues the current section.

<div><h2> example

<body>
text0
<h2 id="foo">foo</h2>text1
<div><h2 id="Boarnen.2C_noaten_en_referinsjes"> Boarnen, noaten en referinsjes </h2></div>
<p>text 2</p>
</body>
-->
<body>
<section>text0</section>
<section><h2 id="foo">foo</h2>text1</section>
<section>
	<div><h2 id="Boarnen.2C_noaten_en_referinsjes"> Boarnen, noaten en referinsjes </h2></div><p>text 2</p>
</section>
</body>

<div><h3> example

<body>
text0
<h2 id="foo">foo</h2>text1
<div><h3 id="Boarnen.2C_noaten_en_referinsjes"> Boarnen, noaten en referinsjes </h3></div>
<p>text 2</p>
</body>
--> 
<body>
<section>text0</section>
<section><h2 id="foo">foo</h2>text1
	<div><h3 id="Boarnen.2C_noaten_en_referinsjes"> Boarnen, noaten en referinsjes </h3></div><p>text 2</p>
</section>
</body>

Note that in the second example there is no new <section> directly wrapping the <div>.

I'm a bit torn what the correct behavior should be. Not creating a new section would be behaving the same as action=mobileview (= I assume PHP parser). Creating the new section in this case is probably what the editor intended in this case since it is kind of a references section, even if not a typical one.
CC @ssastry since I'm curious how the Parsoid version of this would behave and as a heads-up.

GWicke added a comment.EditedSep 7 2017, 10:04 PM

Starting a new section when encountering a new heading of the same level is expected behavior, in line with MediaWiki section edit behavior, as well as HTML5 semantics. When encountering a heading of a higher level (higher number, lower prominence), the sectioning code I wrote in parsoid-utils creates a nested section. This is in line with typical HTML5 section and page outline semantics: https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Using_HTML_sections_and_outlines.

bearND added a comment.Sep 8 2017, 5:59 PM

@GWicke:
When encountering a heading of a higher level (higher number, lower prominence), the sectioning code I wrote in parsoid-utils creates a nested section.

Backing up a bit... When an <h2> comes after a <h2> to create a new section is expected. Just not when the second <h2> is wrapped in an extra <div>. See P5978, where I tried four different cases:

<h2 id=foo> then <h2 id=bar>New <section> for bar, as expected
<h2 id=foo> then <div><h2 id=bar>New <section> for bar. It's behaving like there was no <div> around the <h2>
<h2 id=foo> then <h3 id=bar>New <section> for bar, as expected
<h2 id=foo> then <div><h3 id=bar>No separate <section> for bar, continued foo section

I expected case 2 and 4 to be similar.

bearND added a comment.EditedSep 8 2017, 6:38 PM

@ssastry asked me to try out his patch, which I did on P5980. I only tried the equivalent of the actual page, which corresponds to case 4 in my previous comment. I created a stripped down version of it.

It creates two sections: 1 lead + the foo section; the references (aka bar section text is contained inside the foo section, which is similar to how action=mobileview behaves.

In terms of document structure, the behavior in line two (add section around <div>-wrapped heading) seems to make sense. I think it also matches edit section behavior, which should ignore the <div> completely (as it is not DOM-based).

which should ignore the <div> completely (as it is not DOM-based).

Sorry, I'm not sure I follow. What do you mean when you say a <div> is not DOM-based? The <div> is an HTML element and by definition part of the DOM.

GWicke added a comment.EditedSep 11 2017, 7:12 PM

@bearND, MediaWiki's section edit feature is implemented without knowledge of a DOM, so <div> wrappers do not suppress edit sections. Example: https://en.wikipedia.org/wiki/User:GWicke/TestSections with source

== First h2. ==
Some content for the first heading.
<div>
== Second h2, embedded in div. ==
Some content for the second heading.
</div>
== Third h2, not embedded in div. ==
Some content for the third heading.

Arguably, this behavior usually matches user's expectations by making each top-level section editable and addressable. I think it is worth matching this in the new section implementation, which is why I went with that option in parsoid-dom-utils.

Change 377349 had a related patch set uploaded (by BearND; owner: BearND):
[mediawiki/services/mobileapps@master] Bring back old section handling code

https://gerrit.wikimedia.org/r/377349

Change 377349 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Bring back old section handling code

https://gerrit.wikimedia.org/r/377349

In the patch ^ I've changed the parsoid-access code to use the old code for sectioning via <div> tags instead of <section> tags. It'll automatically take advantage of <section> tags once Parsoid produces them. This should give us some time to come up with a proper solution if T114072 is taking too long.

bearND updated the task description. (Show Details)Sep 12 2017, 8:33 PM
Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2017-09-12T21:55:45Z] <bsitzmann@tin> Started deploy [mobileapps/deploy@b11b75c]: Update mobileapps to 297b048 (T174707 T175305)

Mentioned in SAL (#wikimedia-operations) [2017-09-12T22:00:13Z] <bsitzmann@tin> Finished deploy [mobileapps/deploy@b11b75c]: Update mobileapps to 297b048 (T174707 T175305) (duration: 04m 28s)

I finally got @ssastry 's Parsoid patch tested with MCS. With that I see similar behavior in MCS in that is also generated an extra empty section at the end with just an id after avoiding the 'indexOf' of undefined exception, like

http://localhost:6927/fr.wikipedia.org/v1/page/mobile-sections/Portail:Togo/140589478

"remaining": {
  "sections": [
    {
      "id": 1
    }
  ]
}
Tgr added a subscriber: Tgr.Sep 26 2017, 9:06 PM

Do we know what the goals of the mobile-section data are? E.g. make sure all article content is section-editable (ie. provide the same set of top-TOC-level sections as the PHP parser)? Make sure the page can be displayed as a set of collapsed sections (return the top-DOM-level section headers)?

Here's a little spreadsheet I made using some of the page titles from P5992 showing the differences for some edge cases in terms of how many sections are produced.
(The action=parse column is expected to be 1 less than the actual number since it doesn't report the lead section.)
Most affected pages showing up in the logs are portal pages (or similarly project pages), which tend to transclude subpages or try very hard to customize the display. Then there are also some talk pages, too.

parsoid-dom-utils: For most pages listed here I see that the parsoid-dom-utils sectioning code returns 2 sections. The second section, after running through the current MCS code, ends up being without text, line or anchor. It has only an id.

Parsoid patch: For lots of pages listed here I see two <section> tags with data-section-number="-1". Some cases show a single section with number 0, which seems to be ok in most cases.

I'm not sure how NOTOC and NOEDITSECTION should influence the output of the sectioning code. I think it would be good to convey this information to the client, though. I see the <section> tags be useful for both providing information for the ToC and for section editing. Maybe we should add additional attributes to signify if a section should be visible in the ToC and/or if a section is not editable or what the page should be edited if it's different from the current one[1]?

[1] see fromtitle in https://fr.wikipedia.org/w/api.php?action=parse&format=json&prop=sections&page=Projet:%C3%89conomie pointing to different subpages.

Here's a little spreadsheet I made using some of the page titles from P5992 showing the differences for some edge cases in terms of how many sections are produced.

I randomly picked one of the titles and when I looked at Parsoid output, I see that the most of the page is wrapped in a small tag => it becomes a single section. So, basically, the page has an unclosed tag which will break rendering even Tidy is replaced. See with https://fr.wikipedia.org/wiki/1815_en_droit?action=parsermigration-edit ... so, if the lint error is fixed on that page, I expect it will render properly. Worth checking on other pages as well.

I'm not sure how NOTOC and NOEDITSECTION should influence the output of the sectioning code.

I think Parsoid should act on NOEDITSECTION but not on NOTOC. The NOTOC can be handled by the layer on top of Parsoid. The information can be part of metadata. Right now, Parsoid generates a meta tag for purposes of round tripping, but I think it should probably be surfaced in the <head>.

Here's a little spreadsheet I made using some of the page titles from P5992 showing the differences for some edge cases in terms of how many sections are produced.

I randomly picked one of the titles and when I looked at Parsoid output, I see that the most of the page is wrapped in a small tag => it becomes a single section. So, basically, the page has an unclosed tag which will break rendering even Tidy is replaced. See with https://fr.wikipedia.org/wiki/1815_en_droit?action=parsermigration-edit ... so, if the lint error is fixed on that page, I expect it will render properly.

https://fr.wikipedia.org/w/index.php?title=Mod%C3%A8le:Droit_1800-1819&action=edit has an unclosed <small> tag. which then leaks on all pages that transclude it.