mobile-sections: Cannot read property 'indexOf' of undefined in markReferenceSections
Open, HighPublic

Description

From log file in production.

500: internal_error (message="500: internal_error", status=500, type=internal_error, detail="Cannot read property 'indexOf' of undefined", levelPath=error/500, request_id=a3d71ed3-9406-11e7-8e78-11414b1d2af5)
    stack: TypeError: Cannot read property 'indexOf' of undefined
        at sections.forEach (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/src/lib/transformations/markReferenceSections.js:34:17)
        at Array.forEach (native)
        at Object.markReferenceSections (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/src/lib/transformations/markReferenceSections.js:26:14)
        at buildRemaining (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/src/routes/mobile-sections.js:161:20)
        at buildAll (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/src/routes/mobile-sections.js:194:20)
        at _collectRawPageData.then (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/src/routes/mobile-sections.js:323:20)
        at tryCatcher (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/util.js:16:23)
        at Promise._settlePromiseFromHandler (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:512:31)
        at Promise._settlePromise (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:569:18)
        at Promise._settlePromise0 (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:614:10)
        at Promise._settlePromises (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:693:18)
        at Promise._fulfill (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:638:18)
        at Promise._resolveCallback (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:432:57)
        at Promise._settlePromiseFromHandler (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:524:17)
        at Promise._settlePromise (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:569:18)
        at Promise._settlePromise0 (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:614:10)
        at Promise._settlePromises (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:693:18)
        at Promise._fulfill (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise.js:638:18)
        at PropertiesPromiseArray.PromiseArray._resolve (/srv/deployment/mobileapps/deploy-cache/revs/507a479c96953ff0b3fa5ba12472b0c56570c138/node_modules/bluebird/js/release/promise_array.js:126:19)
    --
    request: {
      "url": "/fr.wikipedia.org/v1/page/mobile-sections/Portail%3ABaroque",
...
      "method": "GET",
      "params": {
        "0": "/fr.wikipedia.org/v1/page/mobile-sections/Portail:Baroque"
      },
...
    }

So, to repro run this on one of the production boxes:

curl http://localhost:8888/fr.wikipedia.org/v1/page/mobile-sections/Portail:Baroque
{"status":500,"type":"internal_error","title":"TypeError","detail":"Cannot read property 'indexOf' of undefined","method":"GET","uri":"/fr.wikipedia.org/v1/page/mobile-sections/Portail:Baroque"}

Another example: http://localhost:8888/fy.wikipedia.org/v1/page/mobile-sections/De_Kanto%27s

To repro locally:
http://localhost:6927/fy.wikipedia.org/v1/page/mobile-sections/De_Kanto%27s

More cases are listed in P5992.

bearND created this task.Thu, Sep 7, 8:11 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptThu, Sep 7, 8:11 PM
bearND triaged this task as High priority.Thu, Sep 7, 8:11 PM
bearND moved this task from To Do to Doing on the Reading-Infrastructure-Team-Backlog (Kanban) board.
bearND updated the task description. (Show Details)Thu, Sep 7, 8:19 PM
bearND added a comment.Thu, Sep 7, 8:25 PM

The issue is that for some section text is not set. After avoiding this issue there is a follow-up needed to invetigate why there is an extra section at the end with just and id but no text or line.

http://localhost:6927/fy.wikipedia.org/v1/page/mobile-sections/De_Kanto%27s

[...]
},
{
  "id": 2,
  "text": "\n<p>Yn <a href=\"/wiki/1972\" title=\"1972\">1972</a> fertaalde <a href=\"/wiki/Oersetter\" title=\"Oersetter\" class=\"mw-redirect\">oersetter</a> <a href=\"/wiki/Haring_Tjittes_Piebenga\" title=\"Haring Tjittes Piebenga\">Haring Tjittes Piebenga</a> ien gedicht út <i>The Cantos</i> yn it <a href=\"/wiki/Frysk\" title=\"Frysk\">Frysk</a>, ûnder de titel <i>Canto XLIX fan Ezra Pound</i>. Dy fertaling waard dat jiers publisearre yn it tydskrift <i><a href=\"/wiki/Alternatyf_(tydskrift)\" title=\"Alternatyf (tydskrift)\">Alternatyf</a></i>.</p>\n\n",
  "toclevel": 1,
  "line": "Fryske oersetting",
  "anchor": "Fryske_oersetting"
},
{
  "id": 3
}

Ok, I think the real issue is with sectioning the code. The sectioning code behaves inconsistently. If the current section is at the level of an <h2> and the text contains an <h2> inside a <div> it starts a new section. OTOH if the text contains an <h3> inside a <div> instead it continues the current section.

<div><h2> example

<body>
text0
<h2 id="foo">foo</h2>text1
<div><h2 id="Boarnen.2C_noaten_en_referinsjes"> Boarnen, noaten en referinsjes </h2></div>
<p>text 2</p>
</body>
-->
<body>
<section>text0</section>
<section><h2 id="foo">foo</h2>text1</section>
<section>
	<div><h2 id="Boarnen.2C_noaten_en_referinsjes"> Boarnen, noaten en referinsjes </h2></div><p>text 2</p>
</section>
</body>

<div><h3> example

<body>
text0
<h2 id="foo">foo</h2>text1
<div><h3 id="Boarnen.2C_noaten_en_referinsjes"> Boarnen, noaten en referinsjes </h3></div>
<p>text 2</p>
</body>
--> 
<body>
<section>text0</section>
<section><h2 id="foo">foo</h2>text1
	<div><h3 id="Boarnen.2C_noaten_en_referinsjes"> Boarnen, noaten en referinsjes </h3></div><p>text 2</p>
</section>
</body>

Note that in the second example there is no new <section> directly wrapping the <div>.

I'm a bit torn what the correct behavior should be. Not creating a new section would be behaving the same as action=mobileview (= I assume PHP parser). Creating the new section in this case is probably what the editor intended in this case since it is kind of a references section, even if not a typical one.
CC @ssastry since I'm curious how the Parsoid version of this would behave and as a heads-up.

GWicke added a comment.EditedThu, Sep 7, 10:04 PM

Starting a new section when encountering a new heading of the same level is expected behavior, in line with MediaWiki section edit behavior, as well as HTML5 semantics. When encountering a heading of a higher level (higher number, lower prominence), the sectioning code I wrote in parsoid-utils creates a nested section. This is in line with typical HTML5 section and page outline semantics: https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Using_HTML_sections_and_outlines.

bearND added a comment.Fri, Sep 8, 5:59 PM

@GWicke:
When encountering a heading of a higher level (higher number, lower prominence), the sectioning code I wrote in parsoid-utils creates a nested section.

Backing up a bit... When an <h2> comes after a <h2> to create a new section is expected. Just not when the second <h2> is wrapped in an extra <div>. See P5978, where I tried four different cases:

<h2 id=foo> then <h2 id=bar>New <section> for bar, as expected
<h2 id=foo> then <div><h2 id=bar>New <section> for bar. It's behaving like there was no <div> around the <h2>
<h2 id=foo> then <h3 id=bar>New <section> for bar, as expected
<h2 id=foo> then <div><h3 id=bar>No separate <section> for bar, continued foo section

I expected case 2 and 4 to be similar.

bearND added a comment.EditedFri, Sep 8, 6:38 PM

@ssastry asked me to try out his patch, which I did on P5980. I only tried the equivalent of the actual page, which corresponds to case 4 in my previous comment. I created a stripped down version of it.

It creates two sections: 1 lead + the foo section; the references (aka bar section text is contained inside the foo section, which is similar to how action=mobileview behaves.

In terms of document structure, the behavior in line two (add section around <div>-wrapped heading) seems to make sense. I think it also matches edit section behavior, which should ignore the <div> completely (as it is not DOM-based).

which should ignore the <div> completely (as it is not DOM-based).

Sorry, I'm not sure I follow. What do you mean when you say a <div> is not DOM-based? The <div> is an HTML element and by definition part of the DOM.

GWicke added a comment.EditedMon, Sep 11, 7:12 PM

@bearND, MediaWiki's section edit feature is implemented without knowledge of a DOM, so <div> wrappers do not suppress edit sections. Example: https://en.wikipedia.org/wiki/User:GWicke/TestSections with source

== First h2. ==
Some content for the first heading.
<div>
== Second h2, embedded in div. ==
Some content for the second heading.
</div>
== Third h2, not embedded in div. ==
Some content for the third heading.

Arguably, this behavior usually matches user's expectations by making each top-level section editable and addressable. I think it is worth matching this in the new section implementation, which is why I went with that option in parsoid-dom-utils.

Change 377349 had a related patch set uploaded (by BearND; owner: BearND):
[mediawiki/services/mobileapps@master] Bring back old section handling code

https://gerrit.wikimedia.org/r/377349

Change 377349 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Bring back old section handling code

https://gerrit.wikimedia.org/r/377349

In the patch ^ I've changed the parsoid-access code to use the old code for sectioning via <div> tags instead of <section> tags. It'll automatically take advantage of <section> tags once Parsoid produces them. This should give us some time to come up with a proper solution if T114072 is taking too long.

bearND updated the task description. (Show Details)Tue, Sep 12, 8:33 PM
Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2017-09-12T21:55:45Z] <bsitzmann@tin> Started deploy [mobileapps/deploy@b11b75c]: Update mobileapps to 297b048 (T174707 T175305)

Mentioned in SAL (#wikimedia-operations) [2017-09-12T22:00:13Z] <bsitzmann@tin> Finished deploy [mobileapps/deploy@b11b75c]: Update mobileapps to 297b048 (T174707 T175305) (duration: 04m 28s)

I finally got @ssastry 's Parsoid patch tested with MCS. With that I see similar behavior in MCS in that is also generated an extra empty section at the end with just an id after avoiding the 'indexOf' of undefined exception, like

http://localhost:6927/fr.wikipedia.org/v1/page/mobile-sections/Portail:Togo/140589478

"remaining": {
  "sections": [
    {
      "id": 1
    }
  ]
}