TypeError in references endpoint
Open, Needs Triage · Public

Event Timeline

Pchelolo created this task. Jul 30 2019, 9:46 PM
Restricted Application added a subscriber: Aklapper. Jul 30 2019, 9:46 PM
bearND added a subscriber: bearND.

The example mentioned in the link above is:

local PCS: http://localhost:6927/en.wikipedia.org/v1/page/references/Maine/908276611
Parsoid: https://en.wikipedia.org/api/rest_v1/page/html/Maine/908276611

Stacktrace points to something in domino:

TypeError: Cannot read property 'matches' of null
at HTMLOListElement.value (./mobileapps/node_modules/domino/lib/Element.js:925:15)
at refListElements.forEach (./mobileapps/lib/transformations/references/extractReferenceLists.js:51:47)
at NodeList.forEach (<anonymous>)
at Object.module.exports [as extractReferenceLists] (./mobileapps/lib/transformations/references/extractReferenceLists.js:49:21)
at buildReferences (./mobileapps/routes/page/references.js:30:43)
at commonEnd (./mobileapps/routes/page/references.js:40:14)
at parsoid.pageHtmlPromiseForReferences.then (./mobileapps/routes/page/references.js:46:9)
at tryCatcher (./mobileapps/node_modules/bluebird/js/release/util.js:16:23)

But the Parsoid output looks a bit strange, too, since it has an empty reflist inside another reflist (abbreviated):

<ol class="mw-references references" data-mw-group="nb" id="mwB_o">
  <li about="#cite_note-8" id="cite_note-8">[...]

    <span class="mw-reflink-text" id="mwCAw">[7]</span></a></sup> Maine (along with <a rel="mw:WikiLink" href="./Louisiana" title="Louisiana" id="mwCA0">Louisiana</a>) is considered a part of the <a rel="mw:WikiLink" href="./Geographical_distribution_of_French_speakers" title="Geographical distribution of French speakers" id="mwCA4">Francophone world</a> and makes up the <a rel="mw:WikiLink" href="./French_language_in_the_United_States" title="French language in the United States" id="mwCA8">largest French-speaking population</a> in the United States.<sup about="#mwt247" class="mw-ref" id="cite_ref-auto1_6-1" rel="dc:references" typeof="mw:Extension/ref" data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;attrs&quot;:{&quot;name&quot;:&quot;auto1&quot;}}"><a href="./Maine#cite_note-auto1-6" style="counter-reset: mw-Ref 6;" id="mwCBA"><span class="mw-reflink-text" id="mwCBE">[6]</span></a></sup>
      <div class="mw-references-wrap" typeof="mw:Extension/references" about="#mwt251" data-mw="{&quot;name&quot;:&quot;references&quot;,&quot;attrs&quot;:{&quot;group&quot;:&quot;nb&quot;}}" id="mwCBI">

        <ol class="mw-references references" data-mw-group="nb" id="mwCBM"></ol>

      </div>
    </span>
  </li>
</ol>

Not sure why there's a 2nd <ol class="mw-references references" data-mw-group="nb"></ol> (with a different id, though).

The PCS code doesn't handle nested ol.mw-references elements well. I think this should be fixed in Parsoid; I don't see a good reason to nest these.
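
For illustration only, here is a minimal sketch of how a consumer could detect a reference list that sits inside another reference list, by walking up the parentNode chain. The helper name isNestedReferenceList is hypothetical and not part of mobileapps; the plain objects below stand in for DOM nodes and only carry the properties the helper reads (nodeName, className, parentNode).

```javascript
// Hypothetical helper (not part of mobileapps): returns true when `node` is an
// <ol class="mw-references ..."> that has another such list as an ancestor.
function isNestedReferenceList(node) {
  const isRefList = (n) =>
    n.nodeName === 'OL' &&
    typeof n.className === 'string' &&
    n.className.split(/\s+/).includes('mw-references');

  if (!isRefList(node)) {
    return false;
  }
  // Walk up the ancestor chain; stop when parentNode is null/undefined.
  for (let parent = node.parentNode; parent; parent = parent.parentNode) {
    if (isRefList(parent)) {
      return true;
    }
  }
  return false;
}

// Stand-in objects mirroring the abbreviated markup above: ol > li > ol.
const outerList = { nodeName: 'OL', className: 'mw-references references', parentNode: null };
const listItem = { nodeName: 'LI', className: '', parentNode: outerList };
const innerList = { nodeName: 'OL', className: 'mw-references references', parentNode: listItem };

console.log(isNestedReferenceList(innerList)); // true
console.log(isNestedReferenceList(outerList)); // false
```

A check like this would let the caller skip nested lists up front, though whether skipping is the right behavior is a separate question from the Parsoid fix discussed here.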

@bearND is there a way to try/catch around this to prevent the issue until it's fixed in Parsoid?
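
A rough sketch of what such a guard could look like, assuming the failing call is element.closest(selector) as the stack trace suggests. The wrapper name safeClosest is an assumption for illustration and is not the code from any actual patch; the objects below are stand-ins for DOM elements so the snippet runs without a DOM library.

```javascript
// Hypothetical wrapper: calls element.closest(selector) and treats a TypeError
// (e.g. from a malformed or detached ancestor chain) as "no match".
function safeClosest(element, selector) {
  try {
    return element.closest(selector) || null;
  } catch (err) {
    if (err instanceof TypeError) {
      return null;
    }
    // Anything else is unexpected, so let it propagate.
    throw err;
  }
}

// Stand-in for an element whose closest() fails like the one in the stack trace.
const brokenElement = {
  closest() {
    throw new TypeError("Cannot read property 'matches' of null");
  }
};

// Stand-in for an element whose closest() finds a wrapper.
const wrapper = { id: 'mwCBI' };
const healthyElement = {
  closest() {
    return wrapper;
  }
};

console.log(safeClosest(brokenElement, '.mw-references-wrap')); // null
console.log(safeClosest(healthyElement, '.mw-references-wrap') === wrapper); // true
```

Catching only TypeError keeps the guard narrow, so unrelated failures still surface instead of being silently swallowed.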

Change 526717 had a related patch set uploaded (by Joewalsh; owner: Joewalsh):
[mediawiki/services/mobileapps@master] Try/catch around .closest call that throws a TypeError when the DOM is malformed

https://gerrit.wikimedia.org/r/526717

The content seems to contain {{#tag:ref|Maine does not ... <references group="nb" /> |group="nb"}}, which results in references in references.

$ echo -e "{{#tag:ref|123 <references group="nb" />|group="nb"}}\n\n<references group="nb" />" | node bin/parse --body_only --normalize=parsoid

<p><sup class="mw-ref" id="cite_ref-1" rel="dc:references" typeof="mw:Transclusion  mw:Extension/ref" data-mw='{"parts":[{"template":{"target":{"wt":"#tag:ref","function":"tag"},"params":{"1":{"wt":"123 &lt;references group=nb />"},"group":{"wt":"nb"}},"i":0}}]}'><a href="./Main_Page#cite_note-1" data-mw-group="nb"><span class="mw-reflink-text">[nb 1]</span></a></sup></p>
<div class="mw-references-wrap" typeof="mw:Extension/references" data-mw='{"name":"references","attrs":{"group":"nb"}}'>
<ol class="mw-references references" data-mw-group="nb">
<li id="cite_note-1"><a href="./Main_Page#cite_ref-1" data-mw-group="nb" rel="mw:referencedBy"><span class="mw-linkback-text">↑ </span></a> <span id="mw-reference-text-cite_note-1" class="mw-reference-text">123
<div class="mw-references-wrap" typeof="mw:Extension/references" data-mw='{"name":"references","attrs":{"group":"nb"}}'>
<ol class="mw-references references" data-mw-group="nb">
<div></div>
</ol>
</div>
</span></li>
</ol>
</div>

I'll see what the legacy parser does but ya, probably a bug in Parsoid to emit that.

Change 526717 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Try/catch around .closest call that throws a TypeError when the DOM is malformed

https://gerrit.wikimedia.org/r/526717