Edge case parsing indented colon-prefixed table: possibly treat this as undefined behavior and lint the wikitext pattern away
Open, Low, Public

Description

Originated from: https://github.com/openzim/mwoffliner/issues/359

We're finding that some WikiTables are returned as source, rather than parsed.
An example is here:
http://en.wikipedia.org/api/rest_v1/page/mobile-sections/Molecular_geometry

The returned section with ID 7 contains an un-parsed wikitable (:{| class="wikitable"...) along with multiple properly parsed tables.

Could it be because there is no whitespace before it?

Event Timeline

Reducible to the following edge case:

[subbu@earth:~/work/wmf/parsoid] cat /tmp/wt
  :{|
  |foo
  |}

  {|
  |bar
  |}

:{|
  |bar
  |}

:x

  :x

[subbu@earth:~/work/wmf/parsoid] php ../mediawiki/maintenance/parse.php < /tmp/wt
<dl><dd><table>
<tbody><tr>
<td>foo
</td></tr></tbody></table></dd></dl>
<table>
<tbody><tr>
<td>bar
</td></tr></tbody></table>
<dl><dd><table>
<tbody><tr>
<td>bar
</td></tr></tbody></table></dd></dl>
<dl><dd>x</dd></dl>
<pre> :x
</pre>

[subbu@earth:~/work/wmf/parsoid] parse.js --normalize < /tmp/wt
<pre> :{|
 |foo
 |}</pre>
<table>
<tbody>
<tr>
<td>bar</td>
</tr>
</tbody>
</table>
<dl>
<dd>
<table>
<tbody>
<tr>
<td>bar</td>
</tr>
</tbody>
</table>
</dd>
</dl>
<dl>
<dd>x</dd>
</dl>
<pre> :x</pre>

So an indented table parses fine and a colon-prefixed table parses fine, but an indented colon-prefixed table is not parsed as a table in Parsoid; it comes out as a <pre> block instead. The PHP parser's behavior is somewhat odd, given that you cannot normally space-indent a colon: the indented ":x" above becomes <pre> in both parsers. So does it break that rule only for tables? I am inclined to just fix the wikitext and treat this as unsupported in Parsoid. We could probably lint this wikitext pattern in the corpus and have editors fix this edge case.
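
As a rough illustration of the kind of check a lint pass could run, here is a minimal Python sketch that flags table-open lines that are both space-indented and colon-prefixed. It is not Parsoid or Linter code. The names `INDENTED_COLON_TABLE`, `find_indented_colon_tables`, and `strip_indent` are hypothetical, and the regex assumes the pattern is exactly "one or more spaces, one or more colons, then `{|`" at the start of a line.

```python
import re

# Assumption: the problematic pattern is a line that begins with one or more
# spaces, followed by one or more colons, followed directly by "{|".
INDENTED_COLON_TABLE = re.compile(r"^ +:+\{\|", re.MULTILINE)


def find_indented_colon_tables(wikitext: str) -> list[int]:
    """Return the 1-based line numbers of indented colon-prefixed table opens."""
    return [
        wikitext.count("\n", 0, match.start()) + 1
        for match in INDENTED_COLON_TABLE.finditer(wikitext)
    ]


def strip_indent(wikitext: str) -> str:
    """Drop the leading spaces before ':{|' so the line starts with the colon."""
    return re.sub(r"^ +(?=:+\{\|)", "", wikitext, flags=re.MULTILINE)


# The reduced test case from this task, using single-space indentation.
SAMPLE = (
    " :{|\n |foo\n |}\n\n"
    " {|\n |bar\n |}\n\n"
    ":{|\n |bar\n |}\n\n"
    ":x\n\n"
    " :x\n"
)

if __name__ == "__main__":
    print(find_indented_colon_tables(SAMPLE))
    print(strip_indent(SAMPLE))
```

Only the first table in the sample is flagged. The indented plain table, the unindented colon-prefixed table, and the two ":x" lines are left alone. `strip_indent` rewrites only the table-open line, which turns the first case into the third, a form that both parsers above already handle.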

ssastry renamed this task from Not parsing all WikiTables to Edge case parsing indented colon-prefixed table: possibly treat this as undefined behavior and lint the wikitext pattern away. Nov 9 2018, 3:19 PM
ssastry triaged this task as Medium priority.
ssastry lowered the priority of this task from Medium to Low.
ssastry edited projects, added Parsoid-Linter; removed Parsoid.

Thanks for the quick response @ssastry - I'll update the MWOffliner issue.