Figure handler rejects nested tables in figure captions
Closed, Resolved · Public

Description

Non-nested table

[subbu@earth parsoid] echo "[[File:Foo.jpg|thumb|180px|Foo<br/>\n{|\n|x\n|}\n]]" | parse.js --normalize
<figure><a href="File:Foo.jpg"><img src="//upload.wikimedia.org/wikipedia/commons/0/06/Foo.jpg" data-file-width="300" data-file-height="197" data-file-type="bitmap" height="118" width="180"/></a><figcaption>Foo<br/>
<table>
<tbody>
<tr>
<td>x</td>
</tr>
</tbody>
</table>
</figcaption></figure>

Nested table (the table is missing in the output)

[subbu@earth parsoid] echo "[[File:Foo.jpg|thumb|180px|Foo<br/>\n{|\n|\n{|\n|z\n|}\n|}\n]]" | parse.js --normalize
<figure><a href="File:Foo.jpg"><img src="//upload.wikimedia.org/wikipedia/commons/0/06/Foo.jpg" data-file-width="300" data-file-height="197" data-file-type="bitmap" height="118" width="180"/></a><figcaption>} </figcaption></figure>
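The two commands above differ only in table nesting. As a rough illustration of why nesting matters to a scanner, here is a minimal Python sketch that finds the end of a table by tracking depth, next to a naive version that stops at the first `|}`. Both functions and their names are hypothetical; this does not reflect how Parsoid's tokenizer is implemented, and it is not a diagnosis of the actual bug.

```python
def find_table_end(lines, start):
    """Return the index of the line closing the table opened at lines[start].

    Illustrative only: tracks nesting depth so that an inner "|}" does not
    close the outer table. Returns None if the table is never closed.
    """
    depth = 0
    for i in range(start, len(lines)):
        stripped = lines[i].lstrip()
        if stripped.startswith("{|"):
            depth += 1
        elif stripped.startswith("|}"):
            depth -= 1
            if depth == 0:
                return i
    return None


def first_table_close(lines, start):
    """Naive variant: stop at the first "|}" regardless of nesting."""
    for i in range(start, len(lines)):
        if lines[i].lstrip().startswith("|}"):
            return i
    return None


# Caption text from the nested-table command, with "\n" as real newlines.
caption = "Foo<br/>\n{|\n|\n{|\n|z\n|}\n|}\n".split("\n")

outer_close = find_table_end(caption, 1)     # depth-aware scan
naive_close = first_table_close(caption, 1)  # stops at the inner "|}"
```

On this input the depth-aware scan reaches the outer `|}`, while the naive scan stops one line early at the inner one.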

---- Original bug report ----
When an image caption contains a wikitable, the linter may flag bogus-image-options while everything on the page looks "as intended".

Examples:

There is nothing to fix, so these cases are false positives.

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 15 2017, 4:11 PM
Bdijkstra renamed this task from LintError bogus-image-options false positves with wikitable syntax to LintError bogus-image-options false positives with wikitable syntax. Oct 15 2017, 4:12 PM
Bdijkstra updated the task description.

Parsoid is having trouble recognizing nested tables in figure captions. This is a tokenizer problem and not a linter issue per se.

[subbu@earth parsoid] echo "[[File:Foo.jpg|thumb|180px|Foo<br/>\n{|\n|x\n|}\n]]" | parse.js --normalize --lint
<figure><a href="File:Foo.jpg"><img src="//upload.wikimedia.org/wikipedia/commons/0/06/Foo.jpg" data-file-width="300" data-file-height="197" data-file-type="bitmap" height="118" width="180"/></a><figcaption>Foo<br/>
<table>
<tbody>
<tr>
<td>x</td>
</tr>
</tbody>
</table>
</figcaption></figure>
[subbu@earth parsoid] echo "[[File:Foo.jpg|thumb|180px|Foo<br/>\n{|\n|\n{|\n|z\n|}\n|}\n]]" | parse.js --normalize --lint
<figure><a href="File:Foo.jpg"><img src="//upload.wikimedia.org/wikipedia/commons/0/06/Foo.jpg" data-file-width="300" data-file-height="197" data-file-type="bitmap" height="118" width="180"/></a><figcaption>} </figcaption></figure>
[info/lint/bogus-image-options][enwiki/Main Page] {"type":"bogus-image-options","params":{"items":["Foo<br/>\n{|\n|\n{|\n|z\n|}\n"]},"dsr":[0,55,2,2]}
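To relate the lint record to the input, the following Python sketch slices the wikitext by the record's `dsr` field. It assumes `dsr` is `[start, end, opening width, closing width]` in source offsets and that the `\n` sequences in the echo command are real newlines; the variable names are only for illustration.

```python
import json

# Wikitext from the nested-table command above, with "\n" as real newlines.
wikitext = "[[File:Foo.jpg|thumb|180px|Foo<br/>\n{|\n|\n{|\n|z\n|}\n|}\n]]"

# JSON payload copied from the lint log line above.
record = json.loads(
    r'{"type":"bogus-image-options","params":{"items":'
    r'["Foo<br/>\n{|\n|\n{|\n|z\n|}\n"]},"dsr":[0,55,2,2]}'
)

# Assumed layout of dsr: [start, end, opening width, closing width].
start, end, open_width, close_width = record["dsr"]

flagged = wikitext[start:end]                      # span the linter points at
opening = flagged[:open_width]                     # "[["
closing = flagged[len(flagged) - close_width:]     # "]]"

# The single "bogus" option, and what is left between it and the closing "]]".
item = record["params"]["items"][0]
item_end = flagged.index(item) + len(item)
remainder = flagged[item_end:len(flagged) - close_width]

print(repr(opening), repr(closing), repr(remainder))
```

Under these assumptions the flagged span is the whole image link, and the text left over after the bogus option is the outer table's `|}` plus a newline. That is consistent with the `} ` in the figcaption above if the leading pipe was taken as an option separator.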
ssastry renamed this task from LintError bogus-image-options false positives with wikitable syntax to Figure handler rejects nested tables in figure captions. Oct 15 2017, 6:27 PM
ssastry triaged this task as Medium priority.
ssastry edited projects, added Parsoid; removed MediaWiki-extensions-Linter.
ssastry updated the task description.

I thought this was not a parser issue as I saw no differences in the parsermigration tool. Could someone please explain this for my understanding?

The parsermigration tool takes output from the PHP parser and fixes up the HTML once with Tidy and once with Remex; those are the two versions you see.

Linter warnings are not based on the PHP parser; they come from Parsoid's analysis. So if Parsoid and the PHP parser differ, the linter can report false positives relative to what you see in the parsermigration tool.

Arlolra claimed this task. Nov 6 2017, 7:59 PM

Change 389553 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] T178253: Parse nested tables in figure captions

https://gerrit.wikimedia.org/r/389553

Change 389553 merged by jenkins-bot:
[mediawiki/services/parsoid@master] T178253: Parse nested tables in figure captions

https://gerrit.wikimedia.org/r/389553

Arlolra closed this task as Resolved. Nov 6 2017, 10:19 PM

Mentioned in SAL (#wikimedia-operations) [2017-11-09T18:41:16Z] <arlolra> Updated Parsoid to 2887b5ad (T178253, T173643, T176728, T180010, T171381, T179757)

Arlolra reopened this task as Open. Nov 9 2017, 6:48 PM

Hmm, so after purging,

https://nl.wikipedia.org/api/rest_v1/page/html/Heuvelland_(Zuid-Limburg)

is fixed.

But,

https://nl.wikipedia.org/api/rest_v1/page/html/Klimaatclassificatie_van_K%C3%B6ppen

is not. The same happens locally, so it looks like there's some more work to do here.

Change 390348 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] T178253: Handle pipe ending table attributes in figure captions

https://gerrit.wikimedia.org/r/390348

Change 390348 merged by jenkins-bot:
[mediawiki/services/parsoid@master] T178253: Handle pipe ending table attributes in figure captions

https://gerrit.wikimedia.org/r/390348

Arlolra closed this task as Resolved. Nov 13 2017, 11:25 PM

Mentioned in SAL (#wikimedia-operations) [2017-11-14T18:32:12Z] <arlolra> Updated Parsoid to e71937d0 (T178253)

Change 391360 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] T178253: Permit table rows from templates in figure captions

https://gerrit.wikimedia.org/r/391360

Thanks!

The nested table there looks like this,

[[Файл:Orbital Launches.svg|thumb|right|250px|Космические державы
{| style="width:100%;" |- | valign=top |
{{legend|#ff3232|Признанные запуски РН}}
{{legend|#3200ff|Неподтверждённые запуски РН}}
{{legend|#008000|Планирующиеся запуски РН}}
|}
]]

which expands to,

{| style="width:100%;" |- | valign=top |
<div class="legend" style="-webkit-column-break-inside: avoid;page-break-inside: avoid;break-inside: avoid-column"><span class="legend-color" style="display:inline-block; width:1.5em; height:1.5em; margin:1px 0; border:1px solid black; background-color:
#ff3232; color:black; font-size:100%; text-align:center;">&nbsp;</span>&nbsp;Признанные запуски РН</div>
<div class="legend" style="-webkit-column-break-inside: avoid;page-break-inside: avoid;break-inside: avoid-column"><span class="legend-color" style="display:inline-block; width:1.5em; height:1.5em; margin:1px 0; border:1px solid black; background-color:
#3200ff; color:black; font-size:100%; text-align:center;">&nbsp;</span>&nbsp;Неподтверждённые запуски РН</div>
<div class="legend" style="-webkit-column-break-inside: avoid;page-break-inside: avoid;break-inside: avoid-column"><span class="legend-color" style="display:inline-block; width:1.5em; height:1.5em; margin:1px 0; border:1px solid black; background-color:
#008000; color:black; font-size:100%; text-align:center;">&nbsp;</span>&nbsp;Планирующиеся запуски РН</div>
|}

which is a bunch of fosterable content, that is, content sitting directly inside the table rather than inside a cell. However, there's still something to fix here, as seen in the reduced test case in the above patch.

Change 391360 merged by jenkins-bot:
[mediawiki/services/parsoid@master] T178253: Permit table rows from templates in figure captions

https://gerrit.wikimedia.org/r/391360