Page MenuHomePhabricator

Parser tests does not cover the "indented table" case
Closed, ResolvedPublic

Description

The legacy parser contains a special case to handle tables prefixed with colons:

In Parser::handleTables:

			if ( preg_match( '/^(:*)\s*\{\|(.*)$/', $line, $matches ) ) {
				# First check if we are starting a new table
				$indent_level = strlen( $matches[1] );

				$attributes = $this->mStripState->unstripBoth( $matches[2] );
				$attributes = Sanitizer::fixTagAttributes( $attributes, 'table' );

				$outLine = str_repeat( '<dl><dd>', $indent_level ) . "<table{$attributes}>";

There don't seem to be any parser tests for this case. Parsoid almost certainly handles this differently than the legacy parser does (but w/o tests I don't know exactly how they differ).

Event Timeline

Are you looking for all the tests with "Definition Lists: Hacky use to indent tables" prefix?

Also, see hacky_dl_uses in the grammar

@Arlolra thanks! I guess the regexp search I was using in parserTests.txt wasn't correct. We don't seem to cover the case the DiscussionTools team was interested in:

: parent
:: here is a comment
:: {|
|foo
|bar
|}

As I understand it (need to actually write the test case), the legacy parser closes the parent list context and then opens a brand new list for the indented comment, instead of including the table in the list containing 'parent' and 'here is a comment'. I don't know if Parsoid matches that behavior or not.

(Of course, eventually we'd like something like:

: parent
:: <<< here is a comment
{|
| foo
| bar
|} >>>

which would unambiguously include the table in the list item with 'here is a comment' and nested properly under 'parent'.)

@matmarex notes that the way the legacy parser handles indented tables also breaks CSS style for comments on frwiki.

Parsoid does the expected thing here and yes, the core parser doesn't.

[subbu@earth:~/work/wmf/parsoid] php bin/parse.php < /tmp/dlwt 
<dl data-parsoid='{"dsr":[0,48,0,0]}'><dd data-parsoid='{"dsr":[0,48,1,0]}'>parent
<dl data-parsoid='{"dsr":[9,48,0,0]}'><dd data-parsoid='{"dsr":[9,29,2,0]}'>here is a comment</dd>
<dd data-parsoid='{"dsr":[30,48,2,0]}'><table data-parsoid='{"dsr":[33,48,2,2]}'>
<tbody data-parsoid='{"dsr":[36,46,0,0]}'><tr data-parsoid='{"autoInsertedStart":true,"dsr":[36,45,0,0]}'><td data-parsoid='{"dsr":[36,40,1,0]}'>foo</td>
<td data-parsoid='{"dsr":[41,45,1,0]}'>bar</td></tr>
</tbody></table></dd></dl></dd></dl>

[subbu@earth:~/work/wmf/parsoid] php ../mediawiki/maintenance/parse.php < /tmp/dlwt 
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<dl><dd>parent
<dl><dd>here is a comment</dd></dl></dd></dl>
<dl><dd><dl><dd><table>
<tbody><tr>
<td>foo
</td>
<td>bar
</td></tr></tbody></table></dd></dl></dd></dl>

@Esanders, @matmarex : this difference in behavior might be relevant to your comment-parsing code. If you parse the page using the legacy parser output, you'll end up with a different list structure when you re-parse it using the parsoid output.

DiscussionTools should be fine with that, the effect here is basically the same as having a "list gap" and it deals fine with that.

Change 627976 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Add test for hacky-indented-table inside a list

https://gerrit.wikimedia.org/r/627976

Change 627976 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add test for hacky-indented-table inside a list

https://gerrit.wikimedia.org/r/627976

ssastry assigned this task to cscott.
ssastry triaged this task as Low priority.

Change 628944 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a10

https://gerrit.wikimedia.org/r/628944