
Unclosed article tag results in nonlocal dirty diff (that selser doesn't fix)
Closed, Resolved, Public

Description

This wikitext:

:<article>
::foo

Becomes this on serialization, no matter where it is relative to the changed content:

:<article>

:foo

Selser doesn't mitigate this at all. See https://parsoid-prod.wmflabs.org/_rtselser/enwiki/Wikipedia:Sandbox?oldid=644474603 and https://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&oldid=644474603 for a live example.

Event Timeline

Jackmcbarn raised the priority of this task to Medium.
Jackmcbarn updated the task description.
Jackmcbarn added a project: Parsoid.
Jackmcbarn subscribed.
ssastry set Security to None.

This seems to be a bug in our HTML output: it emits a bare ":" at the start of a line, as in the output below.

[subbu@earth lib] echo -e ":<span>a\n::b" | node parse --fetchConfig false --normalize
[warning][enwiki/Main Page] DSR inconsistency: cs/s mismatch for node: SPAN s: 7 ; cs: 8

<dl>
<dd><span>a</span></dd>
</dl>
<p>:b</p>

Normally selser would have let this through, but since https://gerrit.wikimedia.org/r/#/c/180032/ we examine every line of output for nowiki escapes that edits elsewhere on the line may have made necessary. I think we are being overeager, though, in how we apply the wikitext-escaping logic to selser output. So there are two issues here:

  • A problem with our wt2html output for unclosed tags, which throws off our list-building logic.
  • A problem with our wikitext-escaping logic and how we apply it.
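The escaping half of the problem comes down to a line-start check: plain paragraph text such as the `:b` in `<p>:b</p>` above would be reparsed as a definition-list item if written out bare, so the serializer has to escape the leading marker. A minimal sketch, assuming the marker set `*`, `#`, `:`, `;` and a hypothetical `escape_line_start` helper (this is not Parsoid's actual escaping code):

```python
# Hypothetical sketch, not Parsoid's escaper: characters that open a list item
# (or a definition term/description) when they appear at the start of a line.
LINE_START_MARKERS = ("*", "#", ":", ";")


def escape_line_start(text: str) -> str:
    """Return `text` as wikitext that is safe at the start of a line.

    If plain text begins with a list marker, writing it out bare would turn it
    into a list item on reparse, so the marker is wrapped in <nowiki>.
    """
    if text.startswith(LINE_START_MARKERS):
        return f"<nowiki>{text[0]}</nowiki>{text[1:]}"
    return text


# The paragraph text from <p>:b</p> needs escaping; ordinary text does not.
print(escape_line_start(":b"))     # <nowiki>:</nowiki>b
print(escape_line_start("plain"))  # plain
```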

Both of the issues identified above look like regressions caused by other fixes.

To clarify, I am not sure we should necessarily fix the selser/nowiki issue. The nowiki insertion is sane and correct; our old selser behavior was essentially helping us by hiding parser bugs that let wikitext constructs through unparsed.

Change 192615 had a related patch set uploaded (by Arlolra):
Open tags only affect line when parsing definition list colon

https://gerrit.wikimedia.org/r/192615

Patch-For-Review

Change 192615 merged by jenkins-bot:
Open tags only affect line when parsing definition list colon

https://gerrit.wikimedia.org/r/192615

With the patch, we get:

λ (master) echo -e ":<span>a\n::b" | node parse --fetchConfig false --normalize

<dl>
<dd><span>a
<dl>
<dd>b</dd>
</dl>
</span></dd>
</dl>
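The patched output nests the `::b` item inside the first `<dd>`, which is what the usual prefix rule gives: the number of leading colons on a line is its nesting depth. Below is a simplified sketch of that rule, assuming only `:` prefixes and no inline markup (so it ignores the `<span>` entirely); `build_dl` is a hypothetical helper for illustration, not Parsoid's list handler.

```python
def build_dl(lines: list[str]) -> str:
    """Hypothetical sketch: nest <dl>/<dd> by the count of leading ':'.

    Handles only ':' prefixes (no ';', '*', '#') and treats the rest of each
    line as opaque text.
    """
    parts: list[str] = []
    depth = 0  # number of currently open <dl><dd> levels
    for line in lines:
        n = len(line) - len(line.lstrip(":"))  # count leading colons
        text = line[n:]
        if n == 0:
            # A non-list line closes every open level.
            parts.append("</dd></dl>" * depth)
            depth = 0
            parts.append(f"<p>{text}</p>")
            continue
        if n > depth:
            # Deeper line: open new levels inside the current <dd>.
            parts.append("<dl><dd>" * (n - depth))
        else:
            # Same or shallower: close deeper levels, then start a sibling <dd>.
            parts.append("</dd></dl>" * (depth - n))
            parts.append("</dd><dd>")
        depth = n
        parts.append(text)
    parts.append("</dd></dl>" * depth)  # close whatever is still open
    return "".join(parts)


print(build_dl([":a", "::b"]))  # <dl><dd>a<dl><dd>b</dd></dl></dd></dl>
```

The inner `<dl>` lands inside the outer `<dd>`, matching the structure of the patched output rather than the earlier `<p>:b</p>`.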

What was the conclusion of the nowiki discussion? It doesn't sound like you wanted to do anything, so is it safe to close?

I don't see how the chunk-based serializer is related here, unless we think we could use it to be a little smarter about serialization.

It is related because it now examines every line for nowiki escaping, even when there were no edits to that section of the document; the earlier serializer did not. That is arguably a good thing because it exposes parsing bugs, but it can also introduce nowikis in unedited sections of the document. So I was trying to find out how you approached this.
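The trade-off can be seen in a toy, line-granular model of the two policies. Everything here is hypothetical (the `Line` fields, `selser`, the `check_all_lines` flag, and the marker set `*`, `#`, `:`, `;`); it sketches the behavior under discussion, not the chunk-based serializer's code. Each line records its original wikitext, what the DOM would serialize it to, whether the DOM holds it as plain paragraph text, and whether the edit touched it.

```python
from dataclasses import dataclass

# Hypothetical marker set: characters that start a list item at line start.
LINE_START_MARKERS = ("*", "#", ":", ";")


@dataclass
class Line:
    original: str  # wikitext this line had before the edit
    text: str      # what serializing the DOM yields for this line, unescaped
    plain: bool    # True if the DOM holds the line as plain paragraph text
    edited: bool   # True if the user's edit touched this line


def needs_escape(line: Line) -> bool:
    # Plain text starting with a list marker would be reparsed as a list item.
    return line.plain and line.text.startswith(LINE_START_MARKERS)


def escape(text: str) -> str:
    return f"<nowiki>{text[0]}</nowiki>{text[1:]}"


def selser(lines: list[Line], check_all_lines: bool) -> str:
    out = []
    for line in lines:
        if line.edited:
            # Edited lines are serialized from the DOM and escaped as needed.
            out.append(escape(line.text) if needs_escape(line) else line.text)
        elif check_all_lines and needs_escape(line):
            # Newer policy: untouched lines are re-examined too, so text the
            # parser wrongly left unparsed picks up a <nowiki>.
            out.append(escape(line.text))
        else:
            # Untouched line: reuse the original wikitext verbatim.
            out.append(line.original)
    return "\n".join(out)


lines = [
    Line(original="Some text.", text="Some edited text.", plain=True, edited=True),
    # Parser bug: "::b" ended up in the DOM as the paragraph text ":b".
    Line(original="::b", text=":b", plain=True, edited=False),
]
print(selser(lines, check_all_lines=False))
print(selser(lines, check_all_lines=True))
```

With `check_all_lines=False`, the untouched `::b` line is copied through and the parser bug stays hidden; with `check_all_lines=True`, it comes back as `<nowiki>:</nowiki>b`, a dirty diff in content nobody edited.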

Opened T91569 for the selser issue subbu raised.

With that forked off, it is safe to close this bug.