Page MenuHomePhabricator

Investigate whitespace changes elsewhere on the page (dirty diffs)
Open, MediumPublic

Description

Because the whole page is being validated by Parsoid, sometimes making a reply at one point on the page causes changes elsewhere on the page. These should be investigated to make sure that they are appropriate, proportionate, and consistent.

Whitespace changes that were made

  1. https://fr.wikipedia.org/w/index.php?curid=3478951&diff=169022495&oldid=169019977 – removes extraneous space in div tag
  2. https://nl.wikipedia.org/w/index.php?curid=500939&diff=55990506&oldid=55990470 – adds blank line before the start of the list (same list that the reply is in)
  3. https://ar.wikipedia.org/w/index.php?curid=472301&diff=45591958&oldid=45591951 – removes extra div tag
  4. https://nl.wikipedia.org/w/index.php?curid=653617&diff=55994380&oldid=55994325 – removes space at end of line
  5. https://hu.wikipedia.org/w/index.php?curid=258186&diff=22425221&oldid=22425218 – adds blank line. This one matters more than the others, because some archive bots also normalize whitespace before and after section headings, and if they're normalizing it to the opposite style, then the Reply tool will be edit-warring with a bot, which never ends well.

Whitespace changes that were not made

  1. https://fr.wikipedia.org/w/index.php?title=Discussion_Projet:Aide_et_accueil/Twitch/Sessions/1&curid=13210813&diff=169041994&oldid=169041030&diffmode=source – In the previous comment, there is a space between the list markup and the comment. Why wasn't it removed?
  2. https://fr.wikipedia.org/w/index.php?title=Discussion_mod%C3%A8le:Infobox_Pand%C3%A9mie&curid=13191551&diff=169260622&oldid=169256883 – One space was removed between the list markup and the start of the comment. Why not one, and not all?
  3. https://ar.wikipedia.org/w/index.php?curid=1939264&diff=45751794&oldid=45751725 – The space was removed between the list markup and the start of the comment in one line, but not in the next line.

Event Timeline

ssastry triaged this task as Medium priority.Apr 16 2020, 3:08 AM
ssastry moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.
cscott subscribed.

Moving over to needs investigation, since I think we need to look into the list above and probably create subtasks for specific issues.

  1. https://hu.wikipedia.org/w/index.php?curid=258186&diff=22425221&oldid=22425218 – adds blank line. This one matters more than the others, because some archive bots also normalize whitespace before and after section headings, and if they're normalizing it to the opposite style, then the Reply tool will be edit-warring with a bot, which never ends well.

This is T249958: Extra line being added elsewhere on page and I've uploaded a patch that fixes it.

  1. https://fr.wikipedia.org/w/index.php?curid=3478951&diff=169022495&oldid=169019977 – removes extraneous space in div tag

I think this is an acceptable normalization - it should be rare in any case. Avoiding this will require a lot of unnecessary code complexity.

Actually, 1, 3, 4 all look like manifestations of the same behavior and maybe a tad hard to avoid at this point. Parsoid marks all the nodes on the path from an edited leaf node to the root of the DOM as potentially modified since we cannot use the DSR for any subtree rooted in that path to emit original wikitext unmodified. That said, Parsoid does apply selser-based original-wiktext-reuse for all other subtrees hanging off that path to not touch them. So, effectively, all nodes on this modified path will go through regular serialization and will get normalized.

So, some list that was nested in that <div >got edited (because of a reply) and so the <div > gets normalized to a <div>since it is on this modified path. Parsoid does have some smarts to minimize normalization of nodes on this path but I think we added those smarts based on wrapperUnmodifed checks, but I think we currently do that mostly only for tables. For now, I am going to say, unless this kind of diff provides to be a lot more common, we could let these one-off normalizations happen.

If you are following along and verifying my claims, in cases 3, 4 it looks like.the normalized nodes are both roots of subtrees that hang off a node on this modified path. So, at first glance, selser-based original-wikitext-reuse should have applied to those subtrees and avoided these normalizations. But, this is wkitext we are talking about. And, Parsoid has exceptions to handle edit scenarios that mess with it. Specifically, check out this function that was first introduced in the appropriately-titled gerrit patch: Deal with chameleon nodes that change output based on surroundings. That method has some conditions for nested lists (exactly the domain of talk pages and replies). So, I suspect in these cases, we ran afoul of some exception introduced in that method. But, nevertheless, one of us could interrogate those assumptions more carefully to see if they could possibly be refined so that selser is applicable in a larger context.