Quotes and whitespace normalized in unedited <ref> tags
Closed, ResolvedPublic

Description

Filing this since subbu wanted to take a closer look ("Yes, VE/Parsoid shouldn't be making dirty diffs -- and these kind of dirty diffs is an exception rather than the rule. This is a bug."):

<<VE seems to make unnecessary changes. I wouldn't mind so much if it always helped to maintain the layout of invisible elements, but normally it just screws them (like infoboxes, or position of templates), without any effect on the rendered page but with a less easily readable wikitext editing window.
What I notice now though are totally unnecessary and hard to explain changes to ref names. For some reason, VE decides that all ref names need to be in quotes, which is not true at all and doesn't help one bit. It also decides that between the final quote and the forward slash, there needs to be a space. All this has no effect on the rendered page, and doesn't make the wikitext page any easier (or harder) to read, so why does VE bother doing this, which only helps to make diffs a lot longer.
Take a look at e.g. [10]: can you easily see if any significant change has been made, and what that may be?
First difference; <ref name="globsec-arjun"/> becomes
<ref name="globsec-arjun" /> Further: <ref name=nomorearjuns> becomes <ref name="nomorearjuns">
and so on. I have not found the actual significant difference between the old and new version. What is the purpose and benefit of this? If VE would always make the "invisible" layout clearer and better, fine, but VE doesn't care one bit about wikitext layout, so I don't get this useless editing. Fram (talk) 09:27, 7 October 2013 (UTC)>>

https://en.wikipedia.org/w/index.php?title=Arjun_%28tank%29&curid=7648336&diff=576092971&oldid=575064244


Version: unspecified
Severity: normal

bzimport added a project: Parsoid-Serializer.Via ConduitNov 22 2014, 2:15 AM
bzimport set Reference to bz55461.
Elitre created this task.Via LegacyOct 8 2013, 8:53 AM
Catrope added a comment.Via ConduitOct 8 2013, 9:02 AM

Yup, that's a bug. VE should not be normalizing <ref name="globsec-arjun"/> to <ref name="globsec-arjun" /> nor should it be normalizing <ref name=nomorearjuns> to <ref name="nomorearjuns">.

Since the revision wasn't tagged with the visualeditor-check tag, VE didn't detect any corruption on its end, which makes me suspect that this is a Parsoid bug.

Catrope added a comment.Via ConduitNov 6 2013, 1:40 AM

This seems to be related to bugs in selser (Parsoid) when a cache URL is set. I've set a cache URL locally and will investigate.

ssastry added a comment.Via ConduitNov 6 2013, 5:44 AM

Gabriel and I now think this happened when Parsoid incurred a cache miss for the HTML (rare) and also suffered a timeout on reparsing (possible on large pages since we had a timeout of 16 sec). But, a couple commits today should address this and further reduce the likelihood of selser not getting triggered on the edit.

The relevant commits are:

  1. https://gerrit.wikimedia.org/r/#/c/93896/ (increases cache timeout to 60 sec)
  2. https://gerrit.wikimedia.org/r/#/c/93902/ (improves parse time when selser suffers a cache miss fetching original HTML).
bzimport added a comment.Via ConduitNov 7 2013, 12:32 PM

Wikifram wrote:

It seems that quotes are still being added to ref tags as of today: [https://en.wikipedia.org/w/index.php?title=Sting_%28wrestler%29&curid=655575&diff=580574777&oldid=580516764]

ssastry added a comment.Via ConduitNov 7 2013, 2:54 PM

The new code I mentioned in #c3 hasn't been deployed to production yet. Will update this ticket when that happens.

ssastry added a comment.Via ConduitNov 8 2013, 6:39 AM

New code has been deployed with a bunch of fixes related to cache misses that would cause Parsoid to run the regular serializer on the entire dom. This should fix this issue on most pages.

We'll also make some fixes to minimize dirty diffs even when there is a cache miss -- we've identified some improvements we could make to the DOM diff algorithm.

bzimport added a comment.Via ConduitNov 12 2013, 1:53 PM

Wikifram wrote:

...still happens apparently: [https://en.wikipedia.org/w/index.php?title=New_%28album%29&diff=581113730&oldid=581109664] (noted by another editor on [[en:WP:VEF]])

bzimport added a comment.Via ConduitNov 13 2013, 9:17 AM

Wikifram wrote:

Not just the quotes, also the additional spaces: [https://en.wikipedia.org/w/index.php?title=Gray_wolf&curid=33702&diff=581450926&oldid=581450732] I can't judge whether this is fixed on most pages or not, but it certainly seems to happen quite reguilarly still.

ssastry added a comment.Via ConduitNov 15 2013, 12:51 AM

Yes, those were probably from cache misses. We deployed newer code y'day that should eliminate these unrelated normalizations in most cases (barring bugs).

Note however that if a <ref> is edited, Parsoid will still normalize the quotes and whitespace in the attributes of the edited ref.

Let us wait for a week to see if we still see unrelaed normalizations and if not, close this as resolved.

ssastry added a comment.Via ConduitNov 18 2013, 3:24 PM

This last diff seems to have happened in the context of some error the editor encountered. Recording here for further debugging.

https://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&oldid=581952520#Error:Unknown_error

GWicke added a comment.Via ConduitNov 19 2013, 1:14 AM

The diff in comment #10 might have been caused by an unrelated issue with one of the Varnish caches in production.

GWicke added a comment.Via ConduitDec 3 2013, 1:09 AM

I believe this was fixed by 67fca5bdc7 and 923c79102a.

These test cases from comments above now round-trip fine:

Please reopen if this problem still exists.

bzimport added a comment.Via ConduitDec 3 2013, 3:01 PM

Wikifram wrote:

I've reopened this, since it appears that this still happens: [https://en.wikipedia.org/w/index.php?title=Triceratops&curid=54410&diff=584366006&oldid=584365568]

ssastry added a comment.Via ConduitApr 21 2014, 9:42 PM
  • Bug 64197 has been marked as a duplicate of this bug. ***
ssastry added a comment.Via ConduitApr 21 2014, 9:42 PM
  • Bug 64162 has been marked as a duplicate of this bug. ***
Whatamidoing-WMF added a comment.Via ConduitApr 22 2014, 2:58 AM

This is just a note that Bug 64197, merged here, is about {{sfn}} templates, which sort of are and sort of aren't ref tags. See https://en.wikipedia.org/w/index.php?title=Jack_Parsons_%28rocket_engineer%29&diff=604788783&oldid=604787564

Whatamidoing-WMF added a comment.Via ConduitApr 23 2014, 2:21 AM

Here's another diff that is probably the same bug:

https://en.wikipedia.org/w/index.php?title=Rights&curid=51490&diff=603988349&oldid=601154438

In this one, an oddly named-but-unnamed ref tag (<ref name="">) has its whitespace normalized.

cscott added a comment.Via ConduitJul 14 2014, 8:34 PM

Another one, found while looking at diffs after the deploy of d51e64097:
https://fr.wikipedia.org/w/index.php?title=Cendrillon_(Disney)&curid=3040885&diff=105440424&oldid=104520814

Whatamidoing-WMF added a comment.Via ConduitOct 14 2014, 6:24 PM

This is still happening regularly. Another example is at https://en.wikipedia.org/w/index.php?diff=629369061

Arlolra placed this task up for grabs.Via WebNov 25 2014, 7:00 PM
Arlolra added a project: Parsoid.
Arlolra set Security to None.
ssastry closed this task as "Resolved".Via WebDec 17 2014, 10:22 PM
ssastry claimed this task.

Closing this task since there is nothing much more to be done here since we cannot protect against varnish cache failures -- these should be a fairly rare occurrence. Once we move to RESTbase, this should not happen anymore.

Add Comment

Column Prototype
This is a very early prototype of a persistent column. It is not expected to work yet, and leaving it open will activate other new features which will break things. Press "\" (backslash) on your keyboard to close it now.