Parsoid: (Italian versions of) Template:infobox_person messing up with Template:Sister?
Closed, Resolved, Public

bzimport added a project: Parsoid. Via Conduit, Nov 22 2014, 1:49 AM
bzimport set Reference to bz51678.
Elitre created this task. Via Legacy, Jul 19 2013, 11:57 AM
Jdforrester-WMF added a comment. Via Conduit, Jul 24 2013, 5:46 AM

This looks like a nasty Parsoid bug that I think was fixed last week, where sometimes some templates would get duplicated when re-used. I'm going to tentatively mark this as fixed, but please re-open if it happens again.

Esanders added a comment. Via Conduit, Jul 30 2013, 12:40 PM

I can reproduce this on it.wiki, but when I paste the posted HTML into my local Parsoid, it looks fine. How up-to-date is the deployment on it.wiki?

Esanders added a comment. Via Conduit, Jul 30 2013, 12:42 PM

Either way, looks like a Parsoid bug.

Elitre added a comment. Via Conduit, Aug 1 2013, 1:58 PM

Thanks Ed. FYI, we have found out (sorry, no diff available, the text was a copyvio) that the very same corruption also appears if you don't edit Template:Bio at all but just change a word in the page (which was Pierre de Fermat on it.wp, featuring both of the templates).

ssastry added a comment. Via Conduit, Aug 1 2013, 2:53 PM

Thanks Elitre. Will take a look today.

ssastry added a comment. Via Conduit, Aug 2 2013, 4:25 AM

This is a baffling bug. When I take HTML dumps (after edits) from Chrome and serialize them a couple of different ways, I don't see the diff at all. But when I click on the 'Review changes' button, the diff shows up. Tried on http://it.wikipedia.org/wiki/Szil%C3%A1rd_Ign%C3%A1c_Bogd%C3%A1nffy

Will investigate more tomorrow.

ssastry added a comment. Via Conduit, Aug 2 2013, 5:22 PM

My testing was a little off late last night. But here is what is going on:

  1. Since Parsoid doesn't yet provide a PHP-compatible API (bug 48483 tracks this), on template edits VE fetches new HTML from the MediaWiki API, whose output differs from Parsoid's, most notably for categories: Parsoid emits <link..> tags, while the PHP parser leaves no trace of them. So there is a big chunk of missing output when VE sends this DOM back to Parsoid. Normally this shouldn't be an issue, since Parsoid simply hops over template HTML and should not even run into this hole.
  2. But Parsoid's DOM-diff has a bug where it occasionally descends into transclusion HTML and trips itself up. This throws off the dom-diff, which inserts a deletion marker further downstream in a later template; that breaks template continuity, which in turn breaks the serializer.

I'll fix the bug in the dom-diff algorithm (and possibly make the serializer more robust against future dom-diff bugs by having it ignore deletion markers). But we should try to implement 48483 sooner rather than later so we don't run into other issues caused by differing HTML.
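
To illustrate the principle only (this is not the actual Parsoid patch, which is linked below), here is a minimal Python sketch of a tree diff that treats template output as opaque. The `Node` class, the `diff` function, the `skip_templates` flag, and the use of a `typeof="mw:Transclusion"` attribute to flag template output are all assumptions made for the example. It shows only how descending into a transclusion surfaces a spurious deletion for the missing category link; it does not reproduce the downstream marker placement described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""


def is_transclusion(node):
    # Assumption: template output is flagged with a typeof attribute.
    return "mw:Transclusion" in node.attrs.get("typeof", "").split()


def diff(old, new, path="0", skip_templates=True):
    """Return (path, kind) markers describing how `new` differs from `old`."""
    if skip_templates and is_transclusion(old) and is_transclusion(new):
        # Treat template output as opaque: never descend into it.
        return []
    if (old.tag, old.text) != (new.tag, new.text):
        return [(path, "modified")]
    markers = []
    for i, (o, n) in enumerate(zip(old.children, new.children)):
        markers += diff(o, n, f"{path}/{i}", skip_templates)
    # Children present on only one side become deletions or insertions.
    for i in range(len(new.children), len(old.children)):
        markers.append((f"{path}/{i}", "deleted"))
    for i in range(len(old.children), len(new.children)):
        markers.append((f"{path}/{i}", "inserted"))
    return markers


def template(*children):
    # Hypothetical helper: wraps children in a node flagged as template output.
    return Node("div", {"typeof": "mw:Transclusion"}, list(children))


# Parsoid-style output: the template contains a category <link>.
old_doc = Node("body", children=[
    template(Node("span", text="infobox"), Node("link", text="Category:X")),
    Node("p", text="prose"),
])
# PHP-parser-style output returned by the editor: the <link> is missing.
new_doc = Node("body", children=[
    template(Node("span", text="infobox")),
    Node("p", text="prose"),
])

print(diff(old_doc, new_doc))                        # template skipped
print(diff(old_doc, new_doc, skip_templates=False))  # descends into template
```

With `skip_templates=True` the missing `<link>` never reaches the diff, so no marker is produced; with it disabled, the diff reports a deletion inside the template subtree even though the user did not touch it.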

gerritbot added a comment. Via Conduit, Aug 2 2013, 5:50 PM

Change 77357 had a related patch set uploaded by Subramanya Sastry:
(Bug 51678) Fixed bug in dom-diff algorithm

https://gerrit.wikimedia.org/r/77357

gerritbot added a comment. Via Conduit, Aug 2 2013, 8:44 PM

Change 77357 merged by jenkins-bot:
(Bug 51678) Fixed bug in dom-diff algorithm

https://gerrit.wikimedia.org/r/77357

Elitre added a comment. Via Conduit, Aug 14 2013, 3:49 PM

Will this be added to tomorrow's deployment?

ssastry added a comment. Via Conduit, Aug 14 2013, 3:51 PM

Parsoid is deployed independently, but yes, this fix will go out with the next Parsoid update (by tomorrow).

Elitre added a comment. Via Conduit, Aug 14 2013, 3:59 PM

Thanks :)

ssastry added a comment. Via Conduit, Aug 14 2013, 10:52 PM

Now deployed. Please verify and close if fixed.

Elitre added a comment. Via Conduit, Aug 21 2013, 6:16 PM

Yes, great, thank you!!!

Jdforrester-WMF added a comment. Via Conduit, Feb 4 2014, 3:53 AM

That's not related.

Elitre added a comment. Via Conduit, Feb 5 2014, 2:25 PM

Ok, now at bug 60897 then. Thanks.
