
Splitting a paragraph duplicated it
Closed, Resolved (Public)

Description

See https://meta.wikimedia.org/w/index.php?title=Community_Engagement_Insights/2018_Report/Communications_Department&diff=18339687&oldid=18339625 . The next diff shows what should have happened. This was all very simple editing: place the cursor in the middle of a paragraph, press Return, add list formatting, and remove a few words.

Event Timeline

This looks like a Parsoid/RESTBase issue?

This is still reproducible on that page.

Looks like we send HTML like this to Parsoid:

<p id="mwUg">In examining this question by gender, we can observe some differences. We cannot say whether they are significant. When asked about using media channels for learning about features and services from the Wikimedia Foundation we observed the following: </p><ul><li><p id="mwUg">68% of males reported using at least one channel. 80% of females reported using at least one channel.</p></li><li><p id="mwUg">A higher proportion of males used Wikimedia projects pages. Female editors reported a higher use of mailing lists, social media, the Wikimedia Foundation blog, and conferences.</p></li></ul>

Note how all of the paragraphs have the same id="mwUg" attribute.

I don't know whether this is a VE bug or a Parsoid bug.
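To check whether a given HTML payload has this problem, a quick scan for repeated id values is enough. The following is only a sketch: `findDuplicateIds` is a hypothetical helper, not part of VisualEditor or Parsoid, and it uses a regex over the serialized HTML (double-quoted id attributes only) rather than a real DOM walk, so it could be fooled by text content that happens to look like an attribute.

```javascript
// Hypothetical helper: list every id value that appears more than once
// in a serialized HTML string. Assumes double-quoted id="..." attributes.
function findDuplicateIds(html) {
  const counts = new Map();
  // \s before "id" keeps attributes such as data-id="..." from matching.
  for (const [, id] of html.matchAll(/\sid="([^"]*)"/g)) {
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([id]) => id);
}

// Shortened version of the payload quoted above.
const payload =
  '<p id="mwUg">Intro.</p><ul><li><p id="mwUg">One.</p></li>' +
  '<li><p id="mwUg">Two.</p></li></ul>';
console.log(findDuplicateIds(payload));
```

In a browser console the same check could be run against the serialized body of the document about to be saved.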

I think this is deliberate when we split nodes, so that wikitext formatting is preserved (not an issue for paragraphs, but for headings/table cells/lists etc.)

JTannerWMF added subscribers: ssastry, JTannerWMF.

We need support from Parsoid to know how to proceed. Tagging @ssastry.

LGoto triaged this task as Low priority. Apr 3 2020, 4:30 PM
LGoto raised the priority of this task from Low to Medium.
LGoto moved this task from Backlog to Needs Investigation on the Parsoid board.

@Esanders I think data-parsoid/data-mw is intended never to affect visual rendering, so there should be no reason to duplicate the id attributes on the copied content. For cut-and-paste, sure, preserve the original ID, but future pastes should assign new IDs. (Or just do a pass over the output before you send it to Parsoid and delete all but the first appearance of a given id.)

There are probably corner cases (headings were mentioned) where for some reason copying the ID gives better behavior; we should add additional *non-data-parsoid*/*non-data-mw* attributes to control that behavior.
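The "delete all but the first appearance of a given id" pass suggested above could look roughly like the sketch below. `dropDuplicateIds` is a hypothetical name, not an existing VisualEditor or Parsoid function, and it works on the serialized HTML with a regex (double-quoted id attributes only); real code would more likely walk the DOM before serializing.

```javascript
// Hypothetical helper: keep the first occurrence of each id attribute in an
// HTML string and strip the attribute from every later element that repeats it.
function dropDuplicateIds(html) {
  const seen = new Set();
  // \s before "id" keeps attributes such as data-id="..." from matching.
  return html.replace(/\sid="([^"]*)"/g, (match, id) => {
    if (seen.has(id)) {
      return ''; // repeated id: remove the attribute
    }
    seen.add(id);
    return match; // first appearance: keep unchanged
  });
}

const input =
  '<p id="mwUg">Intro.</p><ul><li><p id="mwUg">One.</p></li>' +
  '<li><p id="mwUg">Two.</p></li></ul>';
console.log(dropDuplicateIds(input));
// <p id="mwUg">Intro.</p><ul><li><p>One.</p></li><li><p>Two.</p></li></ul>
```

Keeping the first appearance means the original paragraph keeps its id, while the newly created list items are sent without one. Whether that is the right choice for headings, table cells or lists is exactly the corner case mentioned above.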

I cannot reproduce this bug right now. I tried on mediawiki.org and on meta on the same page (and accidentally even saved one of my test edits). So, either this was some peculiar edge case on the page or this has been resolved in the interim because of changes in VE or Parsoid.

My attempt to reproduce:

>> ve.init.target.doc.body.innerHTML
"<p id=\"mwAg\">This is a paragraph.  This is sentence two of the paragraph.</p>"
>> ve.init.target.docToSave.body.innerHTML
"<p id=\"mwAg\">This is a paragraph.  </p><ul><li><p id=\"mwAg\">This is sentence </p></li><li><p id=\"mwAg\">of the paragraph.</p></li></ul>"

https://en.wikipedia.org/w/index.php?title=User:Cscott/T203112&type=revision&diff=948895599&oldid=948895212&diffmode=source

That looks fine. I think we must have fixed this sometime since this bug was filed.

ssastry claimed this task.

This was probably some edge case. If you look closely at the diff, there were some dirty diffs of the gallery as well, which indicates that something else was going on. We cannot reproduce this, so we are closing it out for now. If this were a common bug, we would have lots of reports and easy ways to reproduce it.