VE should strip element ids from HTML that it generates when wikitext is pasted
Closed, Resolved · Public · 8 Story Points

Description

That diff shows some HTML elements that I removed; they were left by a previous edit of mine (that earlier diff is bigger and less readable).

Process: While editing with VE, I opened a template and copied some wikitext from inside it. I pasted it where I needed it. It was converted, and I then saved the page.

May be related to T145211: Headings occasionally serialised as <h2> elements rather than \n\n== Foo ==\n in Parsoid. Config: Firefox 48 on Xubuntu 16.04 LTS.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 19 2016, 4:58 PM
ssastry added a subscriber: ssastry.

This looks like a VE issue with element ids not being cleared after copy-paste - I've added those projects.

@Trizek-WMF can you add info about which template you opened on the page? That can help with reproduction and debugging.

https://fr.wikipedia.org/wiki/Mod%C3%A8le:Colonnes
I've done the following:

  1. edited the article using VE
  2. opened that template
  3. copied some wikitext that was inside it
  4. pasted that text into the "Articles connexes" section, where it looked fine after conversion
  5. saved the page

I was able to reproduce the bug by editing that oldid in VE.

I dumped the edited DOM in VE by following these directions.

I looked at the HTML for the snippet in question:

<ul id="mwAQ"><li id="mwAg"><a title="Infrastructure ferroviaire" id="mwAw" rel="mw:WikiLink" href="./Infrastructure_ferroviaire">Infrastructure ferroviaire</a></li><li id="mwBA"><a title="Chemin de fer" id="mwBQ" rel="mw:WikiLink" href="./Chemin_de_fer">Chemin de fer</a></li><li id="mwBg"><a title="Gare" id="mwBw" rel="mw:WikiLink" href="./Gare">Gare</a></li></ul>

And the bug is right in front of us: VE is preserving the ids generated by parsing the pasted wikitext. This is a problem because those ids can conflict with ids on existing elements in the DOM, which would cause Parsoid to reuse data-parsoid from unrelated elements.
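To make the conflict concrete, here is a minimal Python sketch that checks whether a pasted fragment reuses ids already present on the page. The function name `colliding_ids` and the sample `page` markup are hypothetical, and a regular expression stands in for real DOM traversal purely for illustration; this is not how VE or Parsoid detect the problem.

```python
import re

# Matches id="..." attributes (but not data-id="..." or similar).
# A regex is enough for this illustration; real code would walk a parsed DOM.
ID_ATTR = re.compile(r'(?<![\w-])id="([^"]*)"')


def colliding_ids(page_html: str, pasted_html: str) -> set[str]:
    """Return the ids that occur both on the existing page and in the pasted fragment."""
    return set(ID_ATTR.findall(page_html)) & set(ID_ATTR.findall(pasted_html))


# Hypothetical existing page content whose ids were assigned on the initial parse.
page = '<p id="mwAQ">Intro</p><h2 id="mwAg">Articles connexes</h2>'
# Fragment produced by parsing the pasted wikitext separately: ids restart at mwAQ.
pasted = '<ul id="mwAQ"><li id="mwAg"><a id="mwAw" href="./Gare">Gare</a></li></ul>'

print(sorted(colliding_ids(page, pasted)))  # ['mwAQ', 'mwAg']
```

Because the pasted wikitext is parsed on its own, its ids start from the same counter as the page's, so collisions like the ones above are likely rather than exceptional.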

The fix is for VE to strip element ids from HTML generated when wikitext is copy-pasted.
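As a rough illustration of that proposal, the sketch below removes every id attribute from a pasted fragment. It is written in Python with a regular expression for brevity; the name `strip_ids` is hypothetical, and the actual VE change operates on DOM elements in JavaScript, not on strings.

```python
import re

# An id attribute preceded by whitespace inside a tag.
# Caveat: being string-based, this would also match the same pattern in text content.
ID_ATTR = re.compile(r'\s+id="[^"]*"')


def strip_ids(html: str) -> str:
    """Remove all id attributes from an HTML fragment."""
    return ID_ATTR.sub("", html)


pasted = (
    '<li id="mwBg"><a title="Gare" id="mwBw" rel="mw:WikiLink" '
    'href="./Gare">Gare</a></li>'
)
print(strip_ids(pasted))
# <li><a title="Gare" rel="mw:WikiLink" href="./Gare">Gare</a></li>
```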

ssastry renamed this task from "Wikitext copyed and pasted on VE is serialised as HTML rather than wikitext elements in Parsoid" to "VE should strip element ids from HTML that it generates when wikitext is pasted". · Sep 20 2016, 5:12 PM
ssastry triaged this task as High priority.
ssastry removed a project: Parsoid.
Jdforrester-WMF set the point value for this task to 1.
Jdforrester-WMF moved this task from To Triage to TR0: Interrupt on the VisualEditor board.

Change 312170 had a related patch set uploaded (by Alex Monk):
Strip element IDs from HTML generated when wikitext is pasted

https://gerrit.wikimedia.org/r/312170

Change 312170 abandoned by Alex Monk:
Strip element IDs from HTML generated when wikitext is pasted

Reason:
Yeah, this should actually be handled by someone who knows what they're doing in this part of the code. That's not me.

https://gerrit.wikimedia.org/r/312170

If we strip all IDs then we break references. We need to be able to import the generated HTML from "<ref>Foo</ref>".
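One way to reconcile the two requirements is to strip only the automatically assigned ids and leave everything else, including reference ids, alone. The Python sketch below assumes that auto-assigned ids look like "mw" followed by a short alphanumeric token, as in the snippet above (mwAQ, mwAg, ...); that pattern, the function name `strip_auto_ids`, and the sample reference markup are assumptions for illustration, not taken from the merged patch.

```python
import re

# Assumption: auto-assigned ids are "mw" plus 2-6 characters from [A-Za-z0-9_-].
AUTO_ID_ATTR = re.compile(r'\s+id="mw[A-Za-z0-9_-]{2,6}"')


def strip_auto_ids(html: str) -> str:
    """Remove only ids matching the assumed auto-assigned pattern."""
    return AUTO_ID_ATTR.sub("", html)


# Hypothetical fragment: one auto-assigned id and one reference id.
fragment = (
    '<p id="mwAQ">Foo<sup id="cite_ref-1" class="mw-ref">'
    '<a href="#cite_note-1">[1]</a></sup></p>'
)
print(strip_auto_ids(fragment))
# <p>Foo<sup id="cite_ref-1" class="mw-ref"><a href="#cite_note-1">[1]</a></sup></p>
```

The reference id survives, so links between a citation and its footnote keep working, while the ids that could collide with existing page elements are gone.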

Change 312170 restored by Esanders:
Strip element IDs from HTML generated when wikitext is pasted

https://gerrit.wikimedia.org/r/312170

Change 312170 merged by jenkins-bot:
Strip RESTBase IDs from HTML generated when wikitext is pasted

https://gerrit.wikimedia.org/r/312170