
Can't resolve duplicate-ids issues by editing Parsoid HTML
Closed, Resolved, Public

Description

As best I can tell, it's not possible to resolve duplicate-ids lint errors by editing Parsoid HTML, because the underlying duplicate ID isn't exposed in the HTML.

Given the wikitext:

<div id="test">
one
</div>
<div id="test">two</div>

You get the HTML:

<div id="test" data-parsoid="{&quot;stx&quot;:&quot;html&quot;,&quot;dsr&quot;:[0,26,15,6]}">
<p id="mwAg" data-parsoid="{&quot;dsr&quot;:[16,19,0,0]}">one</p>
</div>
<div id="mwAw" data-parsoid="{&quot;stx&quot;:&quot;html&quot;,&quot;dsr&quot;:[27,51,15,6]}">two</div>

There's no way to know that the second div actually has a duplicate ID, or what the duplicate value is. Can it be exposed somehow? Maybe via a special data-mw-duplicate-id="..." attribute, or I guess by just adding it to data-mw itself? (Something that can be used in a CSS selector would be ideal.) I would then expect to be able to change the id attribute to set a different ID in the wikitext.

Also, I noticed that in some cases, converting that HTML back to wikitext strips the ID entirely, but it doesn't seem to happen every time.

Event Timeline

Change #1236383 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] [WIP] Preserve original id in html when reset to store in pb

https://gerrit.wikimedia.org/r/1236383

Change #1236383 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Preserve original id in html when reset to store in pb

https://gerrit.wikimedia.org/r/1236383

One question regarding editing: if I want to change the ID on the duplicate element, should I modify data-x-id or id (which is Parsoid-generated)? And if it's the latter, should I delete data-x-id?

I was initially going to say: change id and strip data-x-id. But then I realized that doing so would lose all the data-parsoid info keyed against the id. I verified it here (note that the second element's id is dropped and it is serialized as ''bar''):

╭─subbu@earth ~/work/wmf/parsoid  ‹master*› 
╰─➤  echo '<i id="x">foo</i><i id="x">bar</i>' |php bin/parse.php --pageBundle | sed 's/id="mwAg"/id="x2"/;s/data-x-id="x"//g;' | php bin/parse.php --pageBundle --html2wt
<i id="x">foo</i>''bar''

So, I guess you change the data-x-id instead:

╭─subbu@earth ~/work/wmf/parsoid  ‹master*› 
╰─➤  echo '<i id="x">foo</i><i id="x">bar</i>' |php bin/parse.php --pageBundle | sed 's/data-x-id="x"/data-x-id="x2"/g;' | php bin/parse.php --pageBundle --html2wt
<i id="x">foo</i><i id="x2">bar</i>
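
For tools that edit the HTML programmatically, the sed substitution above can be narrowed to a single element. The following is only a minimal sketch, not Parsoid code: the function name set_original_id is hypothetical, the sample HTML is an assumed simplification of what parse.php emits (only the id and data-x-id attributes are taken from this thread), and it relies on a regex rather than a real HTML parser, so it assumes double-quoted attributes with no literal ">" inside attribute values. It leaves the Parsoid-generated id untouched, so whatever is keyed against that id stays reachable.

```python
import html
import re


def set_original_id(doc: str, parsoid_id: str, new_id: str) -> str:
    """Rewrite data-x-id on the element whose id attribute equals parsoid_id.

    The Parsoid-generated id attribute itself is left as-is.
    """
    # \s before id= ensures we match the id attribute, not data-x-id.
    id_pattern = re.compile(r'\sid="%s"' % re.escape(parsoid_id))
    x_id_pattern = re.compile(r'(\sdata-x-id=")[^"]*(")')
    escaped_new_id = html.escape(new_id, quote=True)

    def fix_tag(match: re.Match) -> str:
        tag = match.group(0)
        if not id_pattern.search(tag):
            return tag  # not the element we were asked to change
        # A function replacement avoids backslash handling in new_id.
        return x_id_pattern.sub(
            lambda m: m.group(1) + escaped_new_id + m.group(2), tag
        )

    # Visit every opening tag; closing tags start with "/" and are skipped.
    return re.sub(r"<[A-Za-z][^<>]*>", fix_tag, doc)


# Assumed, simplified shape of the HTML for the example in this thread.
sample = '<i id="x">foo</i><i id="mwAg" data-x-id="x">bar</i>'
print(set_original_id(sample, "mwAg", "x2"))
```

The result would then be passed back through the html2wt conversion as in the command above; that step is not shown here.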

@ABreault-WMF sounds right to you?

Change #1242436 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/vendor@master] Bump wikimedia/parsoid to V0.23.0-a17

https://gerrit.wikimedia.org/r/1242436

Change #1242436 abandoned by Isabelle Hurbain-Palatin:

[mediawiki/vendor@master] Bump wikimedia/parsoid to V0.23.0-a17

Reason:

wrong tag

https://gerrit.wikimedia.org/r/1242436

Change #1242450 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a17

https://gerrit.wikimedia.org/r/1242450

Change #1242450 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a17

https://gerrit.wikimedia.org/r/1242450

Post-deploy, I see <table id="mwOA" role="presentation" class="archivebox noprint ombox ombox-notice mbox-small " about="#mwt18" data-x-id="archivebox"> on https://www.mediawiki.org/wiki/User:Vikici?useparsoid=1
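
To list the elements on a page that carry the new attribute, something like the following could work. It is a rough sketch using only the Python standard library; the class name ResetIdFinder and the function find_reset_ids are hypothetical names for illustration. It reports every element that has a data-x-id attribute, under the assumption that the attribute holds the id as originally written in the wikitext, as the table above suggests.

```python
from html.parser import HTMLParser


class ResetIdFinder(HTMLParser):
    """Collect (tag, generated id, original id) for elements with data-x-id."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "data-x-id" in attr_map:
            # attr_map.get("id") is the id present in the HTML;
            # data-x-id is assumed to hold the id as written by the editor.
            self.found.append((tag, attr_map.get("id"), attr_map["data-x-id"]))


def find_reset_ids(doc: str):
    finder = ResetIdFinder()
    finder.feed(doc)
    finder.close()
    return finder.found


# The table tag quoted above, closed here so the snippet is well-formed.
sample = (
    '<table id="mwOA" role="presentation" '
    'class="archivebox noprint ombox ombox-notice mbox-small " '
    'about="#mwt18" data-x-id="archivebox"></table>'
)
for tag, generated_id, original_id in find_reset_ids(sample):
    print(tag, generated_id, original_id)
```

In a browser or any CSS-capable tool, the attribute selector [data-x-id] should match the same elements, which covers the CSS selector use case from the description.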