Page MenuHomePhabricator

Edits made via VE on a translatable page removes untouched content in <translate> tags
Open, Needs TriagePublic

Description

Steps to reproduce

You will notice in the "Review your changes" dialog, changes to be published showing content (wrapped in translate tags) that hasn't been touched and will be removed.

It is not the first time that I've experienced this issue. I'm wondering if the editing of translatable pages in VE is not supported, then why we allow switching to the VE mode on these pages?

Related issue(s)

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 1 2019, 1:19 AM
matmarex added a comment.EditedOct 1 2019, 1:11 PM

I can reproduce on https://www.mediawiki.org/wiki/Developer_Advocacy, but not on other pages, e.g. https://www.mediawiki.org/wiki/Help:Navigation. Might be specific to that page?

JTannerWMF added a subscriber: JTannerWMF.

We will need to evaluate this task further to determine how to proceed.

@matmarex this happens if you first go into source editing mode and then switch back to visual editing without making any changes.

matmarex updated the task description. (Show Details)Nov 7 2019, 1:42 AM
matmarex added a subscriber: ssastry.

I took a closer look.

The reproduction steps are even simpler – just open the editor at https://www.mediawiki.org/wiki/Developer_Advocacy?veaction=edit. All content of the "About Developer Advocates" section is missing:

Read modeEdit mode

(I also did not notice this the first time…)

The problem occurs because the Parsoid HTML has all the data about the content of the wikitext <translate> tag on a magically generated HTML <section> tag (which is used to wrap sections, as in section editing). As part of VisualEditor's section editing handling, we remove these tags before editing the page, causing the content to be lost forever.

This is an unexpected behavior from Parsoid for me. @ssastry Is this intentional, and if so, how should we handle this?


I said in our triage meeting today that I worry that the same problem (losing content) could also be caused by template markup rather than <translate> tags. While I couldn't find a way to cause content to be lost, I found a way to make a page render differently in mobile VE (where we support single-section editing) and desktop VE (where we don't), which is also worrying:

Desktop: page is composed of multiple transclusionsMobile: page is composed of a single huge transclusion
https://www.mediawiki.org/w/index.php?title=User:Matma_Rex/T234280&veaction=edithttps://m.mediawiki.org/w/index.php?title=User:Matma_Rex/T234280#/editor/all

(I think we should prioritize this, at least until someones assures me that this problem can't happen with just templates)

@matmarex, look at Example 2 and the discussion on https://www.mediawiki.org/wiki/Parsing/Notes/Section_Wrapping#Examples .. The same considerations apply to extension content but in this case, it might be some edge case bug related to extension-section interaction.

cscott added a subscriber: cscott.EditedNov 7 2019, 4:06 PM

Seems like the bug is in VE's section-stripping code? It should probably not strip the section tag iff it has typeof="mw:Transclusion"; that will probably be handled properly by the existing VE "template-affected tag" mechanisms (which just look at typeof and ignore the tag name AFAIK).

I don't expect this to happen with "just templates", as the problem is in the code in VE which is removing Parsoid-generated tags, and that's <section>-specific.

@cscott, but, as I noted in my previous comment, we explicitly provide support for VE and clients stripping section tags without paying attention to the mw:Transclusion type there. But, yes ideally VE should be able to handle section tags without stripping them out.

cscott added a comment.EditedNov 7 2019, 6:05 PM

@ssastry -- the data-mw attribute necessary to round-trip the template is only occurring on the <section> tag. That's true for both the template example in https://www.mediawiki.org/wiki/Parsing/Notes/Section_Wrapping#Examples and the extension example in @matmarex's F31045488.

But in both cases, the solution is the same: VE shouldn't strip <section> if the typeof is transclusion or extension. Then VE *ought* to handle the embedded content fine, since the transclusion/extension handling in VE doesn't care about tag name AFAIK.

No, we update data-mw appropriately to be valid for both locations.

[subbu@earth:~/work/wmf/parsoid] node bin/parse.js --wrapSections --normalize=parsoid < /tmp/a 
<section data-mw-section-id="0"></section><section data-mw-section-id="1" typeof="mw:Transclusion" data-mw='{"parts":["=1=\nb\n",{"template":{"target":{"wt":"1x","href":"./Template:1x"},"params":{"1":{"wt":"==1.1==\nc\n=2=\nd"}},"i":0}},"\n==2.1==\ne\n"]}'>
<h1 id="1">1</h1>
<p>b</p>
<section data-mw-section-id="-1">
<h2 typeof="mw:Transclusion" id="1.1" data-mw='{"parts":[{"template":{"target":{"wt":"1x","href":"./Template:1x"},"params":{"1":{"wt":"==1.1==\nc\n=2=\nd"}},"i":0}}]}'>1.1</h2>
<span> </span>
<p>c</p>
<span> </span></section></section><section data-mw-section-id="-1">
<h1 id="2">2</h1>
<span> </span>
<p>d</p>
<section data-mw-section-id="4">
<h2 id="2.1">2.1</h2>
<p>e</p>
</section></section>
cscott added a comment.Nov 7 2019, 6:44 PM

You're right, I didn't notice it in the example, and i missed it in @matmarex's screenshot, because data-mw was abbreviated in both cases. But it's there, on the <h2> tag in your example, and on the top <h2> in F31045488.

Ok, so what's going wrong then? We're back to square 1.

You're right, I didn't notice it in the example, and i missed it in @matmarex's screenshot, because data-mw was abbreviated in both cases. But it's there, on the <h2> tag in your example, and on the top <h2> in F31045488.
Ok, so what's going wrong then? We're back to square 1.

I think it is some edge case bug (might be specific to translate even). I say that without looking at @matmarex's analysis, but I have no reason to believe it is not an edge case bug. I wanted to chime in early to reassure him that this is not a wider bug that affects all template - section interactions because we handle that already.

Thanks, that is reassuring :) That's clever, I didn't know it worked that way.

I think I see the exact problem here. In the Parsoid HTML (https://www.mediawiki.org/api/rest_v1/page/html/Developer_Advocacy), there are two elements with about="#mwt67" and different data-mw:

<h2 typeof="mw:Extension/translate" about="#mwt67" id="Areas[edit]" data-mw="{&quot;name&quot;:&quot;translate&quot;,&quot;attrs&quot;:{},&quot;body&quot;:{&quot;extsrc&quot;:&quot;\n== Areas == <!--T:5-->\n\n=== About Developer Advocates === <!--T:6-->\n&quot;},&quot;extSuffix&quot;:&quot;&quot;}"><span id="Areas.5Bedit.5D" typeof="mw:FallbackId"></span><span class="mw-headline" id="Areas">Areas</span><span class="mw-editsection" id="mwDA"><span class="mw-editsection-bracket" id="mwDQ">[</span><a href="/w/index.php?title=Developer_Advocacy&amp;action=edit&amp;section=1" title="Edit section: Areas" id="mwDg">edit</a><span class="mw-editsection-bracket" id="mwDw">]</span></span></h2>
<span typeof="mw:Extension/translate" about="#mwt67" data-mw="{&quot;name&quot;:&quot;translate&quot;,&quot;attrs&quot;:{},&quot;body&quot;:{&quot;extsrc&quot;:&quot;\n<!--T:62-->\nDeveloper Advocates support and grow the existing MediaWiki and Wikimedia FLOSS developer communities by coordinating participation in Hackathons and [[Outreach programs|outreach events]] like Google Summer of Code, Outreachy, and Google Code-in.\n\n<!--T:63-->\nDeveloper Advocates are technical contributors with strong communications skills, who focus their efforts on creating tutorials, sample code, blog posts, conference presentations, and other outreach to make using their organization's software platform easier.\n\n<!--T:64-->\nDeveloper Advocates also collect feedback from the developer communities they serve.\n\n=== About Technical Writers === <!--T:59-->\n&quot;}}" id="mwGw">Developer Advocates support and grow the existing MediaWiki and Wikimedia FLOSS developer communities by coordinating participation in Hackathons and </span>

They are in different sections, so while weird, this doesn't immediately break things. But after unwrapping the sections (ve.unwrapParsoidSections(document.body, undefined)), these two elements become siblings, so the two about-groups get merged into one, and the data-mw from the second element is lost.

Is this a bug in Parsoid?

If this is expected, then we should probably change about-group handling in VE so that it stops gathering elements when it encounters a second one with data-mw and the same about.

I think I see the exact problem here. In the Parsoid HTML (https://www.mediawiki.org/api/rest_v1/page/html/Developer_Advocacy), there are two elements with about="#mwt67" and different data-mw:
...
They are in different sections, so while weird, this doesn't immediately break things. But after unwrapping the sections (ve.unwrapParsoidSections(document.body, undefined)), these two elements become siblings, so the two about-groups get merged into one, and the data-mw from the second element is lost.
Is this a bug in Parsoid?

Very likely a bug in Parsoid. This shouldn't happen.

I think this is actually a bug in Parsoid. The second element should have about="#mwt39" – there's an element later with about="#mwt39", but none with data-mw.