Maniphest T220495

Content copied from Content Translation into Visual Editor exposes internal attributes
Closed, Resolved · Public

Assigned To

Authored By

	Pginer-WMF
	Apr 9 2019, 11:53 AM

Description

Based on T144167#5075490, it seems some users copy content from Content translation and paste it into Visual Editor. As a result, unnecessary attributes leak into the final result. In this example you can see that unnecessary HTML markup such as the following was removed:

<span data-segmentid="9" class="cx-segment">...</span>

This task is intended to:

- Explore whether there is a way for Content translation to reliably clean up contents when they are copied.
  - Check that any clean-up approach does not cause issues when pasting the contents in Content translation itself because of the lost metadata. This would limit the ability users have to move content around.
  - Check that the solution works with both the copy&paste clipboard and drag&drop.
- If there is no reliable solution on the Content translation side, explore how to clean up the contents when they are pasted into Visual Editor. Similar approaches may be in place for pasting content from other tools such as Microsoft Office.
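
To make the paste-side option more concrete, here is a minimal sketch of unwrapping segment spans from pasted HTML. It is an illustration only, not how Visual Editor sanitizes pasted content: the function name `unwrapCxSegments` is hypothetical, and the sketch assumes segment wrappers are always `<span>` elements carrying the `cx-segment` class in a double-quoted `class` attribute, as in the example above. It works on a string so that it can run without a DOM.

```javascript
// Hypothetical helper: removes <span class="cx-segment" ...> wrappers from an
// HTML string while keeping their contents and any other spans.
function unwrapCxSegments( html ) {
	const tagPattern = /<span\b[^>]*>|<\/span\s*>/gi;
	const classPattern = /\bclass\s*=\s*"([^"]*)"/i;
	// One entry per currently open span: true if its opening tag was dropped.
	const dropped = [];

	return html.replace( tagPattern, ( tag ) => {
		if ( tag[ 1 ] === '/' ) {
			// Closing tag: drop it only if the matching opening tag was dropped.
			return dropped.pop() ? '' : tag;
		}
		const match = classPattern.exec( tag );
		const classes = match ? match[ 1 ].split( /\s+/ ) : [];
		const isSegment = classes.includes( 'cx-segment' );
		dropped.push( isSegment );
		return isSegment ? '' : tag;
	} );
}

const pasted = '<p><span data-segmentid="9" class="cx-segment">Hello <span class="x">world</span>.</span></p>';
const cleaned = unwrapCxSegments( pasted );
```

Tracking open spans on a stack keeps nested, unrelated spans intact, which a single search-and-replace on the opening tag would not.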

Users may be doing this as a shortcut to expand existing articles with a translation of some new content, but that's just a guess. We don't know how common this behaviour is.

Details

	Subject	Repo	Branch	Lines +/-
	Re-apply "Don't generate HTML for segments when copying"	mediawiki/extensions/ContentTranslation	master	+22 -10
	Don't generate HTML for segments when copying	mediawiki/extensions/ContentTranslation	master	+13 -6


Related Objects

Mentioned In: T229906: Sentence pair highlighting broken
T111000: CX creates span tags with cx-highlight class
T144167: CX2: Content Translation creates articles that have tags with cx-segment
Mentioned Here: T228498: Monospace template causes the rest of the paragraph to be ignored
T229906: Sentence pair highlighting broken
T144167: CX2: Content Translation creates articles that have tags with cx-segment

Event Timeline

Pginer-WMF created this task. Apr 9 2019, 11:53 AM

Restricted Application added a subscriber: Aklapper. Apr 9 2019, 11:53 AM

Pginer-WMF triaged this task as Medium priority. Apr 9 2019, 11:54 AM

Pginer-WMF mentioned this in T144167: CX2: Content Translation creates articles that have tags with cx-segment.

Pginer-WMF added subscribers: Esanders, • santhosh.

Pginer-WMF updated the task description. Apr 9 2019, 11:59 AM

Dvorapa subscribed. Apr 9 2019, 5:30 PM

Pginer-WMF moved this task from Needs Triage to Bugs on the ContentTranslation board. Apr 22 2019, 1:49 PM

• santhosh mentioned this in T111000: CX creates span tags with cx-highlight class. Apr 26 2019, 5:07 AM

Pginer-WMF edited projects, added Language-Team (Language-2019-July-September); removed Language-Team (Language-2019-April-June). Jul 9 2019, 1:47 PM

Change 521521 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/ContentTranslation@master] Don't generate HTML for segments when copying

https://gerrit.wikimedia.org/r/521521

gerritbot added a project: Patch-For-Review. Jul 9 2019, 3:36 PM

The content can be cleaned up when copying and the converter has separate modes when generating HTML for Parsoid/Clipboard.

Internal copy/paste and drag/drop don't use clipboard HTML so won't be affected.

• santhosh assigned this task to Esanders. Jul 10 2019, 5:08 AM

• santhosh moved this task from Backlog to Needs QA on the Language-Team (Language-2019-July-September) board.

Change 521521 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Don't generate HTML for segments when copying

https://gerrit.wikimedia.org/r/521521

ReleaseTaggerBot added a project: MW-1.34-notes (1.34.0-wmf.14; 2019-07-16). Jul 10 2019, 6:00 AM

Maintenance_bot removed a project: Patch-For-Review. Jul 10 2019, 6:10 AM

@Esanders @santhosh can you give me an example of where those tags are used so I can see if they're still being passed correctly?

• Jpita moved this task from Needs QA to Recheck after deployment on the Language-Team (Language-2019-July-September) board. Jul 18 2019, 2:17 PM

@Jpita Every paragraph in a Content translation target document has them (they are what make the sentences appear yellow when you hover on them). So just copy anything out of a CX translation, and paste it into a normal VE instance to test this.

@Esanders thanks!

• Jpita moved this task from Recheck after deployment to Done on the Language-Team (Language-2019-July-September) board. Jul 23 2019, 12:49 PM

Pginer-WMF closed this task as Resolved. Jul 29 2019, 1:03 PM

• santhosh mentioned this in T229906: Sentence pair highlighting broken. Aug 6 2019, 9:12 AM

I had to revert this fix because of T229906: Sentence pair highlighting broken
Revert patch: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/528406

We use the clipboard mode for extracting HTML for MT while adding a section to the target language. Clipboard mode allows the reference content to be localized, so it is important to use that mode. Since sentence annotations need to be present in the translated content for highlighting pairs, the highlighting is now broken. Beyond highlighting, some of the MT annotation mapping for plain-text MT services depends on sentences, so it is also misbehaving (T228498#5395412).

@Esanders If your fix can be done only for the target language, it would be good. But since ve.dm.CXSentenceSegmentAnnotation.static.toDomElements has no information on the language, that is not easy. What do you suggest?

• Petar.petkovic moved this task from Done to In Progress on the Language-Team (Language-2019-July-September) board. Aug 6 2019, 10:15 AM

In T220495#5395580, @santhosh wrote:

@Esanders If your fix can be done only for the target language, it would be good. But since ve.dm.CXSentenceSegmentAnnotation.static.toDomElements has no information on the language, that is not easy. What do you suggest?

If we just did it based on the document language you could still write segments to the clipboard by copying from the source document.

Thinking about what the converter modes mean, I think clipboard mode might still be the correct mode to use, as essentially it means "for export to another VE instance, via some serialised storage".

I think what we should do is pass an additional flag to the converter saying isForTranslation, and then check for this in ve.dm.CXSentenceSegmentAnnotation.static.toDomElements

Strictly speaking we should extend the converter and create a new mode to do this, but we can just hack it for now:

ve.dm.converter.isForTranslation = true;
html = ve.dm.converter.getDomFromNode( ... );
ve.dm.converter.isForTranslation = false;
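
For readers following along, the flag check suggested above might look roughly like the sketch below. It is not the merged patch. The `converter` object is a hypothetical stand-in for ve.dm.converter, `segmentToDomElements` and `withTranslationFlag` are illustrative names, and `clipboardMode` is an assumed property standing for "the converter is in clipboard mode". In this sketch an empty array stands for "no wrapper element", and plain descriptors replace real DOM nodes so that it runs without a DOM.

```javascript
// Hypothetical stand-in for ve.dm.converter; only the two flags matter here.
const converter = {
	clipboardMode: false,
	isForTranslation: false
};

// Sketch of the check proposed for
// ve.dm.CXSentenceSegmentAnnotation.static.toDomElements.
function segmentToDomElements( dataElement, conv ) {
	if ( conv.clipboardMode && !conv.isForTranslation ) {
		// Plain copy to the clipboard: emit no segment wrapper.
		return [];
	}
	// Saving, or extracting HTML for MT: keep the segment wrapper.
	return [ {
		tag: 'span',
		attributes: {
			class: 'cx-segment',
			'data-segmentid': String( dataElement.segmentId )
		}
	} ];
}

// Sets the flag for the duration of one call and always resets it,
// even if the callback throws.
function withTranslationFlag( conv, fn ) {
	conv.isForTranslation = true;
	try {
		return fn();
	} finally {
		conv.isForTranslation = false;
	}
}

converter.clipboardMode = true;
// Ordinary copy: no wrapper.
const copied = segmentToDomElements( { segmentId: 9 }, converter );
// Extraction for MT: wrapper preserved.
const forMt = withTranslationFlag( converter, () =>
	segmentToDomElements( { segmentId: 9 }, converter )
);
```

Wrapping the call in try/finally is one way to keep the temporary flag from staying set if the conversion throws, which is the main risk of toggling shared state around a call.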

Change 528471 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/ContentTranslation@master] Re-apply "Don't generate HTML for segments when copying"

https://gerrit.wikimedia.org/r/528471

gerritbot added a project: Patch-For-Review. Aug 6 2019, 2:01 PM

Thanks. That works

Change 528471 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Re-apply "Don't generate HTML for segments when copying"

https://gerrit.wikimedia.org/r/528471

Maintenance_bot removed a project: Patch-For-Review. Aug 7 2019, 5:10 AM

• Petar.petkovic moved this task from In Progress to Needs QA on the Language-Team (Language-2019-July-September) board. Aug 7 2019, 7:35 AM

I'd recommend adding some unit tests that assert that segments are preserved when translating.

• Jpita closed this task as Resolved. Aug 12 2019, 4:59 PM

• Jpita moved this task from Needs QA to Done on the Language-Team (Language-2019-July-September) board.

ReleaseTaggerBot edited projects, added MW-1.34-notes (1.34.0-wmf.19; 2019-08-20); removed MW-1.34-notes (1.34.0-wmf.14; 2019-07-16). Aug 20 2019, 12:01 PM
