CX2: Content Translation creates articles that have tags with cx-segment
Closed, Resolved · Public

Description

In some cases, content published with Content Translation leaks cx-segment metadata elements into the published article. For example, in this article published with version 2 of Content Translation (on Feb 5 2019):

<ref> <span data-segmentid="129" class="cx-segment"><span title="A volte può capitare che un link presente su Wikipedia non sia più raggiungibile...

This abuse filter can be useful for collecting examples and checking whether the issue persists over time.

Examples:

These were created long after the similar issue T113137 was resolved, so it's not a caching issue. I am marking this as Regression, even though the thing that caused it in this article might be different from what was addressed in T113137.

Event Timeline

Amire80 created this task. Aug 29 2016, 11:06 AM
Restricted Application added a subscriber: Aklapper. Aug 29 2016, 11:06 AM
Amire80 moved this task from Needs Triage to Bugs on the ContentTranslation board. Sep 7 2016, 8:24 AM
Amire80 updated the task description. Sep 9 2016, 8:10 AM
Amire80 updated the task description. Sep 10 2016, 9:09 AM
Amire80 added a subscriber: Elinruby.
Amire80 updated the task description. Oct 14 2016, 11:54 AM
Amire80 triaged this task as High priority. May 9 2017, 5:44 PM
Amire80 moved this task from Bugs to Out of Beta on the ContentTranslation board.
Base added a subscriber: Base. Dec 15 2018, 10:43 PM
Base added a comment. Edited Dec 15 2018, 10:55 PM

I've created https://meta.wikimedia.org/wiki/Special:AbuseFilter/192 to mark new pages adding "cx-segment".
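
For readers who want to check saved page text themselves, the kind of check described above can be sketched as follows. This is only an illustration, not the rule used by the abuse filter: the function name `has_leaked_cx_segment` and the regular expression are assumptions, modelled on the `<span ... class="cx-segment">` example in the task description.

```python
import re

# Assumption: leaked markup looks like the <span ... class="cx-segment">
# example in the task description. This is NOT the actual abuse filter rule.
CX_SEGMENT_RE = re.compile(
    r'<span\b[^>]*\sclass\s*=\s*"[^"]*(?<![\w-])cx-segment(?![\w-])[^"]*"',
    re.IGNORECASE,
)


def has_leaked_cx_segment(text: str) -> bool:
    """Return True if the text contains a <span> whose class list includes cx-segment."""
    return CX_SEGMENT_RE.search(text) is not None


if __name__ == "__main__":
    sample = '<ref> <span data-segmentid="129" class="cx-segment"><span title="x">'
    print(has_leaked_cx_segment(sample))
```

The pattern only looks at double-quoted `class` attributes on `<span>` tags, so a plain mention of the string cx-segment in article prose is not flagged.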

Trizek added a subscriber: Trizek. Edited Jan 23 2019, 10:36 AM

Is it normal to have the phab ID in the diffs for edits made with that tag? https://fr.wikipedia.org/w/index.php?title=Ex-Mattatoio_de_Rome&oldid=156081838

That is not added by us. See Base's Abuse Filter above. It's a global filter so it is in use in all projects.

Thanks.

@Base, can you change the wording please? Having that Phab task there is a bit disturbing. :)

Pginer-WMF renamed this task from Content Translation creates articles that have tags with cx-segment to CX2: Content Translation creates articles that have tags with cx-segment. Feb 13 2019, 10:47 AM
Pginer-WMF updated the task description.

Based on the examples from the abuse filter, it seems that these cases are less frequent with the new version (CX2), but they are still happening. I added one such example to the description to help investigate further.

These two articles do not look like they were created using CX. They were created using VisualEditor:

https://cs.wikipedia.org/w/index.php?title=Matchmaking_(video_hry)&action=history
https://cs.wikipedia.org/w/index.php?title=Metroidvania&action=history

But it is strange to see that the content has cx attributes. I wonder whether the user copied HTML from CX and pasted it into VE in another browser tab?

Yeah, this possibility was discussed in the merged duplicate task. Perhaps CX should remove these tags when copied? Or produce some no-tag result? Or VE should perhaps remove these tags on paste?

Restricted Application added a project: VisualEditor. Apr 1 2019, 8:24 PM

We should explore whether it is possible to clean the metadata when the user copies content from Content Translation. We need to make sure that removing the metadata does not break pasting the content into Content Translation itself; otherwise, moving content around within the tool would be problematic. Based on that, we can propose that VisualEditor support cleaning up the pasted content.
I captured this case in a separate ticket: T220495: Content copied from Content Translation into Visual Editor exposes internal attributes
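
To make the cleanup idea concrete, here is a minimal sketch of unwrapping the segment wrappers from an HTML fragment while keeping their children. It is not how Content Translation or VisualEditor handle copy and paste; the names `CxSegmentStripper` and `strip_cx_segments` are hypothetical, and the sketch assumes the wrappers are `<span>` elements carrying the `cx-segment` class, as in the example in the task description. It does not address the concern about pasting back into Content Translation itself.

```python
from html.parser import HTMLParser


class CxSegmentStripper(HTMLParser):
    """Rebuild an HTML fragment while unwrapping <span class="cx-segment"> elements."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        # One entry per open <span>: True if it was a dropped cx-segment wrapper.
        self.span_stack = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            is_segment = "cx-segment" in classes
            self.span_stack.append(is_segment)
            if is_segment:
                return  # drop the wrapper's opening tag, keep its children
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Drop the closing tag only if it matches a dropped wrapper.
        if tag == "span" and self.span_stack and self.span_stack.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def strip_cx_segments(html: str) -> str:
    """Return the fragment with cx-segment wrapper spans removed."""
    parser = CxSegmentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    leaked = ('<ref><span data-segmentid="129" class="cx-segment">'
              '<span title="x">text</span></span></ref>')
    print(strip_cx_segments(leaked))
```

Spans without the `cx-segment` class are left untouched, so only the internal wrappers disappear. A real fix would also need to decide what to do with other internal attributes, which this sketch leaves alone.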

JTannerWMF added a subscriber: Arrbee.

Hey @Pginer-WMF @Arrbee , will the language team take this task on?

Yes. The scope for this one is for content created with Content translation. So maybe the VisualEditor-related tags can be removed.

For the particular case of content copied from Content Translation and pasted into VisualEditor, I created a separate ticket (T220495). Even for that one, the plan is for the Language team to first evaluate whether this can be solved from Content Translation.

Change 520173 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Do not segment content inside block templates

https://gerrit.wikimedia.org/r/520173

Thanks. I went through the list. It includes articles created with the earlier version of CX (CX1) in 2015-2018. Those are valid issues, but the CX1 code base is no longer used for starting new articles. I am looking for issues of this kind, if any, in articles created with CX2.

Change 520173 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Do not segment content inside block templates

https://gerrit.wikimedia.org/r/520173

Jpita closed this task as Resolved. Jul 16 2019, 12:05 PM