
CX: <div class="cx-overlay"><div class="cx-spinner"></div></div>
Open, Medium, Public

Description

Recently, I've seen several articles on frwiki that were created with CX and contain this leftover markup in external links: <div class="cx-overlay"><div class="cx-spinner"></div></div>

Cf. for example https://fr.wikipedia.org/w/index.php?title=A%C3%A9roport_de_Barra_do_Gar%C3%A7as&action=edit&oldid=147655082
The same revision has many other problems:

  • <span tabindex="0" id="cxmwAg" contenteditable="false" data-source="mwAg"></span> at the beginning
  • Useless span tags around the end of the article

When will CX stop using production Wikipedia as a playground?

Event Timeline

And one I forgot to mention, although it is very frequent and means a lot of work: infoboxes are replaced with buggy HTML code, with links to edit the article inserted into the pseudo-infobox code:

We are working on a new version of Content Translation that will provide a more solid editing surface, which is expected to solve these issues.

Pginer-WMF triaged this task as Medium priority. Apr 25 2018, 3:54 PM

Hello @Pginer-WMF and thanks for your reply. But this problem seems to me to deserve more than a generic answer.

As @NicoV said, the problem seems pretty recent. This means that a software change has revealed the problem. I find it very unfortunate that the only answer is to wait for the next version of the tool, which if I have understood correctly will not be deployed in the coming weeks or perhaps months.
Meanwhile, CX is once again inserting completely useless tags into published articles. Similarly, I suppose deploying v2 will not remove tags from articles already published? Unless you're planning on using a bot to remove these tags everywhere? It would be nice to have more information on how you want to deal with the problem.

Hello @Pginer-WMF and thanks for your reply. But this problem seems to me to deserve more than a generic answer.

I can try to provide more details. Content Translation version 1 uses the browser's default editing surface, which is not as solid as we would like it to be. As a result, some of the internal styling used by the tool may leak into the generated content under some circumstances. Since there are many factors in play (browser discrepancies, the complexity of different kinds of content, etc.), making a custom fix for each incident does not seem to be a scalable solution. We tried this route in the past, and new issues kept appearing. Instead, we plan to replace the whole editing surface with a more solid one.

As @NicoV said, the problem seems pretty recent. This means that a software change has revealed the problem. I find it very unfortunate that the only answer is to wait for the next version of the tool, which if I have understood correctly will not be deployed in the coming weeks or perhaps months.

This is a big change and it will take some time, but I don't think our approach can be described as ignoring these issues when solving them is one of the main goals. Version 2 was started with the specific goal of making the tool more solid and solving issues like the one described in this ticket. By working on version 2, we are working on a solution for this (and similar issues) right now as our main focus. In this case, we are using the Visual Editor editing surface, since it already has many provisions for dealing with wikitext. We believe that, given one day of development time, it is better to put it into version 2 and complete it earlier rather than into patching version 1.

Meanwhile, CX is once again inserting completely useless tags into published articles. Similarly, I suppose deploying v2 will not remove tags from articles already published? Unless you're planning on using a bot to remove these tags everywhere? It would be nice to have more information on how you want to deal with the problem.

By reusing content from other languages, Content Translation helps to produce articles that are less likely to be deleted than those written from scratch. It is unfortunate that the produced markup is not always clean and that this generates review work for the community. At the moment our focus is on having the new version ready as soon as possible, so that newly translated articles have markup that is as clean as possible. In any case, we are open to exploring ways in which we can facilitate the process of cleaning up articles created with version 1.
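As one illustration of what such a cleanup aid could look like, here is a minimal Python sketch that strips the two kinds of residue quoted in this task from a wikitext string. It is not an existing bot or a Content Translation feature. The function name `strip_cx_residue` and both patterns are assumptions made for this example: the first matches the overlay/spinner markup exactly as reported above, and the second assumes that leaked spans are empty, have an `id` starting with `cx`, and carry a `data-source` attribute, as in the one example given in the description.

```python
import re

# Overlay/spinner residue exactly as quoted in the task description.
CX_OVERLAY = re.compile(
    r'<div class="cx-overlay">\s*<div class="cx-spinner">\s*</div>\s*</div>'
)

# ASSUMPTION: leaked spans are empty, have an id beginning with "cx" and a
# data-source attribute (modelled on the single example in the description).
CX_EMPTY_SPAN = re.compile(
    r'<span\b(?=[^>]*\bid="cx[^"]*")(?=[^>]*\bdata-source="[^"]*")[^>]*>\s*</span>'
)


def strip_cx_residue(wikitext: str) -> str:
    """Return wikitext with the known CX editing-surface residue removed."""
    cleaned = CX_OVERLAY.sub("", wikitext)
    cleaned = CX_EMPTY_SPAN.sub("", cleaned)
    return cleaned


if __name__ == "__main__":
    sample = (
        '<span tabindex="0" id="cxmwAg" contenteditable="false" '
        'data-source="mwAg"></span>Intro text '
        '[http://example.org <div class="cx-overlay">'
        '<div class="cx-spinner"></div></div>]'
    )
    print(strip_cx_residue(sample))
```

The patterns are deliberately narrow so that ordinary spans and divs in an article are left alone. The other problems mentioned in this task (non-empty wrapper spans, broken infobox HTML) would need separate handling and human review.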

@Pginer-WMF
Ways to facilitate the process of cleaning up articles until a stable version is out (as already requested 2 1/2 years ago...) :

  • Stop advertising CX to users who are simply editing (last month, I think I got 5 CX popups inviting me to translate articles, which I refuse every time, but they keep coming up on a regular basis): why try to invite everyone when you know it produces a lot of problems in articles?
  • Clearly tell people that CX is beta software and that they have to check what they have produced (including the wikitext) if they want it to be in the encyclopedic namespace. And that they should only translate into a language they really know (it has cost me a lot of time to explain to people who had only a basic understanding of French that they should stop translating articles into French because they thought CX was doing the job for them: some of them ended up indefinitely blocked...)
  • Force people to translate in a draft, not in the encyclopedic namespace

@Pginer-WMF
Ways to facilitate the process of cleaning up articles until a stable version is out (as already requested 2 1/2 years ago...) :

I can share some thoughts on what we have considered for those:

  • Stop advertising CX to users who are simply editing (last month, I think I got 5 CX popups inviting me to translate articles, which I refuse every time, but they keep coming up on a regular basis): why try to invite everyone when you know it produces a lot of problems in articles?

We are inviting people who are in the process of creating an article. We are providing an alternative to users who are already trying to create a new article from scratch. Creating a new article from scratch often results in the content being deleted, which is not an ideal experience for users. Offering these users an alternative such as translation, which is more likely to produce a lasting contribution, seems a good decision.

We have measured the deletion ratios for different wikis. In the case of French Wikipedia, 27.03% of new articles not created as a translation are deleted, while only 5.37% of articles created with Content Translation are deleted. Not offering translation as an alternative means encouraging a more problematic path. This is why we think that, even with the current limitations, Content Translation is still a good alternative for those cases where translating an article is possible.

  • Clearly tell people that CX is beta software and that they have to check what they have produced (including the wikitext) if they want it to be in the encyclopedic namespace. And that they should only translate into a language they really know (it has cost me a lot of time to explain to people who had only a basic understanding of French that they should stop translating articles into French because they thought CX was doing the job for them: some of them ended up indefinitely blocked...)

I think we present the tool as beta in the process of enabling it, but I'll review that. As for conveying the need to review the content, we already approach it in several ways:

The initial instructions indicate that the texts need to be reviewed, and link to the translation guidelines:

[Screenshot: Screen Shot 2018-05-11 at 10.55.45.png]

When users add machine translation without editing it much, a warning is also shown:

[Screenshot: Screen Shot 2018-05-11 at 10.56.57.png]

  • Force people to translate in a draft, not in the encyclopedic namespace

This was the initial configuration used when the tool first launched on the Catalan Wikipedia as a pilot test, and, because of the complexities of moving pages, it caused more problems than it solved. Considering that creating a page from scratch in the main namespace is really easy, I don't see the need to make translation more complex when it produces articles that are less likely to be deleted in comparison.

The first problem with your reasoning is the deletion ratio.
It's obvious that the deletion ratio should be much lower with CX than with direct creation, but that is because the main reason for deleting a new article is that it's not admissible (people creating an article about their company, their band...). If the article already exists in another language, it has a much better chance of being admissible, and hence of not being deleted. But that has nothing to do with the fact that it's a translation made with a tool like CX... So, using the deletion ratio as a justification is simply taking credit for something that has nothing to do with the tool. The articles that are not admissible are often not candidates for translation. Your conclusions from the difference in deletion ratios are biased towards your tool because you don't take into account the difference in the subjects of the articles...
On the other hand, fixing all the garbage wikitext that CX is producing requires a lot of effort from volunteers like me...
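The selection-effect argument above can be made concrete with a small worked example in Python. Every number in it is hypothetical and chosen only for illustration; none of it is measured data from any wiki. It assumes that the deletion rate depends only on whether the topic is admissible (90% vs 5%, both invented), and that the two creation paths differ only in how often the topic is inadmissible.

```python
def overall_deletion_rate(share_inadmissible: float,
                          rate_inadmissible: float = 0.90,
                          rate_admissible: float = 0.05) -> float:
    """Weighted average of per-topic deletion rates.

    All default rates are HYPOTHETICAL and identical for both creation
    paths, i.e. the tool itself is assumed to have no effect on deletion.
    """
    return (share_inadmissible * rate_inadmissible
            + (1.0 - share_inadmissible) * rate_admissible)


# HYPOTHETICAL topic mixes: many from-scratch articles are about
# inadmissible subjects, almost no translated ones are.
from_scratch = overall_deletion_rate(share_inadmissible=0.26)
translated = overall_deletion_rate(share_inadmissible=0.005)

print(f"from scratch: {from_scratch:.1%}")
print(f"translated:   {translated:.1%}")
```

Under these assumptions the two overall ratios differ widely even though the per-topic deletion rates are the same for both paths. This does not show that the tool has no effect. It only shows that the raw ratios alone cannot separate a tool effect from a difference in subject mix.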

The second problem is the advertising: why do you absolutely want to repeatedly advertise a tool that is producing so many problems? Can't you wait until it is capable of producing correct wikitext? As I said, I had to say "No" several times just last month (and I even disabled CX in the preferences, to no effect): why keep asking users when they have already said no (several times...)?

For me, you're confusing beta testing with production: yes, it's more difficult to create a draft and move it into the encyclopedic namespace afterwards, but while the tool is in beta, I don't see anything wrong with people who want to try it having to do a bit of extra work. The goal of beta testing is not to have as many users as possible, but to see what the problems with the tool are and what can be done to improve it. Most of the problems were reported 2 or 3 years ago, and they are still not fixed, and they apparently won't be fixed soon.