Enable Machine Translation in English in the content translation tool
Open, Low, Public

Description

It is possible to rapidly bring automated translation up to good standard, especially if one is fluent in the target language, and slightly familiar with the source language.
Currently English is disabled as a target language on the content translation tool. The project would benefit from having this enabled.

Runa said:

Machine translations were '''not''' supposed to be enabled in the English Wikipedia, but it got enabled in error during a configuration change, and now it's disabled.

Event Timeline

KartikMistry renamed this task from Enable English in the content translation tool to Enable Machine Translation in English in the content translation tool. Jun 26 2016, 2:40 PM

Anyway, as I said (with a typo correction): I don't think there is any major translation service that doesn't start with a machine translation and then improve it by comparison with the original language's text. This machine translation is already available, and offered for other languages. Help out those of us who translate into English: turn it on. If people don't want it, they don't have to use it.

Amire80 triaged this task as Medium priority. Jul 22 2016, 8:42 PM

enwiki is currently discussing challenges related to machine translations here: https://en.wikipedia.org/w/index.php?title=Wikipedia:Administrators%27_noticeboard&oldid=731562551#Content_translator_tool_creating_nonsense_pages

WMF staff is reviewing for feedback. I suggest putting this on hold pending community consensus at the target language.

It is on hold. We don't have any immediate timeline for this, but we are putting together all the requests we are receiving to keep them on record. Thanks.

@Arrbee: The cited discussion ended a year ago and resulted in an edit filter preventing people who are not extended confirmed from using the Content Translation Tool. At this point, there doesn't seem to be any reason to not enable machine translation into English. Those of us who use the tool would certainly benefit from this capability being enabled, and it is very unlikely that it will result in poorly translated articles since the tool is restricted to extended confirmed users.

Xaosflux changed the task status from Open to Stalled. Apr 16 2017, 11:36 PM
Xaosflux lowered the priority of this task from Medium to Low.

@kaldari - the enwiki abusefilter is preventing all CXT use by newer users on the English Wikipedia. However, it does not represent a community consensus that machine translation is otherwise supported. Please cite an enwiki discussion supporting enabling machine translation. The only thing I am seeing is https://en.wikipedia.org/wiki/Wikipedia:Translation#Avoid_machine_translations - which specifically says to avoid machine translations, for all editors.

@Xaosflux: The content translation tool doesn't automatically machine translate articles. It provides machine translation of specific chunks of text on demand for human beings who are translating articles. Wikipedia:Translation#Avoid_machine_translations is about using "unedited machine translation", which is definitely a bad idea. I think we have a lot more to worry about from newbies using Google Translate than we do from extended confirmed users using the content translation tool. Do you really think we need to reopen the discussions on this?

One recurring issue has been trouble with the translation tool interface: it is hard to add community-controlled messages to the page. Adding a MediaWiki transclusion to the page, where community expectations could be spelled out, may be sufficient. Do any devs of this tool have options for us?

There is an enwiki RFC open that may resolve the community consensus question here: https://en.wikipedia.org/wiki/Wikipedia_talk:Translation#RFC

I don't think that RFC (or Translation#Avoid_machine_translations) directly addresses this task. I support prohibiting machine translated articles on English Wikipedia, but I also support enabling machine translation in the Content Translation Tool as an aid to manual translation. If I vote to keep the prohibition, will that be interpreted as a vote against resolving this task?

I reject the premise of the opening sentence of this task as false, which means that the proposal given in the second sentence is completely moot:

It is possible to rapidly bring automated translation up to good standard, especially if one is fluent in the target language, and slightly familiar with the source language. [emphasis added]

In fact, it is not possible to bring automated translation up to good standard, rapidly or otherwise, unless one has a substantial familiarity with the source language.

The problem lies in the expression "good standard" which conflates "good English" with "good translation" and is a typical (and natural) misunderstanding of monolinguals.

If by "good standard" you mean "a good standard of English" (fully grammatical, correct, and well-styled English, as a typical native speaker would generate), then yes, I would agree with you. However, that beautiful English may hide glaring inaccuracies of fact resulting from poor translation of the original.

If by "good standard" you mean, "an accurate translation which does not misrepresent the facts of the source" then I completely disagree with you. Automatic translation regularly produces much better English than in the past, but errors of fact are still there, and the better the quality of English, the more these errors are invisible to monolinguals under the pretty, whitewashed façade.

In a way, a highly-sourced article with many references exacerbates the problem, because who is going to dare to challenge an article in beautiful English prose with dozens or hundreds of references in German, Croatian and Hungarian every sentence or two? Don't they "prove" the accuracy of the (English) article?

The fact is, there is simply no way to claim a translated article has a "good standard" if you have only a passing familiarity with the source language.

Automatic translation has been around for over 50 years and is getting better fast, but it's not there yet. I use CXT and I find no utility for having machine translation available through the tool, as MT is one click and one tab away if I want it.

Enabling MT through the tool primarily achieves one thing: it makes it easier for well-meaning monolinguals to rapidly create junk and degrade the quality of the encyclopedia. (As a side effect, it will likely annoy the inadequate number of qualified translators we actually have now, who may despair of having dung heaps slung at them at script speed and give up, thus compounding the problem.)

@Mathglot: Your example is an editor that used Google Translate, so I don't think that's very convincing. Editors have access to machine translation regardless of this feature. A lot of us use it to create good quality articles (with new citations) that no one has ever complained about. Some people use it to create crappy articles, which are probably the ones you're more likely to notice. If editors are creating crappy articles, it's not the fault of Google Translate or CXT, it's because some editors don't care about the standards of the project and they're going to cause problems one way or another. At least with the feature limited to extended confirmed users, you are going to have a much higher chance of people using the software in a way that is beneficial (which I know you don't believe is even possible, but I'll just have to disagree with you on that). Why penalize those of us who want to use the feature in a way that is productive just to thwart a handful of bad editors? That seems counter to the wiki ethos.

Anyway, this is probably the wrong venue for this discussion. I guess we'll just have to have yet another RfC on the issue.

Your example is an editor that used Google Translate, so I don't think that's very convincing.

I count 172 uses of ContentTranslation by that user; do you want me to enumerate them?

Editors have access to machine translation regardless of this feature.

Yep. And that's a great argument for why we don't need the feature embedded.

A lot of us use it to create good quality articles (with new citations) that no one has ever complained about.

Agreed. I do, myself. Maybe the reason no one has complained about your use of ContentTranslation is that your first use of the tool was yesterday, turning a long Spanish article into a two-sentence stub. So your complaints about productivity loss seem a bit theoretical to me.

Some people use it to create crappy articles, which are probably the ones you're more likely to notice.

Yes, and that is part of the problem. As MT gets better and better, you're less likely to notice, and if they follow up their MT with copyediting for proper English, the result is the same. And by "crappy articles" do you mean "crappy English" or "crappy/counter to fact"? Because that distinction is a huge part of the problem.

If editors are creating crappy articles, it's not the fault of Google Translate or CXT, it's because some editors don't care about the standards of the project and they're going to cause problems one way or another.

I disagree. You're making a prediction based on no data, and you're guessing about "some editors'" motivation and failing to assume good faith. The problems I'm seeing with crappy articles come from actual data--the crappy articles--and they come from people using the tool, with MT enabled. And from what little interaction I've had with them, these are motivated users sincerely wanting to improve the encyclopedia.

At least with the feature limited to extended confirmed users, you are going to have a much higher chance of people using the software in a way that is beneficial

That is an extremely low bar, and you perhaps misunderstand the desire and motivation of well-meaning editors with years of experience and thousands of edits to "help" the project.

... of people using the software in a way that is beneficial (which I know you don't believe is even possible, but I'll just have to disagree with you on that)

On the contrary, I agree with you that using MT software in a way that is beneficial is not only possible in the right hands, it speeds up the task enormously. That's why I regularly use it myself. We only disagree about whose hands it should be placed in, and "extended, auto-confirmed users" is a complete joke.

Why penalize those of us who want to use the feature in a way that is productive just to thwart a handful of bad editors?

Now, we come to the crux of the matter (finally!). Yes, I agree, it does penalize "those of us who want to use the feature [in a productive way]". Surprised that I agree? It makes me change tabs and do one extra click, every time I want to do this, so it slows me down, very slightly. As to your question why we should penalize "those of us", including myself, the answer is, to stop editors from destroying the quality of the encyclopedia. (And it's more than a little irritating that you keep saying "those of us" when that group does not include you.)

Correcting poor translations can only be done at human speed; it is painstaking and takes careful work. Adding poor translations can be done at script speed, and it doesn't matter if it's a "handful of editors" or not--I could write a bot to flood en-wiki with hundreds of thousands of such articles. It's quite difficult even just to get an agreement to delete such articles that have already been created this way, and now you want to re-enable this?

We passed the 5M article level, recently, that's terrific. There are rumblings about editor retention being a problem, about deteriorating quality of the project, and increasing activity by paid editors and other non-NPOV parties. You think Wikipedia is guaranteed to survive, just because it's huge? Where are Napster and Myspace now?

Let me turn your plaint about those who use the tool properly around, and ask you this:

Are you totally opposed to context-switching out of ContentTranslation in order to use MT in a different window and cut/paste back into CXT, even if that would help thwart the rapid creation of junk by well-meaning editors? Or, couldn't you just suck it up and do that for the sake of the project?

Anyway, this is probably the wrong venue for this discussion. I guess we'll just have to have yet another RfC on the issue.

Yeah, probably. Why don't you come to the next WikiSalon and let's discuss it in person. They're open to presenters giving talks, maybe you and I could present jointly. Should make for an interesting discussion.

And it's more than a little irritating that you keep saying "those of us" when that group does not include you.

Apart from you turning this discussion into personal attacks, this doesn't even make sense. The context of the sentence is "... those of us who want to use the feature in a way that is productive". I can certainly want to use the feature productively regardless of how much I've already used it.

Are you totally opposed to context-switching out of ContentTranslation in order to use MT in a different window and cut/paste back into CXT

I'm not totally opposed to it, but it's enough of a hassle to keep me from using CXT. I'd rather just paste the whole article into Google Translate, paste it back into the regular WikiText editor and go from there. For me, there's no point in using CXT without this feature. But maybe from your point of view, that's a good outcome.

Why don't you come to the next WikiSalon and let's discuss it in person.

That sounds good to me.

On Polish Wikipedia we implemented an AbuseFilter entry that prevents non-autoconfirmed users from publishing to the main space. Which means they are free to publish to their userspace drafts if they please. And it works wonders. Just saying.
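
For anyone unfamiliar with this kind of filter, the decision it makes can be sketched roughly as below. This is only a minimal Python model of the logic described in the comment above, not the actual rule (real filters are written in AbuseFilter's own rule language, and the Polish Wikipedia rule is not quoted in this thread). The names `PublishAttempt` and `is_blocked` are hypothetical, and the group name `autoconfirmed` is assumed from the description.

```python
from dataclasses import dataclass

# Standard MediaWiki namespace numbers.
MAIN_NAMESPACE = 0  # articles
USER_NAMESPACE = 2  # user pages, including userspace drafts


@dataclass(frozen=True)
class PublishAttempt:
    """Hypothetical record of one publish action, for illustration only."""
    user_groups: frozenset
    target_namespace: int
    via_content_translation: bool


def is_blocked(attempt: PublishAttempt, required_group: str = "autoconfirmed") -> bool:
    """Return True if the publish attempt should be disallowed.

    Only Content Translation publishes to the main namespace are
    restricted; anything else (e.g. a userspace draft) passes through.
    """
    if not attempt.via_content_translation:
        return False
    if attempt.target_namespace != MAIN_NAMESPACE:
        return False
    return required_group not in attempt.user_groups


# A new user publishing to the main space is stopped, but can still
# publish the same translation as a draft in their user space.
newcomer_article = PublishAttempt(frozenset(), MAIN_NAMESPACE, True)
newcomer_draft = PublishAttempt(frozenset(), USER_NAMESPACE, True)
print(is_blocked(newcomer_article), is_blocked(newcomer_draft))
```

Passing `required_group="extendedconfirmed"` would model the stricter enwiki restriction mentioned earlier in this thread, under the same assumptions.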

Aklapper changed the task status from Stalled to Open. Nov 2 2020, 5:37 PM
Aklapper removed a subscriber: Halibutt.

The previous comments don't explain who or what (task?) exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status, as tasks should not be stalled (and then potentially forgotten) for years for unclear reasons.

(Smallprint, as general orientation for task management:
If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead.
If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks... > Edit Subtasks.
If this task is stalled on an upstream project, then the Upstream tag should be added.
If this task requires info from the task reporter, then there should be instructions on which info is needed.
If this task needs retesting, then the TestMe tag should be added.
If this task is out of scope and nobody should ever work on this, or nobody else managed to reproduce the situation described here, then it should have the "Declined" status.
If the task is valid but should not appear on some team's workboard, then the team project tag should be removed while the task has another active project tag.)