Give a warning when a section contains grammatically incorrect language
Open, Low, Public

Assigned To

None

Authored By

	jeblad
	Jul 1 2016, 3:46 AM

Description

Sometimes the MT engines turn up complete gibberish, or the editor himself loses track of the text. Would it be possible to detect this? Perhaps a statistical engine could verify that the text is reasonably similar to an existing language model: not statistical translation, but statistical verification of the text.

After checking some translated text, it seems that gibberish is left in the article either because the editor gives up or because he did not notice it. A lot of the gibberish is within sections with many messed-up templates, so I guess that indicates the editor simply gives up. That could also indicate that we need better tools to revisit failed translations, i.e. to make it easier to throw out stuff we don't know how to translate.
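As a rough illustration of the "statistical verification" idea, here is a minimal sketch of a character-trigram model that scores how closely a text resembles a reference corpus. Everything in it is an assumption made for illustration: the names `CharTrigramModel`, `avg_log_prob` and `looks_like_gibberish`, the add-one smoothing, and the threshold are hypothetical, and a real check would need a large per-language corpus.

```python
import math
from collections import Counter


def trigrams(text):
    """Return the character trigrams of text, padded so word edges count."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class CharTrigramModel:
    """Toy character-trigram model with add-one smoothing (illustrative only)."""

    def __init__(self, corpus):
        self.counts = Counter()
        for sentence in corpus:
            self.counts.update(trigrams(sentence))
        self.total = sum(self.counts.values())
        # One extra bucket stands in for every trigram never seen in training.
        self.vocab = len(self.counts) + 1

    def avg_log_prob(self, text):
        """Average smoothed log-probability per trigram; higher means more familiar."""
        grams = trigrams(text)
        denom = self.total + self.vocab
        return sum(math.log((self.counts[g] + 1) / denom) for g in grams) / len(grams)


def looks_like_gibberish(model, text, threshold):
    """Flag text whose score falls below a threshold tuned on known-good text."""
    return model.avg_log_prob(text) < threshold


# Tiny stand-in corpus; a real model would be trained on far more text.
model = CharTrigramModel([
    "the quick brown fox jumps over the lazy dog",
    "the cat sat on the mat",
    "a dog and a cat are in the house",
])
good = model.avg_log_prob("the cat sat on the mat")
bad = model.avg_log_prob("xqzv kjwq")
```

The threshold would have to be calibrated per language, for instance from the score distribution of existing, reviewed articles. A score like this only measures surface familiarity, so it could catch garbled output but not fluent text that is wrong.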

Related Objects

Mentioned Here: T162525: Flagg suspicious grammatical constructs

Event Timeline

jeblad created this task. · Jul 1 2016, 3:46 AM

Restricted Application added subscribers: Zppix, Aklapper. · Jul 1 2016, 3:46 AM

jeblad added a subscriber: Arrbee. · Jul 1 2016, 3:49 AM

While I totally understand and acknowledge the problem, it's hard to find something actionable here...

If there were a statistical engine that detects bad grammar, it would be incorporated into the MT engine itself.

Warning about "grammatically incorrect" language is not feasible (unless, for example, we integrate LanguageTool, which I have poked at a bit, but completing that would be a whole separate project).

That said, we may start thinking about a better way to warn about paragraphs that were filled with machine-translated text and left untouched. The current warning was designed and implemented in 2014, so after two years and almost 100,000 translations, maybe it's time to tweak the design based on the experience.
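To make the "left untouched" case concrete, here is a minimal sketch of one way to measure how much of a paragraph is still raw machine translation. It is not a description of how the existing warning works: the function names and the 0.95 threshold are hypothetical, and it uses plain character-level similarity from Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def unmodified_ratio(mt_output, published):
    """Similarity in [0, 1] between raw MT output and the published paragraph."""
    # autojunk=False keeps the comparison exact for long paragraphs.
    return SequenceMatcher(None, mt_output, published, autojunk=False).ratio()


def needs_warning(mt_output, published, threshold=0.95):
    """True when the paragraph is (almost) identical to the MT output.

    The threshold is an assumed value that would need tuning on real data.
    """
    return unmodified_ratio(mt_output, published) >= threshold


untouched = needs_warning("Dette er en test.", "Dette er en test.")
rewritten = needs_warning(
    "Dette er en test.",
    "A fully rewritten paragraph with different words.",
)
```

A per-paragraph ratio like this could feed a per-section or per-article summary, so that a redesigned warning points at the specific paragraphs that were never edited.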

I'd have to mark this task as invalid, unless you're OK with renaming it to something like "Redesign the warnings for uncorrected machine translation".

There is a university-driven project for grammar checking in Norway; perhaps we could reuse some of their work. I need to check it out, though, as I have only rudimentary knowledge of it.

Amire80 moved this task from Needs Triage to Bugs on the ContentTranslation board. · Jul 20 2016, 10:22 AM

Amire80 added a project: OKR-Work. · Jul 22 2016, 1:19 PM

Amire80 triaged this task as Low priority. · Jul 22 2016, 8:41 PM

See also some notes on the closed task T162525: Flagg suspicious grammatical constructs

Arrbee moved this task from Bugs to Enhancements on the ContentTranslation board. · Jun 22 2018, 1:38 PM
