Give a warning when a section contains grammatically incorrect language
Open, Low, Public

Description

Sometimes the MT engines turn up complete gibberish, or the editor loses track of the text. Would it be possible to detect this? Perhaps a statistical engine could verify that the text is reasonably similar to an existing language model. The goal is not statistical translation but statistical verification of the text.
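To make the idea of statistical verification concrete, here is a minimal sketch, assuming a character trigram model with add-one smoothing trained on known-good text in the target language. The function names, the tiny corpus, and the default threshold are all hypothetical; nothing like this is known to exist in the translation tool, and a real check would need a far larger corpus and per-language tuning.

```python
import math
from collections import Counter


def train_char_ngrams(corpus, n=3):
    """Count character n-grams and their (n-1)-character contexts."""
    text = corpus.lower()
    ngrams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    contexts = Counter(text[i:i + n - 1] for i in range(len(text) - n + 1))
    vocab = len(set(text))  # number of distinct characters seen in training
    return ngrams, contexts, vocab, n


def avg_log_prob(text, model):
    """Average add-one-smoothed log-probability per n-gram of `text`."""
    ngrams, contexts, vocab, n = model
    text = text.lower()
    if len(text) < n:
        raise ValueError("text is shorter than the n-gram size")
    total = 0.0
    count = 0
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        # Add-one smoothing: unseen n-grams get a small non-zero probability.
        p = (ngrams[gram] + 1) / (contexts[gram[:-1]] + vocab)
        total += math.log(p)
        count += 1
    return total / count


def looks_like_gibberish(text, model, threshold=None):
    """Flag text that scores no better than a uniform guess over characters.

    The default threshold, log(1 / vocab), is an assumption for illustration,
    not a tuned value. Texts too short to score are never flagged.
    """
    _, _, vocab, n = model
    if len(text) < n:
        return False
    if threshold is None:
        threshold = math.log(1.0 / vocab)
    return avg_log_prob(text, model) <= threshold


# Hypothetical toy corpus; a real model needs far more text.
corpus = (
    "The editor checks the translated text before publishing the article. "
    "The machine translation is a draft that the editor should correct. "
    "A section with many templates is hard to translate."
)
model = train_char_ngrams(corpus)
plausible = "the editor checks the translated text"
garbled = "xq zvkj qqpx wzzt kjxq vvbn"
print(looks_like_gibberish(plausible, model), looks_like_gibberish(garbled, model))
```

A check like this only measures whether the character sequences resemble the training language. It cannot tell whether a sentence is grammatical or faithful to the source, so it would at best catch the most obviously broken sections.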

After checking some translated text, it seems that gibberish is left in the article either because the editor gives up or because they did not notice it. A lot of the gibberish is within sections with many messed-up templates, so I guess that is an indication that the editor simply gives up. That could also indicate that we need better tools to revisit failed translations, i.e. to make it easier to throw out content we don't know how to translate.

Event Timeline

While I totally understand and acknowledge the problem, it's hard to find something actionable here...

If there were a statistical engine that detected bad grammar, it would be incorporated into the MT engine itself.

Warning about "grammatically incorrect" language is not feasible (unless, for example, we integrate LanguageTool; that is something I poked at a bit, but completing it would be a whole separate project).

That said, we may start thinking about a better way to warn about paragraphs that were filled with machine-translated text and left untouched. The current warning was designed and implemented in 2014, so after two years and almost 100,000 translations, maybe it's time to tweak the design based on that experience.
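For illustration, one way to spot paragraphs left untouched is to compare the raw MT output with what the editor saved. The sketch below assumes both strings are available per paragraph and uses Python's standard `difflib.SequenceMatcher`; the function names and the 0.95 threshold are hypothetical, and this is not a description of how the existing warning works.

```python
from difflib import SequenceMatcher


def unmodified_ratio(mt_output, final_text):
    """Similarity in [0, 1] between the MT output and the saved text."""
    return SequenceMatcher(None, mt_output, final_text).ratio()


def paragraphs_to_warn(pairs, threshold=0.95):
    """Return indexes of paragraphs whose saved text is nearly raw MT output.

    `pairs` is a list of (mt_output, final_text) tuples, one per paragraph.
    Paragraphs with no MT output are skipped. The threshold is an assumed
    value for illustration and would need tuning against real translations.
    """
    return [
        index
        for index, (mt_output, final_text) in enumerate(pairs)
        if mt_output.strip()
        and unmodified_ratio(mt_output, final_text) >= threshold
    ]


pairs = [
    # Left exactly as the MT engine produced it.
    ("The cat sat on the mat.", "The cat sat on the mat."),
    # Rewritten by the editor.
    ("The cat sat on the mat.", "A dog slept under the old wooden table all day."),
    # Written from scratch, no MT output to compare against.
    ("", "Text typed by hand."),
]
print(paragraphs_to_warn(pairs))
```

A single character-level ratio treats all languages the same way, so any redesign would probably want thresholds that vary with the language pair and the paragraph length.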

I'd have to mark this task as invalid, unless you're OK with renaming it to something like "Redesign the warnings for uncorrected machine translation".

There is a university-driven project for grammar checking in Norway; perhaps we could reuse some of their work. I need to check it out, though, as I have only rudimentary knowledge of it.