CX2: Extra spaces are added when using Google Translate
Closed, Resolved · Public

Description

When translating from English to Hebrew with Google, I frequently see unnecessary extra spaces. This is especially common between sentences: "last word. First word" becomes "last word.  First word" (note the two spaces). Also, a single space is often inserted before a comma: "word, word" becomes "word , word".

This may well be an upstream problem, but it would be nice to verify that these spaces are not added by CX or VE, or perhaps to find a way to clean them up.
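One way to check where the spaces come from is to run the same detection on the raw MT output and on the text after it has passed through CX. The following is only a rough sketch: the pattern, the function names `find_extra_spaces` and `introduced_by_pipeline`, and the choice of punctuation marks are assumptions made for illustration, not anything taken from CX or cxserver.

```python
import re

# Hypothetical pattern: two or more consecutive spaces, or spaces placed
# directly before a punctuation mark. Real rules would be language-specific
# (for example, French typography allows a space before ":" and "?").
SUSPICIOUS = re.compile(r" {2,}| +(?=[,.;:!?])")


def find_extra_spaces(text: str) -> list[tuple[int, str]]:
    """Return (offset, matched spaces) for each suspicious run of spaces."""
    return [(m.start(), m.group()) for m in SUSPICIOUS.finditer(text)]


def introduced_by_pipeline(raw_mt: str, after_cx: str) -> bool:
    """True if the text after CX has more suspicious runs than the raw MT output."""
    return len(find_extra_spaces(after_cx)) > len(find_extra_spaces(raw_mt))
```

If `introduced_by_pipeline` returns `False` for the same paragraph, the extra spaces were most likely already present in the MT response.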

Here is the same paragraph in https://translate.google.com. As you can see, there are no extra spaces here:

To test it, I published a draft. In this revision you can see the following text: "וליו"ר ב -1950.  ב -1952". Note the two spaces after the full stop. I'd expect one space.


A related bug was reported on Google's side.

Event Timeline

Amire80 created this task. Jan 14 2019, 12:14 PM
Restricted Application added a subscriber: Aklapper. Jan 14 2019, 12:14 PM
Amire80 updated the task description. Jan 14 2019, 12:41 PM
Amire80 updated the task description. Jan 14 2019, 2:03 PM

Checked it in FF and Chrome (the language was changed to Hebrew for the site) - there were no extra spaces displayed:

What am I missing?

Pginer-WMF added a subscriber: Pginer-WMF.

Based on the example from this conversation, the problem of extra spaces added by Google Translate is also happening for the translation of "Oktay Rıfat Horozcu" from English to Spanish. Note the spaces added before the "," character in the translation that are not present in the original:

Pginer-WMF triaged this task as Normal priority. Jan 24 2019, 1:41 PM
Etonkovidova added a comment. Edited · Jan 31 2019, 1:08 AM

Extra spaces are added when a non-breaking space (&nbsp;) is present (checked in wmf.14):

Compare to Yandex translation:

It also seems that the en dash was replaced with a hyphen in the sample that @Pginer-WMF posted; that issue is supposed to be fixed by now:

Based on the example from this conversation, the problem of extra spaces added by Google Translate is also happening for the translation of "Oktay Rıfat Horozcu" from English to Spanish. Note the spaces added before the "," character in the translation that are not present in the original [...]

Here is the most straightforward case: a Google MT translation of en:Truism to català, with several problems:

  • an extra space is added around a simple link - In [[philosophy]], a sentence
  • an extra space is added in place of the template: {{cn|date=June 2017}}
  • a missing template is displayed in place of {{spaced ndash}}

The issue was mentioned in this comment.

Change 501031 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Remove extra spaces from MT result

https://gerrit.wikimedia.org/r/501031

Change 501031 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Remove extra spaces from MT result

https://gerrit.wikimedia.org/r/501031
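For readers who want a feel for what this kind of cleanup involves, here is a minimal sketch of post-processing plain MT text. It is not the code from change 501031: the function name `clean_mt_spaces` and the rules are assumptions for illustration, and it works on plain strings only, whereas the real MT result contains markup for links, references, and templates.

```python
import re


def clean_mt_spaces(text: str) -> str:
    """Collapse doubled spaces and drop spaces before common punctuation.

    Hypothetical helper; the rules are deliberately simple and are not
    language-aware.
    """
    # Collapse any run of two or more spaces into a single space.
    text = re.sub(r" {2,}", " ", text)
    # Remove spaces before , ; : ! ?
    text = re.sub(r" +([,;:!?])", r"\1", text)
    # Remove spaces before a full stop only when it ends a word group,
    # so that tokens such as " .5" are left alone.
    text = re.sub(r" +\.(?=\s|$)", ".", text)
    return text
```

Applied to the examples from the description, `clean_mt_spaces("word , word")` gives `"word, word"`. A rule set like this would need exceptions for languages whose typography requires a space before some punctuation marks.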

Etonkovidova closed this task as Resolved. Edited · Apr 12 2019, 11:31 PM

Checked the three article examples mentioned in the task:

  • en:Leon Keyserling to he
  • en:Alice Williams to cy
  • en:Truism to es

The extra spaces around punctuation marks, references (except for translation to Japanese), and links are gone; however, templates still have extra spaces - filed as a separate bug T220864: Google Translate adds extra spaces for templates

Pginer-WMF updated the task description. Sep 11 2019, 8:57 AM