
Detection of entity names seems to fail quite often
Open, Normal, Public

Description

Given that I translate an article
And that article contains a common word as an entity name
When I translate a block with that entity name
Then the entity name should be retained

What happens quite often is that the name is translated. If the word is part of the page title, that is a very clear indication that it is in fact an entity name. A word written with a capital letter without a preceding sentence terminator is an equally clear indication. In either case it should probably not be translated.
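The two signals above could be combined roughly as follows. This is only a sketch: the function name `likely_entity_names`, the tokenizer, and the set of sentence terminators are assumptions made for illustration, not anything taken from CX or Apertium, and the sample sentence is made up.

```python
import re

# Punctuation treated as ending a sentence (an assumption; adjust per language).
SENTENCE_END = {".", "!", "?", ":"}

# Split into words and single punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def likely_entity_names(text, page_title):
    """Return capitalized words in `text` that look like entity names.

    A capitalized word is kept if it occurs in the page title, or if it
    does not directly follow a sentence terminator.
    """
    title_words = {w.lower() for w in re.findall(r"\w+", page_title)}
    tokens = TOKEN_RE.findall(text)
    found = set()
    for i, tok in enumerate(tokens):
        if not (tok[0].isalpha() and tok[0].isupper()):
            continue
        in_title = tok.lower() in title_words
        sentence_start = i == 0 or tokens[i - 1] in SENTENCE_END
        if in_title or not sentence_start:
            found.add(tok)
    return found


sample = "Frode Grytten er ein norsk forfattar. Han skreiv om Bergen."
names = likely_entity_names(sample, "Frode Grytten")
```

Here "Frode" and "Grytten" are kept because they occur in the title, "Bergen" because it is capitalized mid-sentence, and "Han" is dropped because it only starts a sentence.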

A user said that emphasized text should be left as it is. I'm not sure this is correct, as emphasis and quotation marks are interchangeable in Norwegian typesetting. If quotation marks are used, it is a rather clear indication that the text should not be translated. Emphasis is, however, an indication that capital letters should be retained.
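One way to keep quoted text untouched would be to swap it for placeholders before translation and put it back afterwards. The sketch below assumes the translation engine passes tokens such as `__Q0__` through unchanged, which would need to be verified for each engine; the helper names and the placeholder format are hypothetical.

```python
import re

# Guillemets, straight double quotes, and curly double quotes.
QUOTED_RE = re.compile(r'«[^«»]*»|"[^"]*"|“[^“”]*”')
PLACEHOLDER_RE = re.compile(r"__Q(\d+)__")


def protect_quoted(text):
    """Replace each quoted span with a numbered placeholder."""
    protected = []

    def stash(match):
        protected.append(match.group(0))
        return f"__Q{len(protected) - 1}__"

    return QUOTED_RE.sub(stash, text), protected


def restore_quoted(text, protected):
    """Put the original quoted spans back in place of the placeholders."""
    return PLACEHOLDER_RE.sub(lambda m: protected[int(m.group(1))], text)


original = "Namnet «Frode» skal stå uendra."
masked, spans = protect_quoted(original)
# `masked` would be sent to the translation engine here.
restored = restore_quoted(masked, spans)
```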

Another option could be to check linked articles for entity names, possibly also articles in the same category. Some words will also be used in connection with names, and could act as markers to detect entity names that should be written with capital letters.
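As a rough illustration of these two ideas, the sketch below collects candidate names from the titles of linked articles and picks up words that follow a marker word. How the linked titles are fetched is left out, and the marker list is a made-up placeholder, not a vetted list for Norwegian.

```python
def names_from_linked_titles(linked_titles):
    """Collect capitalized words from linked article titles."""
    names = set()
    for title in linked_titles:
        # Drop a trailing disambiguation suffix such as " (kommune)".
        base = title.split(" (")[0]
        names.update(word for word in base.split() if word[:1].isupper())
    return names


# Hypothetical marker words that tend to precede a name.
MARKER_WORDS = {"herr", "fru", "professor"}


def names_after_markers(tokens, markers=MARKER_WORDS):
    """Return the words that directly follow a marker word."""
    return {
        tokens[i + 1]
        for i in range(len(tokens) - 1)
        if tokens[i].lower() in markers
    }


known = names_from_linked_titles(["Frode Grytten", "Odda (kommune)"])
marked = names_after_markers("Eg møtte professor Frode i dag".split())
```

The union of both sets could then serve as a do-not-translate list for the block being translated.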

An article that had a name messed up was Frode Grytten, one of the test articles for the nno-nob pair from Apertium.

Note that "Frode" (male name) became "Fråd" (froth) in the first one, and I had to correct the translation.

Second note: in some cases a name has an established translation. That should be respected.

Third note: "quite often" is not quite functional… ;)

Event Timeline

jeblad created this task. Jun 9 2016, 8:29 AM
Restricted Application added subscribers: Zppix, Aklapper. Jun 9 2016, 8:29 AM
jeblad updated the task description. Jun 9 2016, 8:34 AM
jeblad updated the task description. Jun 9 2016, 8:43 AM
jeblad updated the task description. Jun 9 2016, 12:06 PM
Amire80 triaged this task as Normal priority. Jun 19 2016, 12:26 PM
Amire80 added a subscriber: Amire80.

Thanks a lot for the report.

The ideal solution would probably be to offer multiple translations for the same word. We already have a task to do this for suggestions from several engines (T90161), but the same engine can offer several translations, too. This must be added in the CX extension's UI, and must, of course, be supported by the translation backend. I assume that Google already supports it (we don't currently offer Google, but we may in the future, theoretically), but I'm not sure about Apertium and Yandex.

Restricted Application added a subscriber: jhsoby. Feb 27 2017, 8:02 AM