
MT stops translating "randomly"
Closed, Declined · Public

Description

Steps to reproduce:

  1. Start translating en:Tensor_product
  2. Using Google as the MT engine, translate the sections one by one, starting from the top of the article
  3. As seen in the screenshot, the third section does not get translated. The same thing happens again as you continue translating.

An error appears in the console, although I'm not sure it is related.


Method where this happens:

mw.cx.TranslationTracker.prototype.isExcludedFromValidation = function ( sectionModel ) {
    // Section types that are skipped when the translation is validated
    var excludedTypes = [
        'cxBlockImage', 'mwBlockImage',
        'cxTransclusionBlock', 'mwTransclusionBlock',
        'mwReferencesList',
        'mwMath',
        'definitionList',
        'mwAlienBlockExtension',
        'mwTable', 'list', 'mwHeading'
    ],
    // Type name of the section's child node; this is the call that
    // threw the unhandled exception discussed in this task
    childType = sectionModel.getChildNodeName();

    return excludedTypes.indexOf( childType ) >= 0;
};

Details

Related Gerrit Patches:
[operations/deployment-charts@master] Update cxserver to 2019-12-05-090549-production
[mediawiki/services/cxserver@master] TextBlock#getRootItem: Consider non-whitespace text

Event Timeline

Jpita created this task. Aug 9 2019, 12:46 PM
Restricted Application added a subscriber: Aklapper. Aug 9 2019, 12:46 PM
Petar.petkovic renamed this task from MT stops translating "randomly". to MT stops translating "randomly". Aug 9 2019, 12:54 PM
Petar.petkovic updated the task description.
Pginer-WMF triaged this task as Medium priority. Aug 26 2019, 3:15 PM

An unhandled JavaScript exception was thrown on the call to getChildNodeName() in mw.cx.TranslationTracker#isExcludedFromValidation. That method is rewritten in change 529727, so the error should no longer occur once the patch is merged.
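The rewrite in 529727 is not reproduced on this page. Purely as an illustration, a defensive version of the check could look like the sketch below. It is written as a standalone function rather than as a method on mw.cx.TranslationTracker, and it assumes (hypothetically) that the exception came either from a missing section model or from getChildNodeName() throwing on a section without a child node. In both cases the sketch reports the section as not excluded instead of crashing.

```javascript
// Hypothetical standalone sketch; NOT the actual fix from change 529727.
// Same list of excluded section types as in the original method.
const EXCLUDED_TYPES = new Set( [
  'cxBlockImage', 'mwBlockImage',
  'cxTransclusionBlock', 'mwTransclusionBlock',
  'mwReferencesList',
  'mwMath',
  'definitionList',
  'mwAlienBlockExtension',
  'mwTable', 'list', 'mwHeading'
] );

function isExcludedFromValidation( sectionModel ) {
  // Guard: without a model (or without getChildNodeName) there is no
  // type to match against the list, so the section is not excluded.
  if ( !sectionModel || typeof sectionModel.getChildNodeName !== 'function' ) {
    return false;
  }
  let childType;
  try {
    childType = sectionModel.getChildNodeName();
  } catch ( err ) {
    // Assumption: the call may throw for a section with no child node;
    // treat that as "type unknown" instead of propagating the exception.
    return false;
  }
  return EXCLUDED_TYPES.has( childType );
}
```

Returning false keeps the original semantics: only sections whose child type is positively identified as one of the listed types skip validation. The Set lookup behaves the same as the original indexOf check.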

Petar.petkovic added a comment. Edited Aug 27 2019, 10:43 AM

When I was using Content Translation (as a user) to translate articles into Serbian, I saw MT quality decline rapidly as I added more sections to the translation. Eventually it stopped translating altogether, which is exactly what this ticket is about. I don't think the article included any math, like the example in the description.

The MT engine was Google Translate and the source language was Croatian, if I remember correctly. Translating from Croatian to Serbian is fairly easy for a person: you switch from Latin to Cyrillic script (optional, since Serbian can be written in either) and change some words, but that closeness doesn't help MT engines. At times it is clear that the translation from Croatian to Serbian goes through English.
I'm not sure whether the MT engine or our own code that uses it is to blame for translation stopping.

Declining quality can only be attributed to the MT service. From cxserver's perspective, a section is either translated or not translated; how good the translation is always depends on the MT service. But let us keep watching this ticket after https://gerrit.wikimedia.org/r/529727 is merged.

Change 548956 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] TextBlock#getRootItem: Consider non-whitespace text

https://gerrit.wikimedia.org/r/548956
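The patch itself is not shown here. To illustrate what "consider non-whitespace text" could mean, the sketch below models a text block as a list of chunks, each carrying the stack of inline elements that wrap it (outermost first), and looks for a single inline element that wraps all of the block's non-whitespace text. This data model and function are hypothetical simplifications, not cxserver's actual TextBlock code; the point is only that whitespace-only chunks, such as a newline outside a wrapping element, no longer prevent a root item from being found.

```javascript
// Hypothetical simplified model, NOT cxserver's TextBlock implementation.
// Each chunk: { text: string, tags: object[] }, tags ordered outermost first.
function getRootItem( textChunks ) {
  let root = null;
  for ( const chunk of textChunks ) {
    // Skip whitespace-only chunks (e.g. a newline after a wrapping <span>)
    if ( !/\S/.test( chunk.text ) ) {
      continue;
    }
    const outer = chunk.tags.length > 0 ? chunk.tags[ 0 ] : null;
    if ( outer === null ) {
      // Non-whitespace text outside any inline element: no common root
      return null;
    }
    if ( root === null ) {
      root = outer;
    } else if ( root !== outer ) {
      // Compared by identity, so two separate <a> elements do not count
      // as the same root even though they share a tag name
      return null;
    }
  }
  // Still null if the block contains no non-whitespace text at all
  return root;
}
```

For example, a block made of "Tensor product" inside a span, followed by a bare newline chunk, yields that span as its root; if whitespace chunks were not skipped, the bare newline would make the function return null.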

Are there any articles where this behavior is reproducible? I tried en:Tensor_product locally, the article mentioned in the description, and it works fine with both Google and Yandex, even without 548956.

@santhosh and @Jpita, any examples?

@Petar.petkovic I no longer see it happening in production.

Jpita closed this task as Declined. Tue, Nov 19, 8:51 PM

Change 548956 merged by jenkins-bot:
[mediawiki/services/cxserver@master] TextBlock#getRootItem: Consider non-whitespace text

https://gerrit.wikimedia.org/r/548956

Change 555784 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[operations/deployment-charts@master] Update cxserver to 2019-12-05-090549-production

https://gerrit.wikimedia.org/r/555784

Change 555784 merged by jenkins-bot:
[operations/deployment-charts@master] Update cxserver to 2019-12-05-090549-production

https://gerrit.wikimedia.org/r/555784

Deployed to production.