[wmf.18] Large table cannot be translated - 'Automatic translation failed' is displayed.
Open, Medium, Public

Description

Per report, the table fails to translate:

  1. With the cx2 option enabled, start translating the article en: List of national emergencies in the United States.
  2. The large table will not be translated with any MT option (the "Automatic translation failed" message will be displayed).
  3. Switch to the 'Copy original content' option, translate a small paragraph, and make 'Copy original content' the default option. Click on the table; the same "Automatic translation failed" message will be displayed again, which is incorrect since the MT option was not used at all.

Note: No errors will be displayed in the Console. When using the cx1 version, there are numerous errors (not sure if they are helpful for debugging this issue), e.g.
jQuery.Deferred exception: $content[0] is undefined

and

jQuery.Deferred exception: paramNames.indexOf is not a function Template.prototype.extractParametersFromTemplateCode

Event Timeline

Restricted Application added a subscriber: Aklapper. Feb 20 2019, 1:02 AM
Etonkovidova renamed this task from [wmf.18] Large table cannot be translated - 'Automatic error failed' is displayed. to [wmf.18] Large table cannot be translated - 'Automatic translation failed' is displayed. Feb 20 2019, 1:06 AM

The MT provider has a limit of 10,000 characters per request, so it fails to translate sections larger than that limit. If this happens too frequently, we may need to think about an alternative solution. @santhosh may have a better idea for handling this.
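The per-request limit described above could be enforced with a pre-flight check before anything is sent to the MT engine. A minimal sketch, assuming a hypothetical helper (`checkMtLimit` is an illustrative name, not cxserver's real API):

```javascript
// Hypothetical pre-flight check mirroring the MT provider's
// 10,000-character-per-request limit mentioned above.
const MT_CHAR_LIMIT = 10000;

function checkMtLimit(sourceHtml) {
  if (sourceHtml.length > MT_CHAR_LIMIT) {
    // Reject up front instead of forwarding the request to the MT engine.
    return {
      ok: false,
      reason: `content is ${sourceHtml.length} characters; limit is ${MT_CHAR_LIMIT}`
    };
  }
  return { ok: true };
}
```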

When the "MT failed" message appears with "Translate from source", it indicates a JavaScript processing error. That error may be printed to the console, but it might also be silently swallowed. Did you try with debug=true as well?

Did you try with debug=true as well?

I tried, and there are no errors in the console. The cxserver request fails because the payload is too big.

Do you mean that the payload that is sent to cxserver is too big and rejected before cxserver starts adapting it? How big is the payload?

Right, the payload sent to cxserver is too big. Since you asked how big it is: 618,042 in this case, while 500,000 is the limit.
These numbers stand for the number of bytes (doc).

@Pginer-WMF, when the payload exceeds 500,000 bytes (0.5 MB), the user is limited to using "Start with an empty paragraph" if they want to translate the section. But since the source section they're trying to translate has lots of content, that can probably never be a usable option.
We can try adjusting the limits and improving the ways we compact the HTML before sending it to MT services that translate HTML, but there will always be some big sections that exceed the imposed limits. How do we deal with big content systematically, assuming the current "Start with an empty paragraph" option isn't enough?

Thanks for the investigation, @Petar.petkovic. This is relevant to making the tool more solid and reliable, and I'll consider work in this area for upcoming planning sessions.

It seems that one problematic category of content likely to exceed the size limits is "compound" elements such as tables or lists. These elements aggregate several simpler pieces of content whose total size can add up to exceed the limits.

To avoid exceeding the size limits, we can deconstruct these complex elements into smaller pieces, send a separate translation request for each one, and re-assemble the translations afterwards. For example, for tables we can send each row (or cell) to the translation services individually. This can be applied either to all tables or only to those that exceed a certain size, but it should happen behind the scenes (from the user's perspective, the table gets translated at once). I'd expect the translation of each individual element to be independent of the others, so breaking them into pieces and translating them individually should not affect the quality of the translation.
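The row-by-row idea could be sketched roughly as below. This is illustrative only, not cxserver code: `translateHtml` is a stand-in for whatever MT call the service makes, and the regex row matching assumes simple, non-nested tables (real code would use a proper HTML parser).

```javascript
// Split a table into per-row translation requests and reassemble the
// result so that, from the user's perspective, the whole table was
// translated at once.
async function translateTableByRows(tableHtml, translateHtml) {
  const rowPattern = /<tr[\s\S]*?<\/tr>/g;
  const rows = tableHtml.match(rowPattern) || [];

  // Translate each row independently; rows are assumed to be
  // linguistically independent, so quality should not suffer.
  const translated = await Promise.all(rows.map(row => translateHtml(row)));

  // Put the translated rows back in their original positions.
  let i = 0;
  return tableHtml.replace(rowPattern, () => translated[i++]);
}
```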

We may also need to improve the general fallback approach in case the limits are exceeded anyway, although the above approach would make that even more of an edge case. As a fallback, communicating the issue and copying over the source content seems reasonable.

Pginer-WMF triaged this task as Medium priority. Mar 18 2019, 3:44 PM
Pginer-WMF moved this task from Needs Triage to CX2 on the ContentTranslation board.

Right, the payload sent to cxserver is too big. Since you asked how big it is: 618,042 in this case, while 500,000 is the limit.

From my experience this happens on much smaller texts, like 5k of wikitext.
We don't have many articles bigger than 500k.

Are we still talking about these numbers representing bytes?
It could be that you've been seeing a different error. Can you provide an example of how to reproduce what you observed?

AlexBlokha added a comment (edited). Apr 28 2019, 11:09 PM

Yes, we are talking about bytes.

Example:
Translating this article to Ukrainian:
https://ru.wikipedia.org/wiki/%D0%9A%D0%B8%D0%BD%D0%B3,_%D0%97%D0%B0%D0%BB%D0%BC%D0%B0%D0%BD

Sections "Актёр" ("Actor") and "Другие занятия" ("Other activities")

The whole article is 17k of wikitext.

Let me try to clarify what can happen so that we get "Automatic translation failed" because the content is too big.

Content Translation uses a service we call cxserver (developed mainly to serve Content Translation) to translate and adapt content between languages. It splits the source article into pieces (sections) that you can translate individually, so those 17k of wikitext are not translated as a whole.
Furthermore, section content is sent to cxserver as HTML, not as wikitext, which makes it bigger than the wikitext because of the accompanying HTML markup. Sometimes that difference in length is significant.
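As a small illustration of why the numbers grow: character count and UTF-8 byte size diverge for non-Latin scripts, and HTML markup inflates the size further. (The snippet below is only a demonstration, not cxserver code; `Кинг` is the title of the Russian article in question.)

```javascript
// Cyrillic characters take two bytes each in UTF-8, so Russian text is
// larger in bytes than in characters, and markup adds more on top.
const wikitext = 'Кинг';        // 4 characters
const html = '<p>Кинг</p>';     // the same text wrapped in markup

console.log(wikitext.length);                      // 4 characters
console.log(Buffer.byteLength(wikitext, 'utf8'));  // 8 bytes
console.log(html.length);                          // 11 characters
console.log(Buffer.byteLength(html, 'utf8'));      // 15 bytes
```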

When we send the HTML of a section we want to translate, in your case from Russian to Ukrainian, cxserver can reject the request because the content is too big, for two similar but distinct reasons:

  1. The first line of checks is that the content is not bigger than 500,000 bytes (0.5 megabytes).
  2. If the content is smaller than 0.5 MB but its number of characters is greater than 10,000, we don't even try sending it to engines like Yandex; cxserver rejects that request as well.

In case #1, we're stuck with "Start with an empty paragraph", as the "Copy original content" option also requires content smaller than 0.5 MB.
Translating the table under the section header Актёр that you added as an example falls under case #2: the content is rejected because it exceeds 10,000 characters (it is 40,591 characters), but "Copy original content" works as a fallback option.
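The two rejection tiers above can be sketched as follows. The constants come from the numbers quoted in this task; the function itself is illustrative, not cxserver's actual implementation.

```javascript
// Two-tier size check: a hard byte limit on the payload, then a
// character limit for machine translation engines.
const MAX_PAYLOAD_BYTES = 500000; // case 1: hard payload limit (0.5 MB)
const MAX_MT_CHARS = 10000;       // case 2: MT engine character limit

function classifyPayload(html) {
  if (Buffer.byteLength(html, 'utf8') > MAX_PAYLOAD_BYTES) {
    // Case 1: even "Copy original content" is unavailable; only
    // "Start with an empty paragraph" remains.
    return 'rejected: payload too big';
  }
  if (html.length > MAX_MT_CHARS) {
    // Case 2: not forwarded to MT engines such as Yandex, but
    // "Copy original content" still works as a fallback.
    return 'rejected: too long for MT';
  }
  return 'ok';
}
```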