Notice that some sections cannot be expanded and have the following error from requests:
Since there is no error handling, the loading indicator is shown forever.
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| AX: Refactor section-by-section translation | mediawiki/extensions/ContentTranslation | master | +565 -308 |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | None | | T341196 MinT for Wiki Readers (machine translation of wiki contents) |
| Open | None | | T359072 MinT for Wiki Readers MVP |
| Open | None | | T381406 MinT for Wiki Readers MVP: Complete key issues before continuing with experimentation |
| Resolved | BUG REPORT | ngkountas | T376865 MinT for Wikipedia Readers: Large sections fails to translate |
It makes sense to improve error handling. In addition to that, I wonder if there is a broader aspect to consider. Based on the screenshot, the error seems to be produced by some limits in the size of the content requested. If I recall correctly, those limits were introduced in CXServer to avoid issues with external translation services, and they were creating issues when translating large elements such as tables.
If this were the case, I wonder if it would make sense to consider a separate ticket to make the length limitations less strict for services running on Wikimedia infrastructure, such as MinT and Apertium. That would apply to all products using them, not specific to MinT for Wiki Readers.
Since we prioritize user experience, and sending a large chunk causes a proportional delay in response time, having cxserver accept larger chunks will not help users. The clients need to send smaller chunks of content in sequential batches.
For example, if a section has 5 paragraphs, send them paragraph by paragraph, one after another, rather than the full section at once. If a section consists of `li` or similar block tags, send them by block tags. This is how the Community Wishlist implemented their MT feature. In this particular example of Tokyo, we are sending a references section with 229 reference items in one go. First of all, reference support is already broken. Secondly, references are unrelated units and can be independently translated/adapted, so they can either not be sent for translation at all or be sent as smaller chunks.
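The batching approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual ContentTranslation client code: `chunkBlocks`, `translateSection`, and the size budget are hypothetical names.

```typescript
type Translate = (html: string) => Promise<string>;

// Group sibling blocks (paragraphs, list items, ...) into chunks that stay
// under a size budget, so each MT request stays small and fast.
function chunkBlocks(blocks: string[], maxChars: number): string[][] {
  const chunks: string[][] = [];
  let current: string[] = [];
  let size = 0;
  for (const block of blocks) {
    if (current.length > 0 && size + block.length > maxChars) {
      chunks.push(current);
      current = [];
      size = 0;
    }
    current.push(block);
    size += block.length;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Send chunks one after another (sequential batches) rather than firing
// one large request, or many requests in parallel.
async function translateSection(
  blocks: string[],
  maxChars: number,
  translate: Translate
): Promise<string[]> {
  const results: string[] = [];
  for (const chunk of chunkBlocks(blocks, maxChars)) {
    results.push(await translate(chunk.join('')));
  }
  return results;
}
```

The sequential `await` in the loop is the point: each chunk's response arrives quickly and can be rendered as it comes in, instead of the whole section blocking on one slow request.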
Reference resolution in HTML is very hard, since the reference pointer and the reference content will be in two different locations, and one of those locations may not yet be processed/parsed/translated.
It may be too late, but this special page is, in a sense, a read-only version of the second column of CX, where we have solved all of these issues over the past several years.
We definitely want clients to make requests in the best possible way. I think that this ticket is a valid one.
I also remember that some pieces of content caused problems in existing tools such as Content Translation in T216583: [wmf.18] Large table cannot be translated - 'Automatic translation failed' is displayed. Quoting the final summary below, since it describes some of the current limits:
I don't have all the details fresh since this was reported in 2019, and I got a bit lost in the technical details of why it was complex to reduce the request size in that case. Still, I think there may be an opportunity to make the translation size limit less strict for services where we don't have hard constraints. It should be possible to increase the maximum allowed request size (to avoid some corner cases) while still encouraging requests to be minimal, in order to provide the best possible user experience: fast translation for most content, while still getting a translation for cases like the table above.
> Reference resolution in HTML is very hard, since the reference pointer and the reference content will be in two different locations, and one of those locations may not yet be processed/parsed/translated.

> It may be too late, but this special page is, in a sense, a read-only version of the second column of CX, where we have solved all of these issues over the past several years.
This is not clear to me. Rendered references don't seem very different than other pieces of content with texts and external links, so it may be useful to provide more detail in T376860 since it seems more specific to reference support.
If we look at the [1] citation and the corresponding reference, we have the following:
| Citation pointer ("[1]") | Reference content |
|---|---|
| A link with "[1]" as text and a link target pointing to "#cite_note-1", which refers to the reference below | A text paragraph with two links to external sites. |
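For illustration, pairing a citation pointer with its reference content comes down to matching the `#cite_note-…` fragment of the pointer's link against the `id` of the reference item. A minimal sketch, with hypothetical function names, covering only the standard MediaWiki `cite_note-*` convention (named refs, reused refs, and refs inside templates need more handling):

```typescript
// Extract the fragment from a citation link's href so the "[1]" pointer
// can be matched to the reference item whose id is "cite_note-1".
function citeNoteId(href: string): string | null {
  const match = href.match(/#(cite_note-[^#?]+)$/);
  return match ? match[1] : null;
}

// Pair pointers with reference contents by id; an unmatched pointer yields
// null content so it can be skipped instead of breaking translation.
function pairCitations(
  pointerHrefs: string[],
  referencesById: Map<string, string>
): Array<{ id: string; content: string | null }> {
  return pointerHrefs
    .map(citeNoteId)
    .filter((id): id is string => id !== null)
    .map((id) => ({ id, content: referencesById.get(id) ?? null }));
}
```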
Using the HTML translation of the MinT test instance and pasting the contents from the reference, MinT seems able to translate the contents:
The only apparent issue is that a link gets lost, but that seems more of a general issue with the algorithm that re-applies links (shared with Content Translation and reported in tickets such as T314127).
For a reader context, it seems more straightforward to translate the rendered contents of a reference than trying to apply the whole adaptation process. Template adaptation makes sense in the editing context of Content Translation, since the final contents can only have templates defined in the target wiki. However, in a reader context, there is no problem to use the source templates with translated content. Doing template adaptation for readers seems more problematic since the template may not exist in the target language, references may be inside another template (which is unsupported, T209266) or may fall in the cases listed under T200786: Better support for References in Content Translation (epic).
Triage meeting notes: Currently the whole section is sent. It is possible to split section requests when they exceed the limit. However, that would increase the number of requests, as reported in T378326. As an initial measure, we could try to remove the artificial limit that we don't need to apply to MinT. Then check the resulting performance and decide whether splitting the requests (and generating more of them) is worth it.
24 out of the 48 requests mentioned in T378326: MinT for Wikipedia Readers: Reduce parallel MT requests on page load are pre-flight requests (OPTIONS).
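For context, a browser sends a preflight OPTIONS only when a cross-origin request is not a CORS "simple request": GET/HEAD/POST with only safelisted headers and one of three allowed Content-Type values. A rough, deliberately simplified classifier of the Fetch rules (not spec-complete; it ignores, for example, the header value length limits):

```typescript
const SIMPLE_METHODS = new Set(['GET', 'HEAD', 'POST']);
const SIMPLE_CONTENT_TYPES = new Set([
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
]);
const SAFELISTED_HEADERS = new Set(['accept', 'accept-language', 'content-language']);

// True when a cross-origin request would trigger a preflight OPTIONS.
function needsPreflight(method: string, headers: Record<string, string>): boolean {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (lower === 'content-type') {
      const mime = value.split(';')[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mime)) return true;
    } else if (!SAFELISTED_HEADERS.has(lower)) {
      return true;
    }
  }
  return false;
}
```

This is why a JSON POST (`Content-Type: application/json`) doubles the request count: each one is preceded by an OPTIONS round trip, which caching (`Access-Control-Max-Age`) or a same-origin endpoint would avoid.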
I'd still prefer to do this:
I think that if we limit the number of parallel requests and instead send one or two at a time, it might not result in a large number of requests. I find responses from MinT for smaller sections to be much faster.
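The idea above can be sketched as a small concurrency limiter; `runWithLimit` is a hypothetical helper for illustration, not the actual client implementation:

```typescript
// Run an array of request-producing tasks with at most `limit` of them
// in flight at once, preserving result order. With limit = 1 or 2, page
// load issues a trickle of MT requests instead of one per section.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    // JS is single-threaded, so `next++` is safe between awaits.
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

Usage would look like `runWithLimit(sections.map((s) => () => translate(s)), 2)`: sections still render in order as their translations arrive, but the server only ever sees a couple of concurrent requests.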
Change #1127943 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):
[mediawiki/extensions/ContentTranslation@master] AX: Refactor section-by-section translation
Some notes from my tests
Submitted: 1129174: AX: ViewTranslationPage: Avoid adapting links without sourceTitle | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/1129174 to fix 5)
Change #1127943 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] AX: Refactor section-by-section translation
This can now be tested on the test instance: https://language-cx.wmcloud.org/index.php/Special:AutomaticTranslation?page=Tokyo&from=en&to=hi&step=translation
I'm noticing that all sections load as expected.
Tested; all sections load as expected.
Exception:
Thanks for catching that. I noticed it while testing T376860: MinT for Wikipedia Readers: All references are missing; I'd recommend scoping this work to that task. I'll leave a comment there.
Since we are tracking that as part of T376860: MinT for Wikipedia Readers: All references are missing, I'm marking this as done.