CX2: Paragraph not added to the translation with MT failure message
Closed, Resolved (Public)

Description

When trying to translate Auri (band) from English to Portuguese, adding all paragraphs and sections (except for the infobox, which seems to have a different issue, probably T207449) results in an empty paragraph and the "Automatic translation failed" message showing:

Screen Shot 2018-10-19 at 10.12.42 2.png (176×759 px, 18 KB)

Inspecting the console and network tabs, it seems that the translation service (Yandex in this case) was able to return the translation, but a "Cannot read property 'start' of null" error was somehow getting in the way:

Screen Shot 2018-10-19 at 10.23.24.png (528×1 px, 189 KB)

Event Timeline

Pginer-WMF triaged this task as Medium priority. Oct 19 2018, 8:38 AM
Pginer-WMF moved this task from Needs Triage to CX2 on the ContentTranslation board.
Pginer-WMF added a subscriber: Petar.petkovic.

I tried the same page, en:Auri_(band), to Spanish and the end result is the same.
The whole translation process is blocked by many JS errors. All of them are caused by the getRange method of ve.dm.LinearSelection returning null, after which the code tries to access properties or methods of the ve.Range it expected.
There is also the TypeError: Cannot read property 'shallowCloneFromRange' of null described in T202714.

I haven't investigated the root cause behind this, but I am adding the VE tag for investigation.

Change 468620 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Document reduce: For subdocument, don't start id at 0

https://gerrit.wikimedia.org/r/468620

It is a regression from https://gerrit.wikimedia.org/r/c/mediawiki/services/cxserver/+/461386. The attributes are restored on the wrong elements in the reduce-expand procedure: you can see the section tag getting the attributes of a reference. In the lineardoc model, references usually go into a subdocument, and the counter we use as the id for the attribute dump was reset to 0 in that case. Fixed now.
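
To illustrate the failure mode described above, here is a minimal sketch in Python (not the actual cxserver code, which is JavaScript). The document shape, the function names `reduce_doc` and `expand_doc`, and the `reset_for_subdoc` flag are all hypothetical; the only thing taken from the comment is the idea that attributes are dumped under a counter-based id and that the counter restarted at 0 for subdocuments.

```python
from itertools import count


def reduce_doc(items, dump, ids, reset_for_subdoc):
    """Strip attributes from each item, storing them in `dump` under a numeric id."""
    reduced = []
    for item in items:
        new_id = str(next(ids))
        # If the same id is handed out twice, the later entry overwrites the earlier one.
        dump[new_id] = item["attrs"]
        sub = item.get("sub")
        if sub is not None:
            # Assumed bug: the subdocument gets a fresh counter starting at 0.
            sub_ids = count(0) if reset_for_subdoc else ids
            sub = reduce_doc(sub, dump, sub_ids, reset_for_subdoc)
        reduced.append({"tag": item["tag"], "attrs": {"id": new_id}, "sub": sub})
    return reduced


def expand_doc(items, dump):
    """Restore the original attributes by looking up each item's numeric id."""
    expanded = []
    for item in items:
        sub = item["sub"]
        expanded.append({
            "tag": item["tag"],
            "attrs": dump[item["attrs"]["id"]],
            "sub": expand_doc(sub, dump) if sub is not None else None,
        })
    return expanded


# Illustrative input: a section followed by a reference whose link lives in a subdocument.
doc = [
    {"tag": "section", "attrs": {"rel": "cx:Section"}, "sub": None},
    {"tag": "sup", "attrs": {"typeof": "mw:Extension/ref"}, "sub": [
        {"tag": "a", "attrs": {"href": "./Auri_(band)#cite_note-4"}, "sub": None},
    ]},
]

buggy_dump = {}
buggy = expand_doc(reduce_doc(doc, buggy_dump, count(0), True), buggy_dump)

fixed_dump = {}
fixed = expand_doc(reduce_doc(doc, fixed_dump, count(0), False), fixed_dump)
```

With the counter reset, the link inside the reference is also assigned id 0, so after expansion `buggy[0]` (the section) carries the link's `href`. With a shared counter, `fixed` round-trips to the original `doc`.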

Change 468620 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Document reduce: For subdocument, don't start id at 0

https://gerrit.wikimedia.org/r/468620

Mentioned in SAL (#wikimedia-operations) [2018-10-22T13:08:51Z] <kartik@deploy1001> Started deploy [cxserver/deploy@5f53734]: Update cxserver to 7f996f3 (T207445)

Mentioned in SAL (#wikimedia-operations) [2018-10-22T13:12:44Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@5f53734]: Update cxserver to 7f996f3 (T207445) (duration: 03m 53s)

After the fix was deployed, I tried the translation in the example again and found the following issues:

  • It was not possible to inspect the added reference, so it is not clear if it was properly added. Note that the reference card is not appearing for the [1] element selected in the translation.
  • A couple of strange abuse filter errors are shown. One seems related to the presence of <nowiki> markup, and the other seems to be about the content being mostly capital letters.

Screen Shot 2018-10-22 at 15.38.04.png (656×1 px, 251 KB)

Logstash reported the following when I tried to reproduce (after deployment to production):

Encountered <a class="cx-segment" data-segmentid="40"><span class="mw-redirect cx-link" data-linkid="41" href="./Auri_(album)" id="mwCw" rel="mw:WikiLink" title="Auri (album)">[2]</span></a> -- serializing as extlink and dropping <a> attributes unsupported in wikitext.
href is missing from a tag <a class="cx-segment" data-segmentid="40"><span class="mw-redirect cx-link" data-linkid="41" href="./Auri_(album)" id="mwCw" rel="mw:WikiLink" title="Auri (album)">[2]</span></a>

Are these from parsoid? It looks like the tags a and span have their attributes swapped.

Patient: en:Auri (band) -> pt (lead paragraph)

Original parsoid HTML:

<p id="mwBA"><b id="mwBQ">Auri</b> is a Finnish band composed of vocalist and violist <a rel="mw:WikiLink" href="./Johanna_Kurkela" title="Johanna Kurkela" id="mwBg">Johanna Kurkela</a>, keyboardist and backing vocalist <a rel="mw:WikiLink" href="./Tuomas_Holopainen" title="Tuomas Holopainen" id="mwBw">Tuomas Holopainen</a>, and guitarist, keyboardist and pipe player <a rel="mw:WikiLink" href="./Troy_Donockley" title="Troy Donockley" id="mwCA">Troy Donockley</a>.<sup about="#mwt61" class="mw-ref" id="cite_ref-4" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","attrs":{},"body":{"id":"mw-reference-text-cite_note-4"}}'><a href="./Auri_(band)#cite_note-4" style="counter-reset: mw-Ref 4;"><span class="mw-reflink-text">[4]</span></a></sup> Holopainen and Donockley are also members of Finnish band <a rel="mw:WikiLink" href="./Nightwish" title="Nightwish" id="mwCQ">Nightwish</a>.<sup about="#mwt12" class="mw-ref" id="cite_ref-metalinvader_2-1" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","attrs":{"name":"metalinvader"}}'><a href="./Auri_(band)#cite_note-metalinvader-2" style="counter-reset: mw-Ref 2;"><span class="mw-reflink-text">[2]</span></a></sup> Their self-titled first album, <i id="mwCg"><a rel="mw:WikiLink" href="./Auri_(album)" title="Auri (album)" id="mwCw" class="mw-redirect">Auri</a></i>, was released on March 23, 2018.<sup about="#mwt14" class="mw-ref" id="cite_ref-metalinvader_2-2" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","attrs":{"name":"metalinvader"}}'><a href="./Auri_(band)#cite_note-metalinvader-2" style="counter-reset: mw-Ref 2;"><span class="mw-reflink-text">[2]</span></a></sup><sup about="#mwt16" class="mw-ref" id="cite_ref-prog_1-1" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","attrs":{"name":"prog"}}'><a href="./Auri_(band)#cite_note-prog-1" style="counter-reset: mw-Ref 1;"><span class="mw-reflink-text">[1]</span></a></sup> and was recorded at <a 
rel="mw:WikiLink" href="./Peter_Gabriel" title="Peter Gabriel" id="mwDA">Peter Gabriel</a>'s <a rel="mw:WikiLink" href="./Real_World_Studios" title="Real World Studios" id="mwDQ">Real World Studios</a>.<sup about="#mwt18" class="mw-ref" id="cite_ref-prog_1-2" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","attrs":{"name":"prog"}}'><a href="./Auri_(band)#cite_note-prog-1" style="counter-reset: mw-Ref 1;"><span class="mw-reflink-text">[1]</span></a></sup></p>

CXServer debug messages:

Not-adapting a reference node without data-mw.body.html: cite_ref-metalinvader_2-1
Not-adapting a reference node without data-mw.body.html: cite_ref-metalinvader_2-2
Not-adapting a reference node without data-mw.body.html: cite_ref-prog_1-1
Not-adapting a reference node without data-mw.body.html: cite_ref-prog_1-2
Not-adapting a reference node without data-mw.body.html: cite_ref-prog_1-3

Results of reducing the original parsoid HTML:

<p id="mwBA"><b id="mwBQ">Auri</b> is a Finnish band composed of vocalist and violist <a id="1">Johanna Kurkela</a>, keyboardist and backing vocalist <a id="2">Tuomas Holopainen</a>, and guitarist, keyboardist and pipe player <a id="3">Troy Donockley</a>.<sup id="4"><a id="5"><span id="6">[4]</span></a></sup> Holopainen and Donockley are also members of Finnish band <a id="4">Nightwish</a>.<sup id="5"><a id="6"><span id="7">[2]</span></a></sup> Their self-titled first album, <i id="mwCg"><a id="5">Auri</a></i>, was released on March 23, 2018.<sup id="6"><a id="7"><span id="8">[2]</span></a></sup><sup id="6"><a id="7"><span id="8">[1]</span></a></sup> and was recorded at <a id="6">Peter Gabriel</a>'s <a id="7">Real World Studios</a>.<sup id="8"><a id="9"><span id="10">[1]</span></a></sup></p>

Notice how ids 4-8 are repeated multiple times.
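
A quick way to confirm this kind of repetition is to count the id values in the reduced HTML. The sketch below uses only the Python standard library; `IdCollector` and `duplicate_ids` are hypothetical helper names, and `sample` is a shortened excerpt of the reduced paragraph above rather than the full output.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts how often each id attribute value appears on start tags."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids[value] += 1


def duplicate_ids(html):
    """Return a dict mapping each id that occurs more than once to its count."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return {value: n for value, n in collector.ids.items() if n > 1}


# Shortened excerpt of the reduced paragraph (text between tags trimmed).
sample = (
    '<p id="mwBA"><a id="3">Troy Donockley</a>.'
    '<sup id="4"><a id="5"><span id="6">[4]</span></a></sup> '
    '<a id="4">Nightwish</a>.</p>'
)
duplicates = duplicate_ids(sample)
```

For this excerpt, `duplicates` reports that id 4 is used twice: once for the reference's `sup` and once for the Nightwish link.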

Change 469175 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/services/cxserver@master] Document reduce: avoid re-using section ids

https://gerrit.wikimedia.org/r/469175

Change 469175 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Document reduce: avoid re-using section ids

https://gerrit.wikimedia.org/r/469175

Mentioned in SAL (#wikimedia-operations) [2018-10-24T05:20:39Z] <kartik@deploy1001> Started deploy [cxserver/deploy@80dc518]: Update cxserver to 9ad60d9 (T207445)

Mentioned in SAL (#wikimedia-operations) [2018-10-24T05:24:46Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@80dc518]: Update cxserver to 9ad60d9 (T207445) (duration: 04m 06s)

I checked that the issue no longer happens in production.
I filed the abuse filter issue as a separate ticket: T207842: CX2: Abuse filter unexpectedly triggered. I am not sure how closely the new ticket is related, but if anyone has a clue about what may cause the abuse filter to be triggered, feel free to share it in the ticket.