Performance analysis for translate API
Closed, Resolved · Public

Description

If a section is large and contains many links and templates, cxserver takes a long time to finish processing it.
One example is the big tables in en:List_of_English_football_champions, such as the one in the section "Football League First Division (1892–1992)".

In such cases the request times out. For the table above, I observed that the request did not finish within 2 minutes and timed out.

This ticket is to analyse the processing, detect any bottlenecks, and improve them to reduce the time taken.

Restricted Application added a subscriber: Aklapper. Aug 29 2018, 11:23 AM
Pginer-WMF triaged this task as Normal priority. Sep 3 2018, 11:35 AM
santhosh added a comment. Edited Sep 6 2018, 12:11 PM

I was not able to translate at all before https://gerrit.wikimedia.org/r/c/mediawiki/services/cxserver/+/458132 and https://gerrit.wikimedia.org/r/c/mediawiki/services/cxserver/+/458352

I did a timing analysis for the big table in en:List_of_English_football_champions, the one in the section "Football League First Division (1892–1992)". Target language: es. MT: Apertium.

I changed TitleInfoRequest and TitlePairRequest to use GET instead of POST:

Once the cxserver cache is warm, retrying the translation with "Reset translation" gives:

If I use the POST method in TitleInfoRequest and TitlePairRequest (no modification):

Observations

  1. The table is now translatable
  2. Using GET requests gives a slight improvement in timing compared to POST. The VE API request cache uses GET. We changed the cxserver implementation to use POST, but I don't think that is relevant now, since we never send more than 50 titles in one go.

  1. Is the table translatable with cold caches? It doesn't really mean anything if it is translatable only with warm caches, as users will see a failure (and probably won't retry).
  2. It is surprising that there is a difference between GET and POST. Is this a consistent pattern? I'm okay with switching to GET as long as we don't get silent (or non-silent) failures due to it. Note, though, that the concern was not the number of titles but the maximum query length: 50*300 bytes is too much (namespace + 255 bytes per title).
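
The query-length concern can be made concrete with a quick estimate. The following is a minimal Python sketch, not cxserver code: the function names and the 8000-byte limit are assumptions chosen for illustration, and the real limit depends on the servers and proxies involved. It batches titles in groups of 50, measures the percent-encoded length of the pipe-separated titles value, and falls back to POST when that length exceeds the assumed limit.

```python
from urllib.parse import quote

MAX_TITLES_PER_BATCH = 50        # batch size mentioned in this thread
ASSUMED_MAX_QUERY_BYTES = 8000   # hypothetical limit, for illustration only


def batch_titles(titles, size=MAX_TITLES_PER_BATCH):
    """Split a list of titles into batches of at most `size` titles."""
    return [titles[i:i + size] for i in range(0, len(titles), size)]


def encoded_titles_length(titles):
    """Length of the percent-encoded, pipe-separated titles value.

    The encoded string is pure ASCII, so its character count equals
    its size in bytes.
    """
    return len(quote("|".join(titles), safe=""))


def choose_method(titles, limit=ASSUMED_MAX_QUERY_BYTES):
    """Pick GET when the encoded titles fit under the limit, else POST."""
    return "GET" if encoded_titles_length(titles) <= limit else "POST"


if __name__ == "__main__":
    # Worst case from the thread: 50 titles of about 300 bytes each.
    worst_case = ["a" * 300] * 50
    print(encoded_titles_length(worst_case), choose_method(worst_case))
```

With a check like this, GET (and its caching benefit) would be used for typical batches, while unusually long batches would still go through POST instead of failing.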

The first and the last data sets in the above comment were taken immediately after cxserver started, which means the cache was empty. So the table was translatable with a cold cache after the API cache fix patches. In master, the table is not translatable at all: I waited for 4 minutes and then the request timed out.

Change 463454 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Do not pass blank textblocks to machine translation services

https://gerrit.wikimedia.org/r/463454

While working on the above patch, I found an opportunity for improvement. See the section content below:

<section id="cxTargetSection12" data-mw-cx-source="undefined">
    <span about="#mwt51" data-mw="{&quot;parts&quot;:[{&quot;template&quot;:{&quot;target&quot;:{&quot;wt&quot;:&quot;cquote&quot;,&quot;href&quot;:&quot;./Template:Cquote&quot;},&quot;params&quot;:{&quot;1&quot;:{&quot;wt&quot;:&quot;The Turkish influence still remained in speech and mentality. The language ... is full with Turkish words. Older Užičans are at home still very much like the Turks&quot;},&quot;2&quot;:{&quot;wt&quot;:&quot;24px&quot;},&quot;3&quot;:{&quot;wt&quot;:&quot;24px&quot;},&quot;4&quot;:{&quot;wt&quot;:&quot;[[Milutin Uskoković]]&quot;},&quot;5&quot;:{&quot;wt&quot;:&quot;''Došljaci'' (1919)&quot;}},&quot;i&quot;:0}}]}"
        id="mwZw" typeof="mw:Transclusion" data-ve-no-generated-contents="true">
    </span>
    <link about="#mwt51" href="./Category:Articles_with_attributed_pull_quotes" rel="mw:PageProp/Category"
        data-ve-ignore="true">
    <table about="#mwt51" class="cquote pullquote" id="173" role="presentation" style="margin:auto; border-collapse: collapse; border: none; background-color: transparent; width: auto; "
        data-ve-ignore="true">
        <tbody id="174">
            <tr id="175">
                <td id="176" style="width: 20px; vertical-align: top; border: none; color: #B2B7F2; font-size: 40px; font-family: 'Times New Roman', Times, serif; font-weight: bold; line-height: .6em; text-align: left; padding: 10px 10px;"><span
                        class="cx-segment" data-segmentid="177"></span></td>
                <td id="178" style="vertical-align: top; border: none; padding: 4px 10px; "><span class="cx-segment"
                        data-segmentid="179">The Turkish influence still remained in speech and mentality. </span><span
                        class="cx-segment" data-segmentid="180">The language ... is full with Turkish words. </span><span
                        class="cx-segment" data-segmentid="181">Older Užičans are at home still very much like the
                        Turks</span></td>
                <td id="182" style="width: 20px; vertical-align: bottom; border: none; color: #B2B7F2; font-size: 40px; font-family: 'Times New Roman', Times, serif; font-weight: bold; line-height: .6em; text-align: right; padding: 10px 10px;"><span
                        class="cx-segment" data-segmentid="183"></span></td>
            </tr>
            <tr id="184">

                <td class="cquotecite" colspan="3" id="185" style="border: none; padding-right: 4%; font-size: smaller; line-height:normal; text-align: right;"><cite><span
                            class="cx-segment" data-segmentid="186"><span typeof="mw:Entity"></span><a class="cx-link"
                                data-linkid="187" href="./Milutin_Uskoković" rel="mw:WikiLink" title="Milutin Uskoković">Milutin
                                Uskoković</a>, <i>Došljaci</i> (1919)</span></cite></td>
            </tr>
        </tbody>
    </table>

</section>

There is nothing for an MT client to translate in this section. All of the elements are a template and its fragments. The adaptation based on the data-mw definition will happen anyway, so translating the table content is wasted work.

If we can detect this case in the translateHtml method of the MTClient class, we can save several MT service requests.
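
One way to detect this case is sketched below in Python. This is only an illustration, not the cxserver implementation (cxserver is a Node.js service), and every name in it is hypothetical. It assumes, based on the sample above, that all fragments of a transclusion are direct children of the section and carry an about attribute starting with "#mwt". If every direct child of the section is such a fragment and there is no loose text, the section can skip the MT request.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TransclusionOnlyChecker(HTMLParser):
    """Records whether each direct child of the root element is a
    transclusion fragment (assumed marker: about="#mwt...")."""

    def __init__(self):
        super().__init__()
        self.depth = 0               # number of currently open elements
        self.top_level_children = 0  # direct children of the root element
        self.untagged_children = 0   # direct children without about="#mwt..."
        self.loose_text = False      # non-blank text directly under the root

    def handle_starttag(self, tag, attrs):
        if self.depth == 1:
            self.top_level_children += 1
            about = dict(attrs).get("about") or ""
            if not about.startswith("#mwt"):
                self.untagged_children += 1
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1 and data.strip():
            self.loose_text = True


def is_transclusion_only(section_html: str) -> bool:
    """Return True when the section holds only transclusion fragments."""
    checker = TransclusionOnlyChecker()
    checker.feed(section_html)
    checker.close()
    return (
        checker.top_level_children > 0
        and checker.untagged_children == 0
        and not checker.loose_text
    )
```

A caller could run this check before sending a section to the MT service and return the source HTML unchanged when it is true, leaving the template adaptation step to handle the content.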

Change 463454 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Do not pass blank textblocks to machine translation services

https://gerrit.wikimedia.org/r/463454

Change 464785 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Do not machine translate transclusion related elements

https://gerrit.wikimedia.org/r/464785

Change 464796 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Do not adapt the translation units under a transclusion context

https://gerrit.wikimedia.org/r/464796

Change 464785 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Do not machine translate transclusion related elements

https://gerrit.wikimedia.org/r/464785

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2018-10-15T03:14:02Z] <kartik@deploy1001> Started deploy [cxserver/deploy@5a70ef1]: Update cxserver to 47a864b (T205420, T203077, T205700, T205616)

Mentioned in SAL (#wikimedia-operations) [2018-10-15T03:18:46Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@5a70ef1]: Update cxserver to 47a864b (T205420, T203077, T205700, T205616) (duration: 04m 44s)

Mentioned in SAL (#wikimedia-operations) [2018-10-15T05:16:28Z] <kartik@deploy1001> Started deploy [cxserver/deploy@fd74c3b]: Update cxserver to b51f363 (T203077, T99934, T203550)

Mentioned in SAL (#wikimedia-operations) [2018-10-15T05:20:53Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@fd74c3b]: Update cxserver to b51f363 (T203077, T99934, T203550) (duration: 04m 25s)

Change 464796 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Do not adapt the translation units under a transclusion context

https://gerrit.wikimedia.org/r/464796

Mentioned in SAL (#wikimedia-operations) [2018-10-22T04:37:35Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@904151f]: Update cxserver to eee8974 (T207070, T203077, T199529) (duration: 05m 42s)

Etonkovidova closed this task as Resolved.Nov 1 2018, 12:20 AM
Etonkovidova added a subscriber: Etonkovidova.

Checked in cx2: the translation time looks decent for such big tables, in the range of 12-19 seconds.

Change 472424 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Prevent duplicate adaptation of tags under textblock

https://gerrit.wikimedia.org/r/472424

Change 472424 abandoned by Santhosh:
Prevent duplicate adaptation of tags under textblock

https://gerrit.wikimedia.org/r/472424

Change 472425 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Prevent duplicate adaptation of tags under textblock

https://gerrit.wikimedia.org/r/472425

Change 472425 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Prevent duplicate adaptation of tags under textblock

https://gerrit.wikimedia.org/r/472425

Petar.petkovic removed a project: Patch-For-Review.
Petar.petkovic closed this task as Resolved.

Mentioned in SAL (#wikimedia-operations) [2018-11-16T11:08:55Z] <kartik@deploy1001> Started deploy [cxserver/deploy@473b0de]: Update cxserver to b7cdb26 (T208831, T203077, T203160, T206777)

Mentioned in SAL (#wikimedia-operations) [2018-11-16T11:13:21Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@473b0de]: Update cxserver to b7cdb26 (T208831, T203077, T203160, T206777) (duration: 04m 26s)