
Add new MT Client for LingoCloud
Closed, Resolved · Public

Description

Hi, folks,

I have added a new MT client that supports machine translation from ColorfulClouds, one of the leading MT service providers in China. The attached patch contains the changes to the cxserver codebase.

As described in https://www.mediawiki.org/wiki/Content_translation/Machine_Translation/MT_Clients , we provide the following materials to show that we meet the technical requirements.

Translation API

  • If the API is not public, it can accept an authentication token, usually a key
    • The API is authenticated by a key, which will be sent separately from this ticket.
  • The output format can be JSON for convenience
    • The output format is JSON.
  • The API should accept POST
    • The API supports POST.
  • The API should not demand any user-identifiable information such as a username. CXServer does not provide it to the MT client
    • The API does not require any user information from the wiki side.
  • The API should be capable of accepting a reasonable number of requests per minute
    • See the performance section below.
  • The API should accept a reasonable amount of content per request
    • See the load test below.
  • It is recommended to have a dashboard to analyze the usage of the API, including requests per day/week/month and number of characters translated per day/week/month
    • We have a private dashboard and can provide an account so you can log in and check it.
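A minimal sketch of a request that satisfies the requirements listed above, written in Python for illustration (cxserver itself is a Node.js service, and the real client is in the attached patch). The endpoint URL, the `source`/`trans_type` field names, the "en2zh" pair format and the `token <key>` header value are all assumptions, not the documented LingoCloud API; only the x-authorization header name comes from this ticket. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: a placeholder, not the real LingoCloud URL.
API_URL = "https://mt.example.com/v1/translator"


def build_translation_request(texts, source, target, api_key):
    """Build (but do not send) a POST request for a translation.

    Covers the listed requirements: key-based authentication, a JSON body
    with JSON expected back, and no user-identifiable fields.
    """
    body = {
        # Assumed field names and "en2zh"-style language-pair format.
        "source": texts,
        "trans_type": f"{source}2{target}",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Accept": "application/json",
            # Header name as discussed in this ticket; the "token <key>"
            # value format is an assumption.
            "X-Authorization": f"token {api_key}",
        },
        method="POST",
    )


request = build_translation_request(["Hello, world"], "en", "zh", "MY-KEY")
```

Against a real endpoint with a real key, `urllib.request.urlopen(request)` would send it; note that nothing identifying the wiki user appears anywhere in the request.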

Guidelines for performance

  • At least 10,000 requests per day
  • At least 10 million characters per day
  • At least 5000 characters per request

We ran the following command:

ab -n 100 -c 5 -p test-load.txt -T 'application/x-www-form-urlencoded' -H 'Accept: application/json; charset=utf-8' http://127.0.0.1:8080/v1/mt/en/zh/ColorfulClouds

The result is:

This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient).....done


Server Software:        
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /v1/mt/en/zh/ColorfulClouds
Document Length:        5760 bytes

Concurrency Level:      5
Time taken for tests:   3.221 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      654200 bytes
Total body sent:        1088100
HTML transferred:       576000 bytes
Requests per second:    31.05 [#/sec] (mean)
Time per request:       161.046 [ms] (mean)
Time per request:       32.209 [ms] (mean, across all concurrent requests)
Transfer rate:          198.35 [Kbytes/sec] received
                        329.90 kb/s sent
                        528.25 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   127  158  23.7    149     263
Waiting:      126  158  23.7    149     263
Total:        127  158  23.7    149     263

Percentage of the requests served within a certain time (ms)
  50%    149
  66%    164
  75%    172
  80%    175
  90%    184
  95%    213
  98%    242
  99%    263
 100%    263 (longest request)
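The guideline check can be worked out from these numbers. The sketch below extrapolates the mean throughput to a full day and recomputes ab's mean "Time per request" as concurrency × time taken ÷ completed requests. It assumes the 31.05 requests/second rate could be sustained for 24 hours, which a 100-request run against a local cxserver does not by itself show, and it assumes 5,000 characters per request (the guideline minimum) because the size of test-load.txt is not stated here.

```python
SECONDS_PER_DAY = 86_400

# Figures copied from the ApacheBench output above.
REQUESTS_PER_SECOND = 31.05   # "Requests per second" (mean)
TIME_TAKEN_S = 3.221          # "Time taken for tests"
COMPLETE_REQUESTS = 100
CONCURRENCY = 5

# Thresholds from "Guidelines for performance".
MIN_REQUESTS_PER_DAY = 10_000
MIN_CHARACTERS_PER_DAY = 10_000_000

# Assumption: each request carries the guideline minimum of 5,000
# characters; the real size of test-load.txt is not given in this ticket.
ASSUMED_CHARS_PER_REQUEST = 5_000

# Extrapolate the mean rate to a full day (assumes it is sustainable).
requests_per_day = round(REQUESTS_PER_SECOND * SECONDS_PER_DAY)
characters_per_day = requests_per_day * ASSUMED_CHARS_PER_REQUEST

# ab's "Time per request (mean)" is concurrency * time taken / requests.
mean_ms_per_request = CONCURRENCY * TIME_TAKEN_S / COMPLETE_REQUESTS * 1000

print(f"{requests_per_day:,} requests/day (guideline: {MIN_REQUESTS_PER_DAY:,})")
print(f"{mean_ms_per_request:.2f} ms per request (ab reports 161.046)")
```

Under these assumptions the projection is 2,682,720 requests per day, well above the 10,000-request guideline; the small gap to ab's 161.046 ms comes from the rounded "Time taken for tests" value.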

Input format

  • We translate the plain-text version of the content.
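To illustrate what a "plain-text version" of wiki content can mean, the sketch below strips markup from an HTML fragment with Python's standard `html.parser`. It is a simplified stand-in: the skipped and block-level tag sets and the whitespace handling are assumptions, and the client in the patch may segment content and restore markup differently.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collect the visible text of an HTML fragment, dropping the tags."""

    # Assumed tag sets for this illustration.
    SKIPPED = {"script", "style"}
    BLOCKS = {"p", "div", "li", "br", "h1", "h2", "h3", "td"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; etc.
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCKS:
            self._parts.append(" ")  # keep text from separate blocks apart

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Collapse whitespace runs left behind by the removed tags.
        return " ".join("".join(self._parts).split())


def to_plain_text(html_fragment: str) -> str:
    parser = PlainTextExtractor()
    parser.feed(html_fragment)
    parser.close()
    return parser.text()


print(to_plain_text("<p>Kazuo <b>Ishiguro</b> was born in Nagasaki.</p>"))
```

Inline tags such as `<b>` vanish without splitting words, while block tags insert a space so adjacent paragraphs do not run together.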

Quality of translation

An example translation of text from https://www.nobelprize.org/nobel_prizes/literature/laureates/2017/bio-bibl.html :

[Image: nobel.png, screenshot of the translation example]

The attached patch

The attached test-load.txt

Event Timeline

Restricted Application added a subscriber: Aklapper.

After some discussion, we want to change our service name from 'ColorfulClouds' to 'LingoCloud', because 'ColorfulClouds' is the company name and 'LingoCloud' is the brand name. We want all related names to use the brand name.

Here is the updated patch.

Please use this patch. Thanks

@Mountain You may want to change the title of this ticket too (Add new MT Client for ColorfulClouds -> Add new MT Client for LingoCloud).

Mountain renamed this task from Add new MT Client for ColorfulClouds to Add new MT Client for LingoCloud. (Feb 14 2018, 4:42 AM)

Thanks @Mountain for the patch and the details. Some comments about the patch and the API:

  • We would like to have documentation about the API, ideally hosted on your service website, with the following information: API URL, parameters and their meaning, and response codes for success and failure. In case of failure, a more explicit explanation of why the API failed.
  • The patch reuses the error codes from the Yandex MT client. I assume this is because of copy-paste, but documented error codes are required, as mentioned above, and we will then use those error codes in the patch.
  • To test and verify the client, we would need a key, which you can share privately with me.
  • You mentioned a stats dashboard. If it requires a username/password, please share that privately as well.
  • In the patch, you set LingoCloud as the default for en-zh and similar pairs. We will remove that for now, but may set it after we monitor the usage statistics and feedback.
  • About the usage of the x-authorization header: in the patch it seems a token is set. Do you differentiate between a token and a key? Specifically, is this value a time-limited token or a permanent one?
  • Also, would you consider passing the key as a parameter in the POST body rather than as a value in a header with an x- prefix? (Reason for this question: https://tools.ietf.org/html/rfc6648 )
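To illustrate the error-code request, here is a sketch of how a client could turn a failed response into an explicit error instead of reusing another provider's codes. The status codes and messages in `ERROR_MESSAGES`, the `message` response field and the `MTError` class are hypothetical placeholders; the real codes must come from the provider's documentation.

```python
# Hypothetical error table: placeholders showing the shape only. The real
# codes and wording must come from the provider's API documentation.
ERROR_MESSAGES = {
    400: "Bad request: a required parameter is missing or malformed",
    401: "Unauthorized: the API key is missing or invalid",
    413: "Payload too large: the text exceeds the per-request limit",
    429: "Too many requests: the rate or quota limit was exceeded",
    500: "Internal error in the MT service",
}


class MTError(Exception):
    """An MT failure carrying the HTTP status and an explicit reason."""

    def __init__(self, status: int, message: str):
        super().__init__(f"MT request failed ({status}): {message}")
        self.status = status
        self.message = message


def check_response(status: int, body: dict) -> dict:
    """Return the parsed body on success; otherwise raise MTError.

    Prefers an explanation supplied by the service (an assumed "message"
    field) and falls back to the documented table.
    """
    if 200 <= status < 300:
        return body
    message = body.get("message") or ERROR_MESSAGES.get(status, "Unknown error")
    raise MTError(status, message)
```

Keeping the table in one place means the client changes in a single spot once the provider publishes its actual codes.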

Sorry for the delayed reply; it was a holiday in China.

> We would like to have documentation about the API, ideally hosted on your service website, with the following information: API URL, parameters and their meaning, and response codes for success and failure. In case of failure, a more explicit explanation of why the API failed.

We have a private Swagger description of the API, which I can share with you privately first. We will make part of the API public, but the timing is still uncertain. Please check your email, @santhosh.

> The patch reuses the error codes from the Yandex MT client. I assume this is because of copy-paste, but documented error codes are required, as mentioned above, and we will then use those error codes in the patch.

As I understand it, this is not a copy-paste: I intentionally borrowed the code to reuse the logic. I will double-check these error codes with other members of my team.

> To test and verify the client, we would need a key, which you can share privately with me.

Please check your email, @santhosh.

> You mentioned a stats dashboard. If it requires a username/password, please share that privately as well.

Please check your email, @santhosh.

> In the patch, you set LingoCloud as the default for en-zh and similar pairs. We will remove that for now, but may set it after we monitor the usage statistics and feedback.

OK

> About the usage of x-authorization header, in the patch it seems a Token is set. Do you differentiate between token and key? Specifically, is this value a time-limited token or permanent?

We have a mechanism to differentiate between tokens and keys, but so far no token is time-limited.

> Also, would you consider passing the key as a parameter in the POST body rather than as a value in a header with an x- prefix? (Reason for this question: https://tools.ietf.org/html/rfc6648 )

We will consider this recommendation, but the related change may take several days to be delivered and deployed.
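The alternative raised in the question can be sketched as follows: the key travels as a form parameter in the POST body instead of in an X-prefixed header (RFC 6648 deprecates the "X-" naming convention). This is an illustration only; the parameter names `key`, `source`, `target` and `text` are hypothetical, not LingoCloud's actual interface.

```python
from urllib.parse import parse_qs, urlencode


def build_form_body(text: str, source: str, target: str, api_key: str) -> bytes:
    """Encode the key as a POST body parameter instead of an X-prefixed header.

    Produces an application/x-www-form-urlencoded body, the same content
    type used in the ab load test. All parameter names are hypothetical.
    """
    return urlencode({
        "key": api_key,
        "source": source,
        "target": target,
        "text": text,
    }).encode("utf-8")


body = build_form_body("Hello, world", "en", "zh", "MY-KEY")
# The service side would read the key back out of the body like this.
fields = parse_qs(body.decode("utf-8"))
```

Unlike a query-string parameter, a body parameter does not end up in URLs or typical access logs. The standard `Authorization` header would be another way to avoid the X- prefix.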

Thanks for your patience; I will post updates on this ticket.

Per Santhosh's request, we need to provide a statistics dashboard and an API documentation website to meet the requirements.

We are working on both: the statistics dashboard is finished, but the multilingual API documentation website is still in development.

I hope both criteria will be met next month, and I will post an update when they are done.
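The dashboard requirement (requests and translated characters per day, week and month) comes down to a simple aggregation. A minimal sketch, assuming a hypothetical usage log of `(date, characters)` pairs with one entry per translation request; it says nothing about how the LingoCloud dashboard is actually built.

```python
from collections import defaultdict
from datetime import date


def period_key(day: date, granularity: str):
    """Map a date to its day, ISO week, or calendar month bucket."""
    if granularity == "day":
        return day
    if granularity == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return (iso_year, iso_week)
    if granularity == "month":
        return (day.year, day.month)
    raise ValueError(f"unknown granularity: {granularity}")


def usage_by_period(log, granularity="day"):
    """Total requests and translated characters per period.

    `log` is a hypothetical iterable of (date, characters) pairs, one per
    translation request.
    """
    totals = defaultdict(lambda: {"requests": 0, "characters": 0})
    for day, characters in log:
        bucket = totals[period_key(day, granularity)]
        bucket["requests"] += 1
        bucket["characters"] += characters
    return dict(totals)
```

ISO weeks are used so that a week spanning a month boundary is still counted as one week.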

Change 444156 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] LingoCloud MT Client

https://gerrit.wikimedia.org/r/444156

Change 444156 merged by jenkins-bot:
[mediawiki/services/cxserver@master] LingoCloud MT Client

https://gerrit.wikimedia.org/r/444156

Mentioned in SAL (#wikimedia-operations) [2018-08-28T09:08:21Z] <kartik@deploy1001> Started deploy [cxserver/deploy@e2e5674]: Update cxserver to 98cbefd and LingoCloud deployment (T186715)

Mentioned in SAL (#wikimedia-operations) [2018-08-28T09:11:59Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@e2e5674]: Update cxserver to 98cbefd and LingoCloud deployment (T186715) (duration: 03m 38s)