Migrate content translation to the REST API
Open, High, Public

Description

Problem

Currently, Content Translation exposes its own standalone API at https://cxserver.wikimedia.org/. The routes for this API are defined in this file:

GET /page/:language/:title
POST /mt/:from/:to/:provider?
GET /dictionary/:word/:from/:to/:provider?
GET /list/:tool/:from?/:to?
GET /languagepairs
GET /version

This domain is proxied through the Parsoid Varnish cluster, but no caching or other features are used. We are in the process of moving services off the Parsoid Varnishes, and @BBlack in particular would like to eventually decommission those servers.

It might also be worth publicizing generally useful translation and dictionary end points more widely. One option for doing so would be to set up RESTBase API end points for machine translations and dictionaries. This would present translation and dictionary facilities as part of the wider REST content API documentation. Going through RESTBase would also provide detailed request metrics, error logging, and potentially caching / storage, rate limiting and access controls.

Public API

As CXServer does not depend on the domain the requests originate from (and its API is also decoupled from domains), the public translation API endpoints will be hosted on the global wikimedia.org domain. Concretely, these are the endpoints that are going to be exposed under the /translate/ prefix:

  • POST /machine/{from}/{to}{/provider}
  • GET /dictionary/{word}/{from}/{to}{/provider}
  • GET /list/tool/{tool}
  • GET /list/pair/{from}/{to}
  • GET /list/languagepairs
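
As a minimal sketch of how a client might assemble these URLs: the base URL below is an assumption (the description only names the wikimedia.org domain and the /translate/ prefix; the /api/rest_v1 part is borrowed from the URLs quoted later in the discussion), and all function names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote

# Assumption: only the wikimedia.org domain and the /translate/ prefix come from
# the task description; /api/rest_v1 is taken from URLs quoted in the discussion.
BASE = "https://wikimedia.org/api/rest_v1/translate"


def _join(*segments: Optional[str]) -> str:
    """Join the non-empty path segments under BASE, percent-encoding each one."""
    parts = [quote(s, safe="") for s in segments if s]
    return "/".join([BASE, *parts])


def machine_url(src: str, dst: str, provider: Optional[str] = None) -> str:
    """POST /machine/{from}/{to}{/provider}"""
    return _join("machine", src, dst, provider)


def dictionary_url(word: str, src: str, dst: str,
                   provider: Optional[str] = None) -> str:
    """GET /dictionary/{word}/{from}/{to}{/provider}"""
    return _join("dictionary", word, src, dst, provider)


def list_tool_url(tool: str) -> str:
    """GET /list/tool/{tool}"""
    return _join("list", "tool", tool)


def list_pair_url(src: str, dst: str) -> str:
    """GET /list/pair/{from}/{to}"""
    return _join("list", "pair", src, dst)


def list_languagepairs_url() -> str:
    """GET /list/languagepairs"""
    return _join("list", "languagepairs")
```

The optional {/provider} segment is simply omitted when no provider is passed, which mirrors the trailing optional parameter in the existing CXServer routes.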

Action Items

  • @KartikMistry : Implement the changes in the API on the CXServer side (add /list/pair/{from}/{to} and /list/languagepairs) - T162576
  • @mobrovac : Create the public API specification - PR #796
  • @KartikMistry : Change the extension to use the new RB-provided public API instead of CXServer - T163203

Event Timeline

Yes, it is not just a plain page fetch from Parsoid. CXServer internally fetches the page from Parsoid, parses it, and segments the content into sentences using our language-specific sentence segmentation algorithm. It also adds identifiers and various attributes for sections, sentences, links, references, etc. So this end point cannot be replaced with https://{domain}/api/rest_v1/page/html/{title}/{revision}

Does this look sensible?

https://{domain}/api/rest_v1/translate/page/{language}/{title}/{revision}

Fetches the segmented MediaWiki page {title}, optionally at {revision}, for translation from {language}.
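
To make the proposed shape concrete, here is a small sketch that fills in the template above. It only illustrates the proposal as written in this comment, not a deployed endpoint; the function name and the choice to treat the revision as an integer are assumptions.

```python
from typing import Optional
from urllib.parse import quote


def translate_page_url(domain: str, language: str, title: str,
                       revision: Optional[int] = None) -> str:
    """Fill in the proposed /translate/page/{language}/{title}/{revision} template.

    Hypothetical helper: the revision segment is appended only when given,
    reflecting the "optional {revision}" wording of the proposal.
    """
    url = (
        f"https://{domain}/api/rest_v1/translate/page/"
        f"{quote(language, safe='')}/{quote(title, safe='')}"
    )
    if revision is not None:
        url += f"/{revision}"
    return url
```

Percent-encoding the title keeps slashes inside page titles from being read as extra path segments.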

Alternatives are being discussed on the PR. Please chime in there.

T162576 is done; Waiting for deploy: https://gerrit.wikimedia.org/r/356127

Do we have an ETA for the deployment?

@santhosh and @Nikerabbit Please have a look too.

I can deploy it on Monday. Verifying it once again and I'll sync on this later today.

Change 360692 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/cxserver@master] Let the page loader accept full project domains as params as well

https://gerrit.wikimedia.org/r/360692

Change 360692 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Let the page loader accept full project domains as params as well

https://gerrit.wikimedia.org/r/360692

We've deployed the new end point code for cxserver. Thanks @mobrovac for spotting the issue and for the quick fix!

@mobrovac: So the current status is that the cxserver changes are done, but the public API has not been finalized & deployed yet?

Change 366312 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/restbase/deploy@master] Config: Add the CXServer host URI

https://gerrit.wikimedia.org/r/366312

Change 366312 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] Config: Add the CXServer host URI

https://gerrit.wikimedia.org/r/366312

Mentioned in SAL (#wikimedia-operations) [2017-07-19T19:08:42Z] <mobrovac@tin> Started deploy [restbase/deploy@c5938f4]: Expose the translation API end points and fix SwaggerUI - T107914 T170729

Mentioned in SAL (#wikimedia-operations) [2017-07-19T19:16:44Z] <mobrovac@tin> Finished deploy [restbase/deploy@c5938f4]: Expose the translation API end points and fix SwaggerUI - T107914 T170729 (duration: 08m 02s)

The RESTBase portion of this task has been completed. The public API has been discussed and deployed. It is now live in production. The next step is for the Content Translation team to start using these end points in the CX extension, so that cxserver.wm.org can be sunset.

@mobrovac While checking https://es.wikipedia.org/api/rest_v1/#!/Transforms/doMT, I got the following error:

Request URL: https://es.wikipedia.org/api/rest_v1/transform/html/from/en/Apertium
{
  "type": "https://mediawiki.org/wiki/HyperSwitch/errors/internal_error",
  "method": "post",
  "detail": "TypeError: Cannot read property 'status' of undefined",
  "uri": "/es.wikipedia.org/v1/transform/html/from/en/Apertium"
}

There was a bug in RESTBase's code. The fix was merged and deployed. Thank you @KartikMistry for reporting!

GWicke renamed this task from Consider options for longer-term content translation API end points to Migrate content translation to the REST API. Jul 24 2017, 2:12 PM
KartikMistry raised the priority of this task from Medium to High. Sep 26 2017, 7:13 AM

Since there is no blocker for this configuration and v2 is still in development (see: https://phabricator.wikimedia.org/T183139), I'll go ahead and start submitting the needed configuration patches now.

@mobrovac I'm planning to schedule this soon. What timeframe works for you?

@KartikMistry are you referring to T163203: Update CX to use the new Restbase provided public API instead of CXServer? On our side, the public API is out, so we should be ready, as all of the blockers have been resolved.

@mobrovac It seems we need to implement /list/{tool}/$from/$to in the REST API, since it is used in CX.

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)