MT Api - provide an identification mechanism to allow requests only from a valid MW context
Closed, Resolved | Public | 1 Estimated Story Points

Description

The cxserver production instance, http://cxserver.wikimedia.org, acts as a bridge to translation services, including machine translation. At this point, we are not prepared to accept API requests from third-party clients.

We need a mechanism to gate requests that do not come from the MW Content Translation extension, and to make sure requests do not exceed the service limits of the backends (which could be a form of abuse).

Event Timeline

santhosh raised the priority of this task from to Needs Triage.
santhosh updated the task description.
santhosh subscribed.
Amire80 triaged this task as Medium priority. May 4 2015, 2:21 PM
Amire80 set Security to None.
santhosh raised the priority of this task from Medium to High. May 25 2015, 9:42 AM
santhosh added a project: LE-Sprint-87.
santhosh renamed this task from "Evaluate and decide the API access from production cxserver" to "MT Api - provide an identification mechanism to allow requests only from a valid MW context". Jun 4 2015, 6:42 AM
santhosh updated the task description.

The problem can be approached in two iterations.

  1. Allow requests from the MW context alone and reject any other API requests from external applications
  2. Relax the API access and allow other non-CX applications to use the CX API, but make sure the access does not exceed the limits (which we can tune)

#1 is the first priority for CX, because CX is the primary and first user of these APIs.

To recognize the MW context, Niklas came up with a good idea: cxserver and MW share a secret, and that secret is applied to a unique session identifier (perhaps the username?).

Gwicke and I talked about implementing something similar for restbase, but haven't done that yet. Having a simple, cryptographic token as a parameter (or better, an Authorization header) should be fine in this case. Since this is only dealing with rate limits, and editing the wikis, the security doesn't need to be too fancy.

Since you're making and checking the token in two different languages (unless you want the cxserver to call back to the MediaWiki API to check the token), using the JSON Web Token (JWT) format will make it easier to get the code right: there are libraries for both PHP and Node. I'd recommend RS256 or ES256, which use public/private key pairs (ES256 creates a smaller token to pass around, but it's more computationally expensive to verify, so you can choose whether to optimize for space or time). That way you only have to store the private key in one place. Or you can start with HS256, which is HMAC-based, and store the shared secret on both servers, but then you need to plan how you'll deploy a key change.

The token does need to have a timeout (otherwise another site can scrape a valid token, and just use that when making requests to your server). The JWT format defines an "iat" ("issued at") claim, so the cxserver can decide what time window it wants to accept. For something like this, a 5-10 minute validity should be fine.
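
To make the HS256 variant and the "iat" window concrete, here is a minimal sketch in Python, using only the standard library. It is not the code from the patches on this task: the function names (issue_token, verify_token), the 600-second default window, and the choice of the username as "sub" are all assumptions for illustration. A real deployment would use a maintained JWT library on each side rather than hand-rolled encoding.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped when encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret: bytes, subject: str, now: int) -> str:
    """Hypothetical issuer side (the MediaWiki role in this discussion)."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "iat": now}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + b64url_encode(sign(secret, signing_input))


def verify_token(secret: bytes, token: str, now: int, max_age: int = 600) -> dict:
    """Hypothetical verifier side (the cxserver role). Raises ValueError on failure."""
    header_b64, claims_b64, signature_b64 = token.split(".")
    expected = sign(secret, header_b64 + "." + claims_b64)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    # Pin the algorithm instead of trusting whatever the header claims.
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(b64url_decode(claims_b64))
    iat = claims.get("iat")
    # Accept only tokens issued within the last max_age seconds.
    if not isinstance(iat, int) or not 0 <= now - iat <= max_age:
        raise ValueError("token outside accepted time window")
    return claims
```

Passing the current time in as a parameter keeps the sketch easy to test; a real verifier would read the clock itself and probably allow a few seconds of skew between the two servers.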

I think you'll also want to throttle users who come from MediaWiki. Otherwise other sites can just scrape a new token each time the previous one expires, and then give that to their users. But addressing the problem in two iterations is fine, as long as both get done.

Change 219194 had a related patch set uploaded (by Nikerabbit):
WIP: Send authorization header to cxserver

https://gerrit.wikimedia.org/r/219194

When Language has a rough idea of how the token is going to be constructed, can you post a link to any documentation of it? I'd like to review it early to make sure we get the design right.

Change 221857 had a related patch set uploaded (by Nikerabbit):
Validate authorization header if required

https://gerrit.wikimedia.org/r/221857

@csteipp Can you have a look at the patches [1]? Do you need additional documentation besides those?

Any suggestions on how to implement the rate limiting for handing out tokens? I have left it out of the first version.

[1] https://gerrit.wikimedia.org/r/221857 https://gerrit.wikimedia.org/r/219194

@Nikerabbit, sorry for the slow response!

Minor comments, otherwise they look fine.

For the actual throttling, you probably want to do something like we do with User::pingLimiter, which sets/increments a memcache or redis key, unique to the 'sub' in the JWT, just after you verify the JWT on the cxserver. But if it's a problem to do it on the cxserver, we can work out another way.
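
As a rough illustration of the increment-a-key idea, here is a small fixed-window limiter sketch in Python. It is an assumption-laden stand-in, not how User::pingLimiter or the cxserver is implemented: the class name FixedWindowLimiter, the in-memory dict (standing in for memcache or redis), and the limit and window values are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Counts requests per subject within fixed time windows.

    The dict below stands in for a shared store such as memcache or redis;
    a real deployment would use an atomic increment with a key expiry there,
    so counters are shared across processes and old windows clean themselves up.
    """

    def __init__(
        self,
        limit: int,
        window_seconds: int,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.counters: Dict[Tuple[str, int], int] = {}

    def allow(self, subject: str) -> bool:
        # One counter per (subject, window index); 'subject' would be the
        # 'sub' claim taken from a JWT that has already been verified.
        window = int(self.clock() // self.window_seconds)
        key = (subject, window)
        count = self.counters.get(key, 0) + 1
        self.counters[key] = count
        return count <= self.limit
```

The limiter would be consulted only after the token has been verified, so that the counter key comes from a signed value rather than from anything the client can choose freely.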

Just to clarify what we are talking about with throttling: there is a task for per-user throttling inside the cxserver, based on the user value of the token: T101398.

Earlier you said something about throttling that I might have misunderstood:

I think you'll also want to throttle users who come from MediaWiki. Otherwise other sites can just scrape a new token each time the previous one expires, and then give that to their users.

Does this mean throttling in cxserver or throttling in the MediaWiki API module that gives out the tokens?

That wasn't very clear. Sorry about that.

What I meant is that the actual throttling (T101398) needs to be in place before this portion is resilient to fairly basic attacks. As it stands, without the throttling, a third party's server can grab a new token every hour from our API, then just add that to some JavaScript that all of their users are using. With throttling, they're limited to the rate limit of a single user. You could try to mitigate this with a throttle on the JWT API, but finding a good heuristic that allows legitimate users and blocks bad ones is going to be pretty hard.

So no, at this point I don't think you need to rate limit the API module. But if T101398 is a long way off, then we can figure something else out.

Thanks for the clarification. T101398 is the natural continuation of this task, and we will work on it next as we have time (Wikimania etc.)

Change 221857 merged by jenkins-bot:
Validate authorization header if required

https://gerrit.wikimedia.org/r/221857

Change 226616 had a related patch set uploaded (by Nikerabbit):
Add firebase/php-jwt for ContentTranslation

https://gerrit.wikimedia.org/r/226616

We need a security review of the firebase/php-jwt library.

Change 219194 merged by jenkins-bot:
Send authorization header to cxserver

https://gerrit.wikimedia.org/r/219194

Arrbee moved this task from In Progress to Done on the LE-CX6-Sprint 1 board.
Arrbee edited projects, added LE-CX6-Sprint 2; removed LE-CX6-Sprint 1.

I have updated the mediawiki/vendor patch following the steps at https://www.mediawiki.org/wiki/Manual:External_libraries (leaving some questions on the talk page).

Change 226616 merged by jenkins-bot:
Add firebase/php-jwt for ContentTranslation

https://gerrit.wikimedia.org/r/226616