Page MenuHomePhabricator

Security review for Recommendation API
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project

The Wikimedia Recommendation API is an open-access HTTP API, currently under development, that will provide personalized recommendations for a variety of use-cases. The initial use-case for this API is ContentTranslation where it recommends articles to translate from one language to another by using traffic trends, activity logs, and other signals to identify gaps in coverage and to match those to editors based on interest. The API is intended to be flexible enough to serve as a basis for new features by third-party developers or researchers.

Description of how the tool will be used at WMF

Dependencies

Current
Potentially in the future

Has this project been reviewed before?

No, although the deployment of dependencies will be similar to ORES (T130233).

Working test environment

security-test.recommendation-api.eqiad.wmflabs has been set up for testing and is accessible at http://recommend-security-test.wmflabs.org.
An environment can also be set up using docker (Dockerfile) or with the normal virtualenv/pip dance. Note: the docker environment uses Flask's built-in server.

Post-deployment

Research will be responsible and @schana is the primary contact.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 14 2016, 10:05 AM

Once this is ready for review, please setup a test environment and update the task description with it's location. This will help help the review process go a bit smoother.

schana updated the task description. (Show Details)Oct 17 2016, 11:55 AM

@dpatrick I've created a test environment and updated the description. Do you (or someone else) need access to the instance?

@schana, yes, please give myself and @Bawolff access. One of us will be doing the review.

@dpatrick Access should now be granted. Let me know if anything further is needed. Thanks!

leila added a subscriber: leila.Nov 28 2016, 6:30 PM

@dpatrick @Bawolff just for us to have a sense of when we should expect this review to be done, can you provide some timelines? this will help us with planning on our end.

@dpatrick @Bawolff just for us to have a sense of when we should expect this review to be done, can you provide some timelines? this will help us with planning on our end.

Hi. I had planned to do this last week, but got behind with some other stuff. I've now started looking at this.

Working test environment
security-test.recommendation-api.eqiad.wmflabs has been set up for testing and is accessible at http://recommend-security-test.wmflabs.org.
An environment can also be set up using docker (Dockerfile) or with the normal virtualenv/pip dance. Note: the docker environment uses >Flask's built-in server.

This seems to be giving 502's...

It looks like the service had stopped. The host isn't puppitized (yet), so restarting the recommendation service fixed it. Sorry.

I haven't finished looking at this yet, but some preliminary concerns:

  • In web/templates/index.html - The escaping done for the values inserted into javascript s,t and seed variables isn't quite right - backslashes and newlines aren't escaped properly. This is not exploitable, but could be used to cause a syntax error (e.g. ?s=foo\ or ?s=foo%0Abar). Despite the fact this isn't exploitable, it is very close to being exploitable (for example, It would be exploitable if the code was run through a js minifier), thus I think its important to fix this.
  • The links with target=_blank should have rel=noopener, particularly if they go to a non-WMF controlled domain (This is to prevent the attack described in https://mathiasbynens.github.io/rel-noopener/)
    • similarly in web/static/gf-preview.tag data = data.replace('<base href="', '<base target="_blank" href="https:'); has a similar issue. Additionally, doing regex find and replace on html is often the cause of security issues, although it looks safe in this case.

Sorry that this took so long. I lack experience with python and riot which caused me to take more time than I normally would.

Overall, this looks good. By and large the code seems safe, and most of the issues are either low severity, or very difficult to trigger.

There are two issues I saw come up multiple times. Both are relatively minor as far as security issues go, but still should be fixed:

  • Inconsistent encodeURIComponent-ing of url parts in js. Even if the data is from what's probably a safe source, encodeURIComponent should be used to catch special characters, and so we can tell at a glance that it is safe. This is especially true when using JSONP, where there is even the remotest possibility of overriding the callback parameter.
  • Using target=_blank for external links without specifying rel=noopener

The full list of issues are as follows:

Normal issues

  • The escaping of js values inserted into web/index.html is incorrect. They are being escaped for html, but should be escaped for javascript. In particular, ?s=%0A or ?s=foo\ can both cause syntax errors, and other special characters will be escaped in ways that are not correct (albeit not a security issue). This cannot be exploited at present, but if two values were inserted on the same line it would be, so this should definitely be fixed, in case the js ever gets run through a minimizer, or something like that.
  • All target=_blank links should have rel=noopener.
  • Line 93 gf-preview.js - The find and replace of </head> could lead to XSS. Consider if someone could add html like <div title="</head><img src=x onerror=alert(1)"> to the page. This is perfectly valid and safe prior to the regex (albeit MW usually doesn't allow that sort of thing, but extensions might), but unsafe after the regex. (In practice this would be very hard to pull off as you would have to be able to insert the tag prior to the actual <head/>, but that's not something we should rely on)
  • line 106 of gf-preview.js. This doesn't put the html text inside quote marks (And also doesn't escape quotes). This will cause the page to be executed as javascript, which is not what is desired.
  • line 46 of gf-preview.js, encodeURIComponent missing from jsonp queries.

Minor

  • It would be nice if trusted data from other servers would be escaped (langauge_pairs, language_to_domain_mappings) or failing that, checked if they contain an apostrophe and throw an exception.
  • Since you're already loading mw.RegExp.escape, the writeCookie() function in web/static/cookies.js should probably use it. (This is very minor)
  • line 70 of gf-input.js - Please encodeURIComponent s and t.
  • line 63-ish of gf-preview.js - encodeURIComponent of all the parts of the url.
    • ditto gf-create.js
  • wm-typeahead.js doesn't seem to escape titles properly when doing syntax highlighting. (Perhaps this is more something I should bring up with the portal people)

Other comments

  • You may want to consider using CORS instead of JSONP for api communication. The mediawiki api supports anonymous CORS if you provide an &origin=* url parameter. From what I understand this is more reliable than jsonp, and also less scary as it doesn't rely on script execution.

Thanks @Bawolff! I'll create a task for fixing the issues.

schana closed this task as Resolved.Jan 26 2017, 4:29 PM