Security review for Youdao MT for Content Translation
Closed, Resolved · Public

Description

Project Information

  • Name of tool/project: Content Translation / cxserver
  • Project home page: mediawiki.org/wiki/Content_translation
  • Name of team requesting review: Language
  • Primary contact: Kartik Mistry / Santhosh Thottingal
  • Target date for deployment: August 2016
  • Link to code repository / patchset:
  • Programming Language(s) Used: JavaScript/PHP

Description of the tool/project

Youdao is an external machine translation service that provides Content Translation with limited-capacity access. It will be integrated into the Content Translation tool.

Description of how the tool will be used at WMF

Users of Content Translation will use Youdao machine translation for a limited set of language pairs in production.

Dependencies

None.

Has this project been reviewed before?

Yes. See: https://phabricator.wikimedia.org/T85686

Working test environment

cxserver-beta.wmflabs.org has cxserver set up, but the Content Translation extension (en.wikipedia.beta.wmflabs.org/wiki/Special:ContentTranslation) is broken at the moment.

Post-deployment

Name of team responsible for tool/project after deployment and primary contact:
Language (Kartik Mistry)

Event Timeline

Restricted Application added a subscriber: Aklapper.
Arrbee triaged this task as Normal priority. Aug 17 2016, 7:23 AM
KartikMistry updated the task description. Aug 23 2016, 4:19 AM

While reviewing this, similar code: https://phabricator.wikimedia.org/T90208 should also be reviewed.

Note: ^^This isn't priority. Youdao should be done first.

@KartikMistry, would you mind creating a separate phab ticket for this one? Thanks!

Note that the MT clients share a common pattern. Mostly only the API URL and key change, and sometimes the error handling too. It is quite possible that more clients of this kind will be added.
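
The common pattern mentioned above might look roughly like the sketch below. It is an illustration only, not cxserver's actual code: the class names `MTClient` and `ExampleMT`, the request parameters, the response fields, and the injected `post` function are all hypothetical.

```javascript
// Hypothetical sketch of a shared MT client shape; not cxserver's real classes.
class MTClient {
  // `post` is an injected function (url, params) => Promise<object>, assumed
  // here so the sketch can run without any network access.
  constructor({ apiUrl, apiKey, post }) {
    this.apiUrl = apiUrl;
    this.apiKey = apiKey;
    this.post = post;
  }

  // Shared flow: build the request, call the service, check errors, return text.
  async translateText(sourceLang, targetLang, text) {
    const response = await this.post(this.apiUrl, {
      key: this.apiKey,
      from: sourceLang,
      to: targetLang,
      q: text
    });
    this.checkError(response);
    return response.translation;
  }

  // Default error handling; a concrete client overrides this when its
  // service reports errors differently.
  checkError(response) {
    if (response.error) {
      throw new Error(`MT service error: ${response.error}`);
    }
  }
}

// A concrete client mostly differs in URL, key, and error mapping.
class ExampleMT extends MTClient {
  checkError(response) {
    // Assumed convention for this made-up service: errorCode 0 means success.
    if (response.errorCode !== 0) {
      throw new Error(`ExampleMT error code ${response.errorCode}`);
    }
  }
}
```

With this shape, reviewing a new client mostly comes down to its constructor arguments and its `checkError` override.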

Bawolff set Security to Software security bug. Sep 13 2016, 8:55 AM
Bawolff added a project: Security.
Bawolff changed the visibility from "Public (No Login Required)" to "Custom Policy".
Bawolff added a subscriber: Bawolff.

Security review:

  • If the source wikitext has something like [[Javascript:foo()]], then the translated version will have an a tag with href="Javascript:foo()". It's unclear whether this is exploitable, as click events don't actually navigate you to the link, but it's definitely not a good thing. [Note: I did not review that part of the app in general; I was primarily looking only at the Youdao part.]
  • I don't understand the security goals of the JWT-token in the Authorization header. There should be a code comment somewhere explaining what threat it is meant to stop. I'm concerned that it may not be accomplishing what you want it to, but I'm not really sure, as I'm not sure what the intention is.

Otherwise I think this is fine.

The purpose of the JWT-token is two-fold:

  • To reliably identify the requesting user for monitoring abuse and rate-limiting (not currently implemented)
  • To make it more difficult to use the service outside the CX extension. Clearly, leaving it all open is not a good idea, and clearly, we cannot completely prevent translation of arbitrary content, because people could save mostly arbitrary content on their user page on a wiki which is not actively monitored.
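
As a rough illustration of the first goal (tying each request to a user), the sketch below issues and verifies a signed token using Node's built-in `crypto` module. The thread does not say which algorithm or claims the real token uses, so HS256, the `sub` and `exp` claims, and the function names `issueToken` and `verifyToken` are all assumptions.

```javascript
const crypto = require('crypto');

// Convert a Buffer to the URL-safe, unpadded base64 form used by JWTs.
function toBase64Url(buffer) {
  return buffer.toString('base64')
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

function fromBase64Url(text) {
  return Buffer.from(text.replace(/-/g, '+').replace(/_/g, '/'), 'base64');
}

// HMAC-SHA256 signature over "header.payload" (HS256 is an assumption).
function sign(data, secret) {
  return toBase64Url(crypto.createHmac('sha256', secret).update(data).digest());
}

// Issue a token naming the user, valid for ttlSeconds from nowSeconds.
function issueToken(username, secret, ttlSeconds, nowSeconds) {
  const header = toBase64Url(Buffer.from(JSON.stringify({ alg: 'HS256', typ: 'JWT' })));
  const payload = toBase64Url(Buffer.from(JSON.stringify({
    sub: username,
    exp: nowSeconds + ttlSeconds
  })));
  return `${header}.${payload}.${sign(`${header}.${payload}`, secret)}`;
}

// Return the user name if signature and expiry check out, otherwise null.
// The header's alg field is deliberately ignored: only HS256 is accepted.
function verifyToken(token, secret, nowSeconds) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  const expected = Buffer.from(sign(`${header}.${payload}`, secret));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !crypto.timingSafeEqual(expected, actual)) {
    return null;
  }
  let claims;
  try {
    claims = JSON.parse(fromBase64Url(payload).toString('utf8'));
  } catch (e) {
    return null;
  }
  if (typeof claims.exp !== 'number' || claims.exp <= nowSeconds) return null;
  return typeof claims.sub === 'string' ? claims.sub : null;
}
```

A token like this identifies who made a request, which is what rate-limiting and abuse monitoring need. It does not restrict what an authenticated user asks the service to translate.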

If the source wikitext has something like [[Javascript:foo()]], then the translated version will have an a tag with href="Javascript:foo()". It's unclear whether this is exploitable, as click events don't actually navigate you to the link, but it's definitely not a good thing. [Note: I did not review that part of the app in general; I was primarily looking only at the Youdao part.]

The input to MT engines is not wikitext. In fact, cxserver does not know about wikitext at all. All the data it accepts as input and produces as output is HTML. The MediaWiki extension Content Translation also works with HTML. The only place it produces wikitext is at the end of the workflow, when the translator decides to publish an article to a wiki.

In the case of Youdao, the MT engine is not capable of translating HTML. So the translateHtml method passes the input HTML to an HTML-to-plain-text conversion module and then to the translateText method. What goes to the externally hosted MT engine is therefore plain text (not wikitext, just plain text), and the translation we get back is also plain text.
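
The flow described here could be sketched as below. The method names `translateHtml` and `translateText` come from the comment above; everything else is assumed for illustration, including the class name `PlainTextMT`, the injected `translateText` function, and the deliberately naive `htmlToPlainText` helper, which is far cruder than a real conversion module.

```javascript
// Naive HTML-to-text conversion for illustration only. Replacing every tag
// with a space can split words that contain inline markup.
function htmlToPlainText(html) {
  return html
    .replace(/<[^>]*>/g, ' ')   // drop tags, including their attributes
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&')     // decode &amp; last to avoid double-decoding
    .replace(/\s+/g, ' ')
    .trim();
}

// Hypothetical client for an engine that only accepts plain text.
class PlainTextMT {
  // `translateText` is an injected function (from, to, text) => Promise<string>
  // standing in for the call to the external service.
  constructor(translateText) {
    this.translateText = translateText;
  }

  // Only the extracted text is passed on, so markup such as href attributes
  // never reaches the external engine.
  async translateHtml(sourceLang, targetLang, html) {
    const text = htmlToPlainText(html);
    return this.translateText(sourceLang, targetLang, text);
  }
}
```

Because the result is plain text as well, any links shown in the translated column have to come from cxserver's own processing rather than from the MT engine.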

If the source wikitext has something like [[Javascript:foo()]], then the translated version will have an a tag with href="Javascript:foo()". It's unclear whether this is exploitable, as click events don't actually navigate you to the link, but it's definitely not a good thing. [Note: I did not review that part of the app in general; I was primarily looking only at the Youdao part.]

That issue wasn't related to Youdao (it was discovered when I was testing the extension on Wikipedia with languages fr and en, just to get a sense of how the various pieces fit together). Content Translation seems to be taking safe Parsoid HTML and outputting slightly unsafe HTML in the preview window.

The purpose of the JWT-token is two-fold:

  • To reliably identify the requesting user for monitoring abuse and rate-limiting (not currently implemented)
  • To make it more difficult to use the service outside the CX extension. Clearly, leaving it all open is not a good idea, and clearly, we cannot completely prevent translation of arbitrary content, because people could save mostly arbitrary content on their user page on a wiki which is not actively monitored.

Ok. Just as long as you're aware it's mostly trivial to get around (since any non-globally blocked user can get a token and anyone can create an account). If the goal is to deter third-party users, the cxserver auto-generated help pages should probably say that you are not allowed to use it externally.

Do you mean this page: https://cxserver.wikimedia.org/v1/ ?

For me it's nearly impossible to find anything on cxserver.wikimedia.org unless you know the address already. I doubt it has many readers.

@dpatrick @Bawolff Is there anything else needed from the Language team on this review? Please let us know.

Sorry I wasn't clear before (and sorry I wasn't more responsive). The review is generally considered "Done"; you can go ahead and deploy the Youdao MT thing.

The bug is still open because I have concerns about how internal links starting with javascript: are transformed during "preview" of the autotranslated text. However that's unrelated to Youdao, so that does not block you deploying it.

Do you mean this page: https://cxserver.wikimedia.org/v1/ ?
For me it's nearly impossible to find anything on cxserver.wikimedia.org unless you know the address already. I doubt it has many readers.

Yeah. Mostly because I suspect legal/political obstacles are more likely to stop someone than the token, which is trivial to figure out once you watch things in Firebug. (The v1 thing follows a standard pattern, so I suspect people will actually read it.) However, this isn't a real security issue, so if you think things are fine the way they are, then that's ok too.

If the source wikitext has something like [[Javascript:foo()]], then the translated version will have an a tag with href="Javascript:foo()". It's unclear whether this is exploitable, as click events don't actually navigate you to the link, but it's definitely not a good thing. [Note: I did not review that part of the app in general; I was primarily looking only at the Youdao part.]

The input to MT engines is not wikitext. In fact, cxserver does not know about wikitext at all. All the data it accepts as input and produces as output is HTML. The MediaWiki extension Content Translation also works with HTML. The only place it produces wikitext is at the end of the workflow, when the translator decides to publish an article to a wiki.

Sorry I wasn't clear before. To clarify, I mean how cxserver translates Parsoid HTML -> translated HTML (in general, not Youdao-specific). It starts with HTML in the left-hand column like:

<a class="cx-link cx-source-link new" data-linkid="7" href="//en.wikipedia.org/wiki/Javascript:alert(1)" id="mwBA" rel="mw:WikiLink" title="Javascript:alert(1)">

and after you tell it to machine translate, the html in the right-hand column looks like:

<a class="cx-link cx-target-link-unadapted cx-target-link" data-linkid="7" href="Javascript:alert(1)" id="mwBA" rel="mw:WikiLink" title="Javascript:alert(1)">

Note how the href attribute starts out as a full URL and later becomes a relative URL. The relative URL isn't safe by itself when a page name contains a colon, as the browser interprets the part before the colon as a protocol. This is probably a minor issue, as click events are taken over, but nonetheless it is concerning. One possible fix would be prefixing local wiki pages with /wiki/, so that it's href="/wiki/javascript:alert(1)". I'm also concerned that a similar thing is possible with the insert external link dialog.
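
The suggested fix can be sketched as follows. This is a minimal illustration rather than the code that was eventually deployed: the helper names `localPageHref` and `leavesOrigin` are hypothetical, and the /wiki/ path prefix is taken from the suggestion above. The check relies on the standard `URL` class, which resolves an href against a base the same way a browser does.

```javascript
// Build an href for a local wiki page so that the title can never be parsed
// as a URL scheme: the leading "/wiki/" forces it to be a path.
function localPageHref(title) {
  const encoded = encodeURIComponent(title.replace(/ /g, '_'))
    .replace(/%3A/gi, ':')   // keep namespace colons readable
    .replace(/%2F/gi, '/');  // keep subpage slashes readable
  return '/wiki/' + encoded;
}

// True when resolving href against base lands outside the base's origin,
// which is what happens when a bare title such as "Javascript:alert(1)" is
// read as a javascript: URL.
function leavesOrigin(href, base) {
  return new URL(href, base).origin !== new URL(base).origin;
}

const base = 'https://en.wikipedia.org/wiki/Main_Page';
console.log(leavesOrigin('Javascript:alert(1)', base));                // true
console.log(leavesOrigin(localPageHref('Javascript:alert(1)'), base)); // false
```

A check along the lines of `leavesOrigin` could also be applied to hrefs entered through a link dialog, although genuine external links would need their own allow-list of schemes.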

Bawolff closed this task as Resolved. Jun 27 2017, 12:35 PM

Split the remaining issue to T168944 since it's really unrelated to the main task. Marking this bug as closed.

sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)". Jul 31 2019, 4:29 PM