
ContentTranslation relies on recommendation-api running on Cloud VPS
Open, High, Public

Description

Current status

Development is blocked, waiting for ??? to resolve T254143: Recommendation api always returns 404 when seed article is not supplied

Original report

Content Security Policy (CSP) is a security layer that limits cross-site connections. As CSP is increasingly enforced (T244124), this can become an issue for Content Translation, since it currently gets suggestions from the recommendation API running on wmflabs.org.

Currently on Wikipedia, when viewing suggestions, we can see this in the browser console:

VM39:1 [Report Only] Refused to connect to 'https://recommend.wmflabs.org/types/translation/v1/articles?source=de&target=gu&seed=&search=morelike&application=CX' because it violates the following Content Security Policy directive: "default-src 'self' data: blob: upload.wikimedia.org https://commons.wikimedia.org meta.wikimedia.org *.wikimedia.org *.wikipedia.org *.wikinews.org *.wiktionary.org *.wikibooks.org *.wikiversity.org *.wikisource.org wikisource.org *.wikiquote.org *.wikidata.org *.wikivoyage.org *.mediawiki.org wikimedia.org". Note that 'connect-src' was not explicitly set, so 'default-src' is used as a fallback.
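The refusal follows from how the browser resolves the policy: with no `connect-src`, requests made by scripts fall back to the `default-src` source list, and recommend.wmflabs.org matches none of its entries. Below is a simplified TypeScript sketch of that host matching, not the full CSP algorithm (it ignores ports, paths, scheme upgrades and most keywords). The function `hostAllowed` and the page origin `https://de.wikipedia.org` are assumptions chosen for illustration.

```typescript
// Simplified model of CSP host-source matching; not the full CSP3 algorithm.
// Ports, paths, scheme upgrades and most keywords are deliberately ignored.
function hostAllowed(url: string, sourceList: string, pageOrigin: string): boolean {
  const target = new URL(url);
  for (const source of sourceList.trim().split(/\s+/)) {
    if (source === "'self'") {
      // 'self' matches only the origin of the page that makes the request.
      if (target.origin === new URL(pageOrigin).origin) return true;
      continue;
    }
    // Skip other quoted keywords and scheme-only sources such as data: or blob:.
    if (source.startsWith("'") || source.endsWith(":")) continue;

    // Split off an explicit scheme such as "https://".
    let host = source;
    const schemeEnd = source.indexOf("://");
    if (schemeEnd !== -1) {
      if (source.slice(0, schemeEnd + 1) !== target.protocol) continue;
      host = source.slice(schemeEnd + 3);
    }

    if (host.startsWith("*.")) {
      // "*.example.org" matches subdomains only, not "example.org" itself.
      if (target.hostname.endsWith(host.slice(1))) return true;
    } else if (target.hostname === host) {
      return true;
    }
  }
  return false;
}

// The default-src list quoted in the console message; with no connect-src set,
// this is the list that governs fetch/XHR requests.
const defaultSrc =
  "'self' data: blob: upload.wikimedia.org https://commons.wikimedia.org meta.wikimedia.org " +
  "*.wikimedia.org *.wikipedia.org *.wikinews.org *.wiktionary.org *.wikibooks.org " +
  "*.wikiversity.org *.wikisource.org wikisource.org *.wikiquote.org *.wikidata.org " +
  "*.wikivoyage.org *.mediawiki.org wikimedia.org";

const pageOrigin = "https://de.wikipedia.org"; // assumed page running Content Translation

console.log(hostAllowed("https://recommend.wmflabs.org/types/translation/v1/articles?source=de&target=gu", defaultSrc, pageOrigin)); // false
console.log(hostAllowed("https://en.wikipedia.org/api/rest_v1/", defaultSrc, pageOrigin)); // true
```

Any production *.wikipedia.org host passes through the wildcard entry, while no entry covers wmflabs.org, which is why only the Cloud VPS request is reported.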

Anticipating that at some point "Report Only" will be replaced with a hard block, which would break suggestions if nothing is done, we should take measures to keep it working. Possibilities:

  • Make recommendation-api a production service (preferred, but no resources?)
  • Have recommendation-api included in the whitelist (need to determine how)

Expected outcome

Content Translation gets suggestions from a maintained production service that does not have privacy issues.

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 29 2018, 9:29 AM

We have two end-points in production:

  1. https://en.wikipedia.org/api/rest_v1/#!/Recommendation/get_data_recommendation_article_creation_translation_from_lang_seed_article
  2. https://en.wikipedia.org/api/rest_v1/#!/Recommendation/get_data_recommendation_article_creation_morelike_seed_article

The first one works with any language pair, but the second works only for languages that have pre-generated data. The second API generates better results and is what we'll be improving in the future.

Would either one work for you? We could support more languages for the second case if you let us know which language pairs you're interested in.
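A minimal sketch of how a client might build request URLs for these two end-points. The path layout (`/data/recommendation/article/creation/translation/{from_lang}/{seed_article}` and `/data/recommendation/article/creation/morelike/{seed_article}`) is inferred from the Swagger operation names above and should be checked against the REST API documentation before use. The helper `recommendationUrl` is hypothetical, and treating the wiki the request is sent to as the target language is an assumption, not something stated in this task.

```typescript
// Hypothetical helper; the paths are inferred from the Swagger operation names
// and are an assumption, not a verified contract.
type RecommendationKind = "translation" | "morelike";

function recommendationUrl(
  targetLang: string, // wiki the request is sent to; assumed to be the target language
  kind: RecommendationKind,
  seedArticle: string,
  sourceLang?: string, // only used by the "translation" end-point
): string {
  const base = `https://${targetLang}.wikipedia.org/api/rest_v1/data/recommendation/article/creation`;
  // MediaWiki titles use underscores in URLs; other characters are percent-encoded.
  const seed = encodeURIComponent(seedArticle.replace(/ /g, "_"));
  if (kind === "translation") {
    if (!sourceLang) throw new Error("the translation end-point needs a source language");
    return `${base}/translation/${encodeURIComponent(sourceLang)}/${seed}`;
  }
  return `${base}/morelike/${seed}`;
}

console.log(recommendationUrl("gu", "translation", "Albert Einstein", "de"));
// https://gu.wikipedia.org/api/rest_v1/data/recommendation/article/creation/translation/de/Albert_Einstein
console.log(recommendationUrl("gu", "morelike", "Albert Einstein"));
// https://gu.wikipedia.org/api/rest_v1/data/recommendation/article/creation/morelike/Albert_Einstein
```

Whichever wiki is used, the host falls under `*.wikipedia.org` in the CSP list quoted in the description, so these requests would not be refused.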

Pginer-WMF triaged this task as Medium priority. Mar 17 2020, 12:05 PM
Pginer-WMF updated the task description. Mar 17 2020, 12:12 PM
Nikerabbit updated the task description. Mar 17 2020, 1:12 PM

Have recommendation-api included in the whitelist (need to determine how)

The ContentTranslation extension can call $this->getOutput()->getCSP()->addDefaultSrc( 'recommend.wmflabs.org' ). However, production services relying on Labs stuff are in theory not allowed, I think.

Regarding the two production end-points listed above: en.wikipedia.org will of course be in the CSP whitelist, so CSP won't interfere with these.

Nikerabbit renamed this task from Recommendation/Suggestions API CSP warnings to ContentTranslation relies on recommendation-api running on Cloud VPS. May 11 2020, 11:15 AM
sbassett moved this task from Incoming to Watching on the Security-Team board. May 11 2020, 3:10 PM
sbassett removed a project: Security-Team.
Reedy raised the priority of this task from Medium to High. May 11 2020, 3:45 PM

Can we get some action from the owners of this? It's been this way since 2015 now.

Why is it using the version on Cloud VPS? What is needed to migrate away from it? I'm not clear what it actually does, but @bmansurov suggested we had (mostly) usable endpoints in prod?

If this was for example a gadget or code in Common.js this would've been turned off years ago...

Content Translation uses the recommendation API to surface opportunities for users to translate, which has been useful for users, especially as the suggestions become closer to their interests.

My understanding is that the Labs instance is used because the production service does not provide feature parity. In particular, the production version is missing support for multiple "seed articles". Content Translation uses those to get recommendations that are similar to the articles the user edited previously.

We were waiting for the production service to support this before migrating, but that has not happened so far. Migrating now means to either serve users with less diverse suggestions or refactor the code to add a workaround for the limitation of having only one seed article.
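One possible shape for the workaround mentioned above, sketched in TypeScript under stated assumptions: issue one single-seed request per previously edited article, then merge the lists round-robin so every seed is represented near the top, dropping duplicate titles. `Suggestion`, `suggestFromSeeds` and the injected `fetchForSeed` function are hypothetical names; this is not how ContentTranslation implements it.

```typescript
// Hypothetical shape of one suggestion returned by a single-seed request.
interface Suggestion {
  title: string;
}

// Fans out one request per seed article, then interleaves the result lists
// round-robin (first result of every seed, then second, ...) and drops titles
// already taken, until `limit` suggestions are collected.
async function suggestFromSeeds(
  seeds: string[],
  fetchForSeed: (seed: string) => Promise<Suggestion[]>,
  limit: number,
): Promise<Suggestion[]> {
  // A failing seed (for example a 404) yields an empty list instead of failing everything.
  const lists = await Promise.all(
    seeds.map((seed) => fetchForSeed(seed).catch((): Suggestion[] => [])),
  );

  const merged: Suggestion[] = [];
  const seen = new Set<string>();
  const longest = Math.max(0, ...lists.map((list) => list.length));
  for (let i = 0; i < longest && merged.length < limit; i++) {
    for (const list of lists) {
      const item = list[i];
      if (item === undefined || seen.has(item.title)) continue;
      seen.add(item.title);
      merged.push(item);
      if (merged.length === limit) break;
    }
  }
  return merged;
}

// Usage with a stubbed fetcher standing in for the production API.
const stub = async (seed: string): Promise<Suggestion[]> => {
  if (seed === "B") throw new Error("404");
  return seed === "A" ? [{ title: "X" }, { title: "Y" }] : [{ title: "X" }, { title: "Z" }];
};
suggestFromSeeds(["A", "B", "C"], stub, 10).then((result) => {
  console.log(result.map((s) => s.title).join(", ")); // X, Y, Z
});
```

Interleaving rather than concatenating keeps one seed from crowding out the others, which is the diversity that multiple seeds were meant to provide; the cost is one request per seed instead of one request overall.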

We have plans to replace the translation dashboard (T243583) where recommendations are provided, and that will include using the production recommendation system. I'd like to understand whether it is worth fixing this on the current dashboard before it is replaced. Does anyone have an estimate of when the cross-site access will be blocked?

kaldari added a subscriber: kaldari. May 13 2020, 5:40 PM

@Reedy - How close are we to actually enforcing CSP on production wikis? Is there a place we can review all the wmflabs.org endpoints that are still being called from production (maybe even ranked by how frequently they are being called)?

Reedy added a comment. May 13 2020, 6:39 PM

I'm not sure how far away we are; there's definitely still stuff to be done before it's enforcing everywhere.

All CSP events are logged to Logstash. I don't think anyone has created any sort of CSP dashboard just yet, though.
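For the ranking question, a minimal sketch assuming the logged events can be exported as standard CSP violation report bodies (a `csp-report` object with a `blocked-uri` field). How the events are actually stored in Logstash is not described here, so the field names are an assumption, `rankWmflabsHosts` is hypothetical, and the sample reports are made up. Results are grouped by host because browsers may report only the origin of a blocked cross-origin URL.

```typescript
// Subset of a CSP violation report body (the JSON a browser POSTs to report-uri).
interface CspReport {
  "csp-report": {
    "blocked-uri": string;
    "document-uri"?: string;
    "violated-directive"?: string;
  };
}

// Counts refused requests per wmflabs.org host, most frequent first.
function rankWmflabsHosts(reports: CspReport[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const report of reports) {
    let host: string;
    try {
      host = new URL(report["csp-report"]["blocked-uri"]).hostname;
    } catch {
      continue; // values such as "inline" or "eval" are not URLs
    }
    if (host === "wmflabs.org" || host.endsWith(".wmflabs.org")) {
      counts.set(host, (counts.get(host) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Made-up sample reports, for illustration only.
const sample: CspReport[] = [
  { "csp-report": { "blocked-uri": "https://recommend.wmflabs.org/types/translation/v1/articles" } },
  { "csp-report": { "blocked-uri": "https://recommend.wmflabs.org" } },
  { "csp-report": { "blocked-uri": "https://tools.wmflabs.org/example" } },
  { "csp-report": { "blocked-uri": "inline" } },
];
console.log(JSON.stringify(rankWmflabsHosts(sample)));
// [["recommend.wmflabs.org",2],["tools.wmflabs.org",1]]
```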

See ContentSecurityPolicy and T239077 (among other tasks)

@Nikerabbit - I'm not seeing the "[Report Only] Refused to connect..." warnings when loading suggestions in ContentTranslation. Are you still seeing the warnings? FWIW, I'm not seeing the warnings anywhere anymore. Perhaps someone turned them off.

Krinkle added a comment. May 14 2020, 1:37 AM

Is there a place we can review all the wmflabs.org endpoints that are still being called from production? […]

All CSP events are logged to logstash.

There is a Logstash dashboard; however, it naturally includes all the opted-in activity, as well as activity by browser plugins. AFAIK this dashboard mainly exists for investigation during potential malicious activity.

In general there are no violations of it on default traffic by user scripts, apart from the recent one for Wikivoyage's community script (T244691). And of course the current ticket about CX for prod, which I guess people had forgotten about.

From time to time, small wikis are found to still run old versions of scripts; these are updated or disabled whenever they become known. See also T239077#6135626.

[…] I'm not seeing the warnings anywhere anymore. Perhaps someone turned them off.

It was reported and re-confirmed a few days ago at T244952, which led to this.

Change 597531 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] Use REST Apis for suggesting articles from recommendation system

https://gerrit.wikimedia.org/r/597531

Nikerabbit updated the task description. Jun 24 2020, 8:39 AM

Change 597531 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Use REST Apis for suggesting articles from recommendation system

https://gerrit.wikimedia.org/r/597531