
Sanitize MT service output HTML
Closed, Resolved, Public

Description

As per the security review (T144467: Security review for Google MT for Content Translation), we need to sanitize the HTML output from MT engines before it is presented back to the translator.

The best way to do this is to reuse the sanitizer in Parsoid. However, that sanitizer is currently token-based and cannot be used directly here. It is also not available as a standalone library.

Tickets already exist to address this general sanitizer requirement, though.

We need to know whether these tickets are actionable in the near future. If not, we should fill the gap with a generic, good-enough HTML sanitizer to address the security concerns; one option is DOMPurify.
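
For illustration only, the allowlist idea behind a generic HTML sanitizer can be sketched with the Python standard library. This is not DOMPurify's code and not what cxserver ships; the allowlists and the names `AllowlistSanitizer`, `is_safe_url` and `sanitize_html` are assumptions made up for this example, and a maintained library should be preferred in production.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for this sketch; a real deployment would tune them.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "span", "p", "br", "sup", "sub"}
ALLOWED_ATTRS = {"href", "title", "id", "class", "lang", "dir"}
ALLOWED_SCHEMES = {"http", "https"}
DROP_WITH_CONTENT = {"script", "style"}  # remove the element and its text
VOID_TAGS = {"br"}                       # elements that take no closing tag

_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")


def is_safe_url(url: str) -> bool:
    """Accept relative URLs and URLs whose scheme is allowlisted."""
    # Strip whitespace and control characters so "java\tscript:" is caught.
    cleaned = "".join(ch for ch in url if ch > " ")
    match = _SCHEME_RE.match(cleaned)
    return match is None or match.group(1).lower() in ALLOWED_SCHEMES


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag is dropped, its text content is kept
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_ATTRS:
                continue  # drops event handlers such as onclick
            if name == "href" and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_html(dirty: str) -> str:
    """Return a sanitized copy of an HTML fragment (sketch, not audited)."""
    parser = AllowlistSanitizer()
    parser.feed(dirty)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    mt_output = '<p onclick="steal()">Hola <b>mundo</b><script>alert(1)</script></p>'
    print(sanitize_html(mt_output))  # prints: <p>Hola <b>mundo</b></p>
```

The sketch drops anything not explicitly allowed rather than trying to enumerate dangerous constructs. It does not rebalance stray closing tags or handle inline styles, which is one reason a maintained sanitizer is the safer choice.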

Event Timeline

santhosh triaged this task as Medium priority.

Change 363156 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] MT: Sanitize HTML output from machine translation services

https://gerrit.wikimedia.org/r/363156

Change 363156 merged by jenkins-bot:
[mediawiki/services/cxserver@master] MT: Sanitize HTML output from machine translation services

https://gerrit.wikimedia.org/r/363156