Page MenuHomePhabricator

Security review of mediawiki-services-chromium-render
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project / Description of how the tool will be used at WMF

Services maintain the Electron-based PDF rendering service. Under its current load, the service hangs regularly after consuming a large amount of memory and can also fail to restart gracefully (see T174916, T159922, and T172815 for additional context and discussion).

Electron, a Node.js based desktop application development platform, is based on headless Chromium. By driving headless Chromium directly, rather than via a high-level binding it, we believe that we can make the service simpler and easier to maintain.

We (Readers Web and, eventually, Readers Infra) aim to build a POC replacement for the Electron-based render service, using puppeteer to programmatically control a firejailed headless Chromium process for rendering PDFs. We intend to slave the replacement service to the existing service in order to determine whether it's a suitable replacement.

Dependencies

  1. GoogleChrome/puppeteer
    • GoogleChrome/puppeteer is a high-level JavaScript (targetting Node.js) binding to the Chromium DevTools protocol. It allows a developer to programmatically control headless (or not!) Chromium.

Has this project been reviewed before?

No.

Note well that Services, who are currently responsible for the Electron-based render service, will also be providing concept review for the project.

Working test environment

http://chromium-pdf.wmflabs.org/

Example URLs

  1. http://chromium-pdf.wmflabs.org/en.wikipedia.org/v1/pdf/Berlin/Letter

Post-deployment

Readers Web will be responsible for the service immediately after its deployment and while it's evaluated. If, after evaluation, the headless Chromium based renderer supersedes the current Electron-based renderer, then Readers Infrastructure will take over responsibility.

Contacts

TeamContact
Readers Web@phuedx
Readers Infra@Jhernandez

Event Timeline

phuedx updated the task description. (Show Details)
phuedx changed the task status from Open to Stalled.Oct 9 2017, 4:53 PM

I'm marking this as stalled until we (Reading Web) are ready to proceed with the review.

bmansurov renamed this task from Security review of mediawiki-services-headless-chromium to Security review of mediawiki-services-chromium-render.Oct 10 2017, 2:37 PM
bmansurov updated the task description. (Show Details)
phuedx updated the task description. (Show Details)

@phuedx, do you mind updating the description to note why Electron needs to be replaced and what problems have been observed? Thanks!

@dpatrick: Thanks for the ping! I've added links to the Services team's tickets tracking having to restart the existing service after it's hung. I've also added a little more reasoning to the description.

AFAICT this review has stalled. If so, then should I move this back to Scheduled (hopefully somewhere near the top :] )? /cc @Bawolff

@phuedx @Bawolff is there any progress on this? Has the security review been scheduled? Let me know if I can help/assist somehow.

Bawolff added a subscriber: JBennett.

Hi,

Sorry, I think there was a little mix up here due to Darian's departure. I think what happened was that it was marked as next up for Darian to do, but then he left and we forgot to put it back in the list for someone else to do when he left, and he didn't notice that this was left in limbo.

I'm on vacation right now, please send an email @JBennett to sort out scheduling for this task.

@Bawolff: How's this going? Sorry if I've missed an update elsewhere.

@Bawolff: How's this going? Sorry if I've missed an update elsewhere.

Sorry for the delay, last week had some urgent stuff with the fixcopyright campaign. I am definitely working on it this week.

Approved, looks good.

Thanks again for your patience on this.

Reading Infrastructure is about to take over the service, and one thing I'd like to get a clearer picture on (asking here as I'm sure this came up during the security review) is what level of network isolation the service is working at. Let's say an attacker can put malicious content into the wiki page, and Chromium executes that while rendering the page, and that causes it to send a bunch of requests. Will those requests be routed through Varnish etc. as if they came from the internet? What happens with e.g. .wmnet URLs?

(forgot to ping @Bawolff when asking that)

Reading Infrastructure is about to take over the service, and one thing I'd like to get a clearer picture on (asking here as I'm sure this came up during the security review) is what level of network isolation the service is working at. Let's say an attacker can put malicious content into the wiki page, and Chromium executes that while rendering the page, and that causes it to send a bunch of requests. Will those requests be routed through Varnish etc. as if they came from the internet? What happens with e.g. .wmnet URLs?

This was a little while ago and I don't entirely remember the review in very much detail, but I took another look at the service - and I don't particularly see anything that would stop the service from fetching internal resources like .wmnet urls.

The main good thing on this front is that javascript is not enabled by the service AFAICT. I think the service should probably be using some sort of proxy server that filters out things to inapropriate urls (both internal and external). It should probably set some sort of CSP header on the pages (That's coming to mediawiki in general, but it'd be better if the service explicitly set its own, which could be even more restrictive then MW and be safe from relying on the external stuff setting one properly always).

Additionally, in the config i noticed the following:

puppeteer_options:
  ignoreHTTPSErrors: true
  timeout: 30000
  headless: true
  executablePath: /usr/bin/chromium
  args:
    - '--no-sandbox'
    - '--disable-setuid-sandbox'
    - '--font-rendering-hinting=medium'
    - '--enable-font-antialiasing'
    - '--hide-scrollbars'
    - '--disable-gpu'
    - '--no-first-run'

The --disable-setuid-sandbox, --no-sandbox, ignoreHTTPSErrors: true don't exactly sound reassuring at first glance. I'm not that familiar with chrome sandboxing, so maybe there is legit reasons to do this, but if so it should have some explanatory comments.