Improve stability and maintainability of our browser-based PDF render service
Open, High, Public

Description

Following up on T134205, we chose Electron via the electron-render-service project as a browser-based PDF render solution. While this has worked well overall, we have also found some stability issues (see T159922). The stability issues seem to be related to the need for a virtual X server using xpra.

Now that the Reading team has decided on using generic browser-based rendering as part of their PDF render pipeline, we should look into improving the stability of the browser-based PDF render solution used.

Options

Switch to headless Chrome

In T166188#3325004 and following comments, we discussed using the now-landed native headless Chromium mode for PDF printing. With Chromium 59 or later, PDFs can be printed from the command line using an invocation like this: chromium --disable-gpu --headless --print-to-pdf=obama.pdf https://en.wikipedia.org/wiki/Barack_Obama. Since this implementation is headless, no xpra is needed, which should avoid the race conditions we encountered with Electron. For easy launching of Chrome and Chromium from Node.js, there is https://github.com/GoogleChrome/lighthouse/tree/master/chrome-launcher.
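
A minimal sketch of driving that command line from Node.js, assuming a `chromium` binary is available on the PATH. The helper names `buildPrintArgs` and `printToPdf` are hypothetical and only serve as illustration:

```javascript
const { execFile } = require('child_process');

// Build the argument list for the headless print-to-PDF invocation quoted above.
function buildPrintArgs(url, outputPath) {
  return ['--disable-gpu', '--headless', `--print-to-pdf=${outputPath}`, url];
}

// Run Chromium once and resolve with the output path when it exits cleanly.
// The binary name 'chromium' is an assumption; it differs between distributions.
function printToPdf(url, outputPath, binary = 'chromium') {
  return new Promise((resolve, reject) => {
    execFile(binary, buildPrintArgs(url, outputPath), (err) => {
      if (err) {
        reject(err);
      } else {
        resolve(outputPath);
      }
    });
  });
}
```

A call such as `printToPdf('https://en.wikipedia.org/wiki/Barack_Obama', 'obama.pdf')` would then correspond to the invocation in the paragraph above.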

In order to expose a secure and reliable service, we would probably want to:

  • Start one Chrome instance per request. Startup overhead seems to be around 100 ms, which is acceptable given that median PDF render times are around 1 s. A clean instance per request allows for reliable resource limiting.
  • Limit resources used & accessible to a given Chrome process:
    • Memory
    • Wall clock time
    • Ideally, no writing to disk
    • Ideally, run in firejail with all non-essential features disallowed.
  • Stream out the returned PDF.
Pros
  • Get to use regular Chromium package instead of binary electron distribution.
  • Less complex.
  • Chrome process per request offers better isolation.
Cons
  • Need to write basic service wrapper.
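
As a rough illustration of the wall-clock limit for a per-request process, here is a sketch using only Node's `child_process` module. The function name `runWithTimeout` is hypothetical, and memory limits, disk isolation and firejail are deliberately left out:

```javascript
const { spawn } = require('child_process');

// Spawn one child process per request and enforce a wall-clock limit.
// Resolves with the exit code, the terminating signal, and whether the
// timeout fired; rejects only if the process could not be started.
function runWithTimeout(command, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'ignore' });
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill('SIGKILL'); // hard stop: a hung renderer may ignore SIGTERM
    }, timeoutMs);

    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on('exit', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut });
    });
  });
}
```

A service wrapper could call this with the Chromium binary and its print-to-PDF arguments, and return an error to the client whenever `timedOut` is true.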

Resolve xpra race condition

Idea: Start xpra as a separate systemd service, and make the pdfrender service depend on that.
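
A sketch of how that dependency might be declared as a systemd drop-in. The unit names `xpra.service` and `pdfrender.service` and the file path are assumptions, since the task does not state the actual names:

```ini
# /etc/systemd/system/pdfrender.service.d/xpra.conf (hypothetical path)
[Unit]
# Pull in xpra whenever pdfrender is started; if xpra is explicitly
# stopped, pdfrender is stopped as well.
Requires=xpra.service
# Order pdfrender's start-up after xpra's.
After=xpra.service
```

Note that with a `Type=simple` xpra unit, systemd treats the service as started as soon as the process has been spawned, so ordering alone may not guarantee that the display is ready when Electron launches.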

Pros
  • Relatively simple
  • No need to create a new service or change consumers.
Cons
  • Continued dependency on Electron, which is less maintained than Chromium.
  • Reliability in the face of Electron failures (OOM) likely still less than perfect.

GWicke created this task. Aug 8 2017, 8:35 PM
GWicke triaged this task as High priority. Aug 8 2017, 8:37 PM
GWicke updated the task description. Aug 9 2017, 2:36 PM
GWicke added a comment. Aug 9 2017, 3:56 PM

There are quite a few headless chrome wrappers listed as dependents of chrome-launcher: https://www.npmjs.com/browse/depended/chrome-launcher

Chrome-launcher supports overriding the spawn() function used, which we can use to add firejail or other limiting.

+1 on this, but let's wait for the (final) outcome of T150871: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service before moving on this.

> Chrome-launcher supports overriding the spawn() function used, which we can use to add firejail or other limiting.

IMHO, it would be better to launch headless Chrome processes from a service-runner-enabled service, since that way we get automatic firejailing.

> IMHO, it would be better to launch headless Chrome processes from a service-runner-enabled service, since that way we get automatic firejailing.

Yeah, that too, but if we wanted to limit the Chromium sub-process further, we could do so by overriding spawn().
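
To make the spawn-override idea concrete, here is a sketch of a replacement spawn function that prefixes the command with firejail. The names `wrapWithFirejail` and `firejailSpawn` are hypothetical, the firejail options are only illustrative, and how the function is handed to chrome-launcher is left out because this thread does not spell out that hook:

```javascript
const { spawn } = require('child_process');

// Rewrite a command so that it runs inside firejail.
// The firejail options are illustrative; a real profile would be tuned for Chromium.
function wrapWithFirejail(command, args = [], firejailArgs = ['--quiet', '--private']) {
  return { command: 'firejail', args: [...firejailArgs, command, ...args] };
}

// Same (command, args, options) shape as child_process.spawn, so a launcher
// that accepts a custom spawn function could be given this one instead.
function firejailSpawn(command, args, options) {
  const wrapped = wrapWithFirejail(command, args);
  return spawn(wrapped.command, wrapped.args, options);
}
```

Keeping the argument rewriting in a separate pure function makes it easy to unit-test the sandbox configuration without starting a browser.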

The latest in headless Chrome wrapping technologies seems to be https://github.com/GoogleChrome/puppeteer. Example code for printing a page to PDF:

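const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // 'networkidle' matches the puppeteer release current at the time of writing;
  // later releases use 'networkidle0' or 'networkidle2' instead.
  await page.goto('https://en.wikipedia.org/wiki/Barack_Obama', {waitUntil: 'networkidle'});
  // page.pdf() resolves with the PDF contents as a buffer.
  const pdfBuffer = await page.pdf({format: 'A4'});

  // Wait for the browser to shut down before the script exits.
  await browser.close();
})();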

Coming directly from the Chrome devs, this is definitely promising, but the caveat here is that it requires Node v7.x+. This might not prove to be such a big limitation given that v8.x will soon gain LTS status.

The Electron render service currently requires manual attention every few days, so we should address the reliability issues sooner rather than later.

Within the services team, we are currently leaning towards migrating to headless Chrome. While doing so looks relatively straightforward on the technical side and is something we should be able to slot in, we also need to consider longer-term ownership of this service. With the renewed focus on PDF generation and offline, it might make sense for the reading services team to take on ownership in the longer term.

@ovasileva @phuedx @faidon, what are your thoughts on a) longer-term service ownership, and b) the options discussed in this task?

I don't really mind who owns the service (Services or Readers), as long as it's owned by someone :)

Regarding the options, to me it sounds like headless Chrome is the more future-proof, more secure and probably more stable option. I don't even consider firejailing Chrome much of a priority, as an unmodified Chrome is already heavily sandboxed and probably more securely coded than most of our deployed codebase :)

That said, I'm comfortable with you making that choice too -- if you think that we need to buy some time (e.g. until Node.js 8 becomes LTS) and that you can make the existing service work reliably, then that's fine too.

Tgr added a subscriber: Tgr. Aug 31 2017, 4:57 PM
Joe added a subscriber: Joe. Sep 6 2017, 9:30 AM
Envlh added a subscriber: Envlh. Sep 12 2017, 8:41 PM
Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2017-09-13T17:55:05Z] <gwicke> rolling restart of pdfrender service in eqiad after hang T174916 T172815

As already announced in Tech News, OfflineContentGenerator (OCG) will no longer be used on Wikimedia sites after October 1st, 2017. OCG will be replaced by Electron. You can read more on mediawiki.org.

A brief update:

Readers Web are currently building a Chromium-based PDF render service, which is, essentially, a wrapper around the puppeteer library. Fortunately, v0.10.0 of the library introduced compatibility with Node v6+. There's been a lot of discussion about how exactly we can deploy this service as, by default, the puppeteer library downloads a specific version of the Chromium binary to drive. However, we've investigated and concluded that the library can drive an up-to-date packaged version of Chromium, which is available on Debian Stretch (see T180037: [Spike] Can the new render service run on Debian Stretch? for more detail).
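
For illustration, pointing puppeteer at a packaged Chromium could look roughly like the sketch below. The path `/usr/bin/chromium` and the helper names `launchOptions` and `renderPdf` are assumptions, not taken from the actual service:

```javascript
// Options that point puppeteer at a system-installed Chromium instead of the
// binary it downloads by default. The default path is an assumption about
// where the Debian package installs the browser.
function launchOptions(executablePath = '/usr/bin/chromium') {
  return { executablePath, headless: true };
}

// Render a URL to a PDF buffer with a fresh browser instance.
async function renderPdf(url) {
  // Loaded lazily so that launchOptions() can be used without puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(launchOptions());
  try {
    const page = await browser.newPage();
    // Recent puppeteer releases accept 'networkidle2'; early releases used 'networkidle'.
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.pdf({ format: 'A4' });
  } finally {
    // Always shut the browser down, even if navigation or printing fails.
    await browser.close();
  }
}
```

The `try`/`finally` ensures that a failed render does not leave a Chromium process behind.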

Prior to deploying the service, we'll be performance testing it with guidance from Services. If all goes well, then we intend to deploy it alongside the current service and see how it performs. If we're all happy, then we'll replace the old service with the new one; if not, then we'll circle back to this task.