We'd like to know whether headless Chromium can handle production traffic better than Electron can, in terms of stability and resource consumption. We (Readers Web) hope to switch out Electron for headless Chromium and monitor its performance in production for two weeks.
An initial complication is that the service isn't a Node.js HTTP server that drives an instance of Electron; it's a command-line Electron app that also runs an HTTP server. Fortunately, the service has a small, well-defined interface:
- GET /pdf?accessKey=<secret>&url=<url>&... – Render the URL as a PDF and return it inline (exposed via RESTBase as /api/rest_v1/page/pdf).
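Because the interface is this small, a replacement only needs to reproduce one endpoint. The following is a rough TypeScript sketch of the request validation such a service might do. The helpers `parsePdfRequest` and `toHttpUrl` are hypothetical. They check only the two documented parameters (`accessKey` and `url`). The status codes and the restriction to http(s) URLs are assumptions, not the existing service's behaviour, and the elided parameters are ignored.

```typescript
// Hypothetical validator for the service's single endpoint:
//   GET /pdf?accessKey=<secret>&url=<url>&...
// Only `accessKey` and `url` are handled; the other query parameters are ignored here.

type PdfRequest =
  | { ok: true; targetUrl: string }
  | { ok: false; status: number; reason: string };

// Accept only absolute http(s) URLs so the browser can't be pointed at file: and similar schemes.
function toHttpUrl(raw: string): URL | null {
  try {
    const u = new URL(raw);
    return u.protocol === "http:" || u.protocol === "https:" ? u : null;
  } catch {
    return null; // not a parseable absolute URL
  }
}

function parsePdfRequest(requestUrl: string, expectedAccessKey: string): PdfRequest {
  // An HTTP request line carries a path and query string, so resolve it against a dummy origin.
  const parsed = new URL(requestUrl, "http://localhost");
  if (parsed.pathname !== "/pdf") {
    return { ok: false, status: 404, reason: "unknown path" };
  }
  // Assumed status code; a plain comparison is used here for brevity.
  if (parsed.searchParams.get("accessKey") !== expectedAccessKey) {
    return { ok: false, status: 401, reason: "invalid accessKey" };
  }
  const target = toHttpUrl(parsed.searchParams.get("url") ?? "");
  if (target === null) {
    return { ok: false, status: 400, reason: "missing or unsupported url" };
  }
  return { ok: true, targetUrl: target.toString() };
}
```

For example, `parsePdfRequest("/pdf?accessKey=s3cret&url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FFoo", "s3cret")` yields `{ ok: true, targetUrl: "https://en.wikipedia.org/wiki/Foo" }`. A production version would probably compare the secret with Node's `crypto.timingSafeEqual` rather than `!==`.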
Approaches
1. Add a headless Chromium driver to the service
As noted above, the service is a command-line Electron app that also runs an HTTP server. Much of the basic infrastructure required to serve ~60k requests/day is already in place, such as a fixed-size renderer pool with an internal queue.
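For reference, here is a minimal TypeScript sketch of a fixed-size pool with an internal queue. It is not taken from the existing service. `RendererPool`, the `maxQueueLength` cap, and the "queue full" rejection are assumptions, and `render` stands in for whatever drives the browser (Electron today, headless Chromium in the proposal).

```typescript
// Hypothetical fixed-size pool: at most `size` renders run at once, up to
// `maxQueueLength` more wait in FIFO order, and anything beyond that is rejected.
type RenderFn<T> = (url: string) => Promise<T>;

class RendererPool<T> {
  private active = 0; // slots currently in use
  private readonly waiting: Array<() => void> = []; // FIFO of callers waiting for a slot
  private readonly size: number;
  private readonly maxQueueLength: number;
  private readonly render: RenderFn<T>;

  constructor(size: number, maxQueueLength: number, render: RenderFn<T>) {
    this.size = size;
    this.maxQueueLength = maxQueueLength;
    this.render = render;
  }

  async submit(url: string): Promise<T> {
    if (this.active < this.size) {
      this.active++; // a slot is free: take it immediately
    } else if (this.waiting.length < this.maxQueueLength) {
      // All slots are busy: wait until a finishing job hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      throw new Error("queue full"); // shed load instead of queueing without bound
    }
    try {
      return await this.render(url);
    } finally {
      const next = this.waiting.shift();
      if (next !== undefined) {
        next(); // pass the slot straight to the next caller; `active` is unchanged
      } else {
        this.active--;
      }
    }
  }
}
```

Any port to headless Chromium would likely want to keep an explicit queue cap. At this request volume, an unbounded queue would probably turn a slow renderer into ever-growing latency rather than fast, visible failures.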
Pros
- It's deployed!
- The prerequisite infrastructure is already in place (renderer pooling, queuing, timeouts, basic performance monitoring).
- Esoteric rendering errors have already been handled!
Cons
- Since the service is specific to Electron, there's little separation of concerns between the Electron-specific code and the Node.js HTTP server code – hashtag technical debt.
- Introducing and rolling back changes is trickier, since the new renderer would ship inside the existing service.
- The service is very general but is only used in one specific way. Shimming in a single-purpose renderer might be awkward without paring down the service first.
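Each of the cons above traces back to the HTTP code and the Electron code being intertwined. One possible mitigation, sketched here in TypeScript, is a narrow interface between the two so that the renderer can be selected by configuration. This seam does not exist in the current service; `PdfRenderer`, `selectRenderer`, and the renderer names are hypothetical.

```typescript
// Hypothetical seam between the HTTP layer and the browser driver.
interface PdfRenderer {
  readonly name: string;
  // Render `url` and resolve with the PDF bytes.
  renderPdf(url: string): Promise<Uint8Array>;
}

// Pick the renderer named in configuration, failing fast on typos so that a bad
// config is caught at startup rather than on the first request.
function selectRenderer(available: readonly PdfRenderer[], configured: string): PdfRenderer {
  const match = available.find((r) => r.name === configured);
  if (match === undefined) {
    const names = available.map((r) => r.name).join(", ");
    throw new Error(`unknown renderer "${configured}" (available: ${names})`);
  }
  return match;
}
```

With such a seam in place, switching to (or rolling back from) headless Chromium would be a configuration change rather than a code revert.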
2. Create a new service that drives headless Chromium
Pros
- A neat switcharoo!
- We could even forward X% of requests at the RESTBase level so that we can do science to it.
- Less complex, as it only has to implement the interface above.
- Easier to iterate on.
- The service can and should be built atop wikimedia/service-template-node.
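Forwarding X% of requests needs a routing rule. The TypeScript sketch below shows one option. It is not RESTBase's actual configuration mechanism. It hashes the requested page URL into one of 100 buckets and sends buckets below `chromiumPercent` to the new service. The FNV-1a hash was chosen here only because it is short and dependency-free. `chooseBackend` and the backend names are hypothetical.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: fast and stable, fine for bucketing, not for security.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime mod 2^32
  }
  return hash;
}

type Backend = "chromium" | "electron";

// Deterministic split: the same page URL always goes to the same backend,
// so a bad render can be reproduced against the backend that produced it.
function chooseBackend(pageUrl: string, chromiumPercent: number): Backend {
  const bucket = fnv1a32(pageUrl) % 100; // 0..99
  return bucket < chromiumPercent ? "chromium" : "electron";
}
```

Hashing rather than sampling at random keeps the experiment reproducible. The trade-off is that a few very popular pages could skew the share of traffic each backend actually sees.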
Cons
3. Create a new service that's a slave to the existing service
In T176627#3636569, @pmiazga suggests that we create the new service described in approach 2 but slave it to the existing service, so that it does the same work while its output is never consumed by the user.
Pros
- No impact on existing UX (good or bad) while we're trialling the service.
- We can compare the performance of the two services side by side.
- Easier to iterate on.
Cons
- Will require additional review from Security and Services in order to deploy it.
- Although this approach has less user-facing impact, that review still blocks deployment.
- May require changes to the existing service unless we can proxy requests to both services.
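If requests are proxied to both services, the proxying logic itself is small. The TypeScript sketch below assumes a hypothetical `renderWithShadow` helper that sits in front of both services. The user always gets the existing (primary) service's output, and the new (shadow) service does the same work in the background. Per-backend timings and failures go to a `report` callback so the two can be compared. The backends are plain functions here; in practice they would presumably be HTTP calls.

```typescript
// Hypothetical shadow-traffic helper: only the primary's result reaches the user.
type Backend = (url: string) => Promise<Uint8Array>;

interface ShadowResult {
  backend: "primary" | "shadow";
  ms: number; // wall-clock duration of the render
  ok: boolean; // false if the backend rejected
}

async function renderWithShadow(
  url: string,
  primary: Backend,
  shadow: Backend,
  report: (result: ShadowResult) => void,
): Promise<Uint8Array> {
  // Run one backend, recording its duration and outcome before passing the result on.
  const timed = async (name: ShadowResult["backend"], backend: Backend): Promise<Uint8Array> => {
    const start = Date.now();
    try {
      const pdf = await backend(url);
      report({ backend: name, ms: Date.now() - start, ok: true });
      return pdf;
    } catch (err) {
      report({ backend: name, ms: Date.now() - start, ok: false });
      throw err;
    }
  };
  // Fire and forget: shadow failures are reported but never surface to the user.
  timed("shadow", shadow).catch(() => undefined);
  return timed("primary", primary);
}
```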
Immediate Outcomes
- We choose an approach (using T172815: Improve stability and maintainability of our browser-based PDF render service as a guideline).
- We create a plan to implement this approach.
- This plan will likely involve the creation of subtasks, at which point this task should probably be converted to an epic.
Notes
- The Electron render service source is here: https://github.com/wikimedia/mediawiki-services-electron-render