[Spike 8hr] How should we limit resources used by chromium render service?
Closed, Resolved (Public)

Description

We'd like to make sure that the headless chromium instances spun off from the chromium-render service are properly firejailed and CPU limited for security and resource consumption purposes. According to T180626#3772070, firejailing is done automatically.

The task is a blocker for the service to go into production, but it should be worked on after we're satisfied with the performance test of the service: T178278: Performance test the service.

Closed Questions

  • How should we manage resource consumption (mem, CPU)? Look into using cpulimit to limit the CPU usage. What about memory usage?

firejail can limit a process's CPU time and the maximum size of its virtual memory via the --rlimit-cpu and --rlimit-as options respectively (see https://firejail.wordpress.com/features-3/man-firejail/). We shouldn't need to worry about limiting CPU time, as we've already implemented job timeouts in the service itself.
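To illustrate what a virtual memory cap does to a process, here is a minimal Python sketch. It assumes that --rlimit-as corresponds to the kernel's RLIMIT_AS resource limit, as the option name suggests, and it applies that limit directly with the standard library rather than through firejail. The helper name run_with_memory_cap and the 1 GiB cap are illustrative only, and the sketch assumes a Linux host.

```python
import resource
import subprocess
import sys

# Illustrative cap: 1 GiB of virtual address space (not a recommended value).
CAP_BYTES = 1024**3


def run_with_memory_cap(code, max_bytes):
    """Run a Python snippet in a child process whose address space is capped."""

    def apply_limit():
        # Runs in the child just before exec; the limit is inherited by
        # anything the child spawns afterwards.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
    )


# A small allocation fits under the cap; a 4 GiB allocation does not.
small = run_with_memory_cap("data = bytearray(1024)", CAP_BYTES)
large = run_with_memory_cap("data = bytearray(4 * 1024**3)", CAP_BYTES)

print("small allocation exit code:", small.returncode)
print("large allocation exit code:", large.returncode)
```

The oversized allocation fails inside the child instead of consuming memory on the host, which is the behaviour we would want from a runaway chromium process.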

Event Timeline

@mobrovac @Pchelolo how do other services do this? Can you point us to a documentation page where something like this is already implemented? Thanks.

Each service running in production is already properly sandboxed with firejail for security reasons; this is something we provide out of the box. However, given that chromium might take up considerable resources, we should limit its resource usage as well. I would propose, however, to tackle this in a second step after the initial prototype has been built, but before this goes into production.

EDIT: an example launch in production:

/usr/bin/firejail --blacklist=/root --blacklist=/home --caps --seccomp /usr/bin/nodejs src/server.js -c /etc/{service}/config.yaml
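As a rough sketch of how the launch line above could be extended with resource limits, the Python snippet below builds the same argument list and appends the --rlimit-as and --rlimit-cpu options when values are given. The function name firejail_argv and the 2 GiB value are hypothetical and only for illustration; a suitable limit for chromium would have to come out of the performance tests.

```python
from typing import List, Optional

# Isolation flags copied from the production launch line quoted above.
BASE_FLAGS = ["--blacklist=/root", "--blacklist=/home", "--caps", "--seccomp"]


def firejail_argv(
    command: List[str],
    rlimit_as_bytes: Optional[int] = None,
    rlimit_cpu_seconds: Optional[int] = None,
) -> List[str]:
    """Build a firejail command line, adding rlimit options only when set."""
    argv = ["/usr/bin/firejail", *BASE_FLAGS]
    if rlimit_as_bytes is not None:
        argv.append(f"--rlimit-as={rlimit_as_bytes}")
    if rlimit_cpu_seconds is not None:
        argv.append(f"--rlimit-cpu={rlimit_cpu_seconds}")
    return argv + command


# {service} is the placeholder from the original example, kept as-is.
service_command = ["/usr/bin/nodejs", "src/server.js", "-c", "/etc/{service}/config.yaml"]

# Hypothetical cap of 2 GiB of virtual memory; no CPU time limit.
argv = firejail_argv(service_command, rlimit_as_bytes=2 * 1024**3)
print(" ".join(argv))
```

Leaving rlimit_cpu_seconds unset mirrors the position taken in this task: CPU time is already bounded by the service's own job timeouts.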

For posterity, here's the comment from @GWicke and @mobrovac around overriding puppeteer's spawn function with something that invokes firejail: T172815#3512836

bmansurov renamed this task from Firejail (and cpulimit, if feasable) headless chromium processes to Limit resources used by chromium render service. Dec 6 2017, 7:22 PM
bmansurov renamed this task from Limit resources used by chromium render service to [Spike] How should we limit resources used by chromium render service?.
bmansurov updated the task description.
bmansurov added a project: Spike.
bmansurov updated the task description.

The task is a blocker for the service to go into production, but it should be worked on after we're satisfied with the performance test of the service: T178278: Performance test the service.

Does this mean that this task should be a subtask of T181084: [EPIC] Deploy the mediawiki-services-chromium-render service (Proton) and T178278: Performance test the service be a subtask of this task?

I would propose, however, to tackle this in a second step after the initial prototype has been built, but before this goes into production.

@mobrovac: We're going to start looking into this after the performance testing is done. Are there other services that require resource limiting in production that we can take a look at?

We use firejail in production to isolate services, but don't currently use it for resource limiting. A first step could be to use it to limit the overall resources of the service (with all of its sub-processes), which should be straightforward.

ovasileva renamed this task from [Spike] How should we limit resources used by chromium render service? to [Spike 8hr] How should we limit resources used by chromium render service?. Dec 19 2017, 5:33 PM

@phuedx I'm not convinced it is within our (reading-web-team) area of expertise to provide such information for Ops. Once the service goes live we will not manage it. We can come up with some ideas, but IMHO the Services/Ops team that is going to manage the service should pick up this task, as they have experience running this kind of software on production servers and they also have the "last word".

Agreed.

As @mobrovac stated in T180626#3835939, we already use firejail in production to isolate services, and it can also limit overall resource consumption for a process and its sub-processes. firejail ships with options for limiting a process's CPU time and the maximum size of its virtual memory: the --rlimit-cpu and --rlimit-as options respectively (see https://firejail.wordpress.com/features-3/man-firejail/). IMO we shouldn't need to worry about limiting CPU time, as we've already implemented job timeouts in the service itself.

Knowing that we have the facility to limit memory available to the service with very little effort is enough right now.
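For context on why a job timeout can stand in for a CPU time limit, here is a small Python sketch of the general pattern: a job that runs longer than its deadline is killed regardless of how much CPU it used. This is not the service's actual implementation; run_job and the one second deadline are hypothetical.

```python
import subprocess
import sys


def run_job(argv, timeout_seconds):
    """Run a job and report whether it finished before the deadline."""
    try:
        # subprocess.run kills the child and re-raises if the timeout expires.
        subprocess.run(argv, timeout=timeout_seconds, check=True)
        return "completed"
    except subprocess.TimeoutExpired:
        return "timed out"


# A job that would run for 30 seconds is cut off after 1 second.
slow_status = run_job([sys.executable, "-c", "import time; time.sleep(30)"], 1.0)
# A trivial job finishes well within a generous deadline.
fast_status = run_job([sys.executable, "-c", "pass"], 30.0)

print(slow_status, fast_status)
```

Note that a timeout like this bounds wall-clock time, which is at least as strict as a CPU time limit for a single process, but it does nothing for memory, hence the interest in --rlimit-as.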

phuedx claimed this task.
phuedx updated the task description.