[Spike 8hr] How should we limit resources used by chromium render service?
Closed, ResolvedPublic

Authored By

	• bmansurov
	Nov 15 2017, 7:22 PM

Description

We'd like to make sure that the headless chromium instances spun off from the chromium-render service are properly firejailed and CPU limited for security and resource consumption purposes. According to T180626#3772070, firejailing is done automatically.

The task is a blocker for the service to go to production, but it should be worked on after we're satisfied with the performance test of the service: T178278: Performance test the service.

Closed Questions

How should we manage resource consumption (mem, CPU)? Look into using cpulimit to limit the CPU usage. What about memory usage?

firejail can limit CPU time and the maximum size of a process's virtual memory with the --rlimit-cpu and --rlimit-as options respectively (see https://firejail.wordpress.com/features-3/man-firejail/). We shouldn't need to worry about limiting CPU time, as we've already implemented job timeouts in the service itself.

Related Objects

Status	Assigned	Task
Resolved	• mobrovac	T159922 pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003
Invalid	None	T172815 Improve stability and maintainability of our browser-based PDF render service
Resolved	ovasileva	T181079 [GOAL] Provide an expanded reading experience by improving the ways that users can download articles of interest for later consumption
Resolved	None	T181084 [EPIC] Deploy the mediawiki-services-chromium-render service (Proton)
Resolved	phuedx	T181118 [EPIC] Build a Chromium-based PDF renderer service
Resolved	phuedx	T180626 [Spike 8hr] How should we limit resources used by chromium render service?
Resolved	phuedx	T178278 Performance test the service

Event Timeline

• bmansurov created this task.Nov 15 2017, 7:22 PM

Restricted Application added a subscriber: Aklapper. Nov 15 2017, 7:22 PM

• bmansurov mentioned this in T178501: Limit resources used by Chromium in order to make the chromium-render service ready for production.Nov 15 2017, 7:22 PM

Jdlrobson moved this task from Incoming to Needs Prioritization on the Web-Team-Backlog board.Nov 15 2017, 7:27 PM

• Pchelolo moved this task from Backlog to watching on the Services board.Nov 16 2017, 12:48 PM

• Pchelolo edited projects, added Services (watching); removed Services.

@mobrovac @Pchelolo how do other services do this? Can you point us to a documentation page where something like this is already implemented? Thanks.

Each service running in production is already properly sandboxed with firejail for security reasons; this is something we provide out of the box. However, given that chromium might take up considerable resources, we should limit its resource usage as well. I would propose, however, to tackle this in a second step after the initial prototype has been built, but before this goes into production.

EDIT: an example launch in production:

/usr/bin/firejail --blacklist=/root --blacklist=/home --caps --seccomp /usr/bin/nodejs src/server.js -c /etc/{service}/config.yaml

phuedx added a parent task: T181118: [EPIC] Build a Chromium-based PDF renderer service.Nov 22 2017, 6:07 AM

For posterity, here's the comment from @GWicke and @mobrovac around overriding puppeteer's spawn function with something that invokes firejail: T172815#3512836

ovasileva moved this task from Triage to Backlog - Q2 on the Proton board.Dec 1 2017, 2:22 PM

• bmansurov renamed this task from Firejail (and cpulimit, if feasable) headless chromium processes to Limit resources used by chromium render service.Dec 6 2017, 7:22 PM

• bmansurov renamed this task from Limit resources used by chromium render service to [Spike] How should we limit resources used by chromium render service?.

• bmansurov updated the task description.

• bmansurov added a parent task: T178278: Performance test the service.

• bmansurov added a project: Spike.

• bmansurov updated the task description.

• bmansurov updated the task description. Dec 6 2017, 7:24 PM

• bmansurov moved this task from Needs Prioritization to Triaged but Future on the Web-Team-Backlog board.

The task is a blocker for the service to go to production, but it should be worked on after we're satisfied with the performance test of the service: T178278: Performance test the service.

Does this mean that this task should be a subtask of T181084: [EPIC] Deploy the mediawiki-services-chromium-render service (Proton), and that T178278: Performance test the service should be a subtask of this task?

Yes.

phuedx removed a parent task: T178278: Performance test the service.Dec 7 2017, 11:30 AM

phuedx added a subtask: T178278: Performance test the service.

ovasileva triaged this task as High priority.Dec 7 2017, 3:48 PM

ovasileva moved this task from Triaged but Future to Upcoming on the Web-Team-Backlog board.Dec 12 2017, 7:45 PM

In T180626#3772070, @mobrovac wrote:

I would propose, however, to tackle this in a second step after the initial prototype has been built, but before this goes into production.

@mobrovac: We're going to start looking into this after the performance testing is done. Are there other services that require resource limiting in production that we can take a look at?

We use firejail in production to isolate services, but don't currently use it for resource limiting. One first step could be to use it to limit the overall resources of the services (with all of its sub-processes), which should be straightforward.
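As a rough sketch of that first step, the production launch line quoted earlier in this task could gain a --rlimit-as option. The helper name firejail_launch_cmd, the service name chromium-render, and the 2048 MiB figure below are illustrative assumptions, not production settings; a real value would have to come out of the performance testing. The sketch only prints the command, it does not run firejail.

```shell
#!/bin/sh
# Hypothetical helper: print the firejail launch line from this task with a
# virtual-memory cap added. Nothing here is the deployed configuration.
firejail_launch_cmd() {
  service=$1      # assumed service name, fills the {service} placeholder
  limit_mib=$2    # assumed cap in MiB, chosen only for illustration

  # --rlimit-as expects the limit in bytes.
  limit_bytes=$((limit_mib * 1024 * 1024))

  printf '%s\n' "/usr/bin/firejail --blacklist=/root --blacklist=/home --caps --seccomp --rlimit-as=${limit_bytes} /usr/bin/nodejs src/server.js -c /etc/${service}/config.yaml"
}

firejail_launch_cmd chromium-render 2048
```

Note that rlimits are applied per process and inherited by child processes, so a cap set this way bounds each process in the sandbox rather than their combined usage.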

ovasileva renamed this task from [Spike] How should we limit resources used by chromium render service? to [Spike 8hr] How should we limit resources used by chromium render service?.Dec 19 2017, 5:33 PM

• bmansurov unsubscribed.Dec 22 2017, 9:47 PM

phuedx closed subtask T178278: Performance test the service as Resolved.Jan 2 2018, 11:33 AM

ovasileva added a project: Readers-Web-Kanbanana-Board-Old.Jan 10 2018, 6:50 PM

ovasileva moved this task from To Do to Needs Design Review on the Readers-Web-Kanbanana-Board-Old board.

ovasileva moved this task from Upcoming to 2017-18 Q2 on the Web-Team-Backlog board.Jan 12 2018, 9:10 PM

ovasileva moved this task from 2017-18 Q2 to 2017-18 Q3 on the Web-Team-Backlog board.Jan 17 2018, 6:43 PM

@phuedx I'm not convinced that providing this information for Ops is within our (reading-web-team) area of expertise. Once the service goes live we will not manage it. We can come up with some ideas, but IMHO whichever Services/Ops team is going to manage the service should pick up this task, as they have experience running this kind of software on production servers and they also have the "last word".

In T180626#3907622, @pmiazga wrote:

@phuedx I'm not convinced that providing this information for Ops is within our (reading-web-team) area of expertise. Once the service goes live we will not manage it. We can come up with some ideas, but IMHO whichever Services/Ops team is going to manage the service should pick up this task, as they have experience running this kind of software on production servers and they also have the "last word".

Agreed.

As @mobrovac stated in T180626#3835939, we already use firejail in production to isolate services, and it could also be used to limit overall resource consumption for processes and their sub-processes. firejail ships with options for limiting CPU time for a process and the maximum size of the process's virtual memory, the --rlimit-cpu and --rlimit-as options respectively (see https://firejail.wordpress.com/features-3/man-firejail/). IMO we shouldn't need to worry about limiting CPU time, as we've already implemented job timeouts in the service itself.

Knowing that we have the facility to limit memory available to the service with very little effort is enough right now.
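To get a feel for what an address-space cap does without firejail installed, the same kind of limit can be set from a shell. This is a minimal sketch, assuming that --rlimit-as corresponds to the kernel's RLIMIT_AS limit as its name suggests. The helper name run_with_as_limit and the 2048 MiB value are made up for illustration. ulimit -v is a common shell extension (available in bash and dash) rather than POSIX, and it takes KiB where --rlimit-as takes bytes.

```shell
#!/bin/sh
# Hypothetical helper: run a command with its virtual memory capped.
# The limit is set inside a subshell so the calling shell is unaffected.
run_with_as_limit() {
  limit_mib=$1
  shift
  (
    # ulimit -v takes KiB, so convert from MiB.
    ulimit -v "$((limit_mib * 1024))" || exit 1
    "$@"
  )
}

# The child process inherits the cap and reports it back in KiB.
run_with_as_limit 2048 sh -c 'ulimit -v'
```

Any process started this way that tries to map more than the cap gets an allocation failure, which is the behaviour to verify against Chromium before picking a production value.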

See T180626#3936722 and T180626#3936727.

phuedx moved this task from Needs Design Review to Ready for Signoff on the Readers-Web-Kanbanana-Board-Old board.Feb 1 2018, 10:01 AM
