varnish systemd service reaching TaskMax
Closed, Resolved, Public

Description

VCL reload is currently broken on cp3032:

Nov  2 09:55:25 cp3032 puppet-agent[39299]: (/Stage[main]/Cacheproxy::Instance_pair/Varnish::Instance[text-backend]/Exec[retry-load-new-vcl-file]/returns) Starting VCC-compiler: fork() failed: Resource temporarily unavailableVCL compilation failed

This is due to the unit reaching its TasksMax limit:

10:00:21 ema@cp3032.esams.wmnet:~
$ sudo systemctl status varnish | grep Tasks
    Tasks: 7372 (limit: 7372)
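The check above can be scripted. The following is a minimal sketch, assuming the `Tasks: N (limit: M)` line format shown in the output above; `tasks_at_limit` is a hypothetical helper name, not part of systemd or of the Wikimedia tooling.

```shell
# Hypothetical helper: succeeds when a "Tasks: N (limit: M)" line shows N >= M.
tasks_at_limit() {
    # $1: the Tasks line as printed by `systemctl status <unit>`
    # Extract the two numbers; if the line does not match, nothing is printed
    # and the positional parameters end up empty.
    set -- $(printf '%s\n' "$1" | sed -n 's/^ *Tasks: \([0-9][0-9]*\) (limit: \([0-9][0-9]*\)).*/\1 \2/p')
    [ "$#" -eq 2 ] && [ "$1" -ge "$2" ]
}

# Using the line reported for cp3032 in this task:
tasks_at_limit "    Tasks: 7372 (limit: 7372)" && echo "unit has reached its task limit"
```

On a live host the argument could come from `systemctl status varnish | grep Tasks`, as in the session above.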

Interestingly:

  • all other varnish backends have significantly fewer tasks than cp3032, in all cases below 1000. For example, varnish.service on cp3033 currently has 719 tasks
  • on cp[1075-1090].eqiad.wmnet TasksMax is 41779 instead of 7372, the limit on all other hosts. This is due to DefaultTasksMax being set to 15% of kernel.pid_max, which is 278528 on cp[1075-1090] and 49152 on the others
  • DefaultTasksMax was originally set to 512 by systemd upstream, then changed to 15% of kernel.pid_max: https://github.com/systemd/systemd/issues/3211
  • systemd 232 in stretch ships debian/patches/debian/Revert-core-enable-TasksMax-for-all-services-by-default-a.patch, which claims to revert https://github.com/systemd/systemd/commit/9ded9cd14 but in reality only changes systemd-system.conf(5) and a commented-out value in system.conf (DefaultTasksMax)
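
The two limits quoted above follow from the 15% rule. As a rough sketch, assuming the result is simply truncated to an integer (which matches the figures in this task), the arithmetic can be reproduced as follows; `default_tasks_max` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: integer percentage of kernel.pid_max, truncated.
default_tasks_max() {
    # $1: kernel.pid_max value, $2: percentage (15 for the upstream default)
    echo $(( $1 * $2 / 100 ))
}

default_tasks_max 278528 15   # cp[1075-1090]: prints 41779
default_tasks_max 49152 15    # all other hosts: prints 7372
```

On a given host, the pid_max input can be read with `sysctl -n kernel.pid_max`.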
ema created this task. Nov 2 2018, 10:51 AM
Restricted Application added a project: Operations. Nov 2 2018, 10:51 AM
Restricted Application added a subscriber: Aklapper.
ema triaged this task as High priority. Nov 2 2018, 10:51 AM

Mentioned in SAL (#wikimedia-operations) [2018-11-02T10:52:22Z] <ema> restart varnish-be on cp3032 T208574

Change 471244 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] varnish: unset TasksMax in systemd units

https://gerrit.wikimedia.org/r/471244

Change 471244 merged by Ema:
[operations/puppet@production] varnish: unset TasksMax in systemd units

https://gerrit.wikimedia.org/r/471244

Change 471249 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] varnish: TasksMax comment should be on its own line

https://gerrit.wikimedia.org/r/471249

Change 471249 merged by Ema:
[operations/puppet@production] varnish: TasksMax comment should be on its own line

https://gerrit.wikimedia.org/r/471249

ema closed this task as Resolved. Nov 2 2018, 1:13 PM
ema claimed this task.