
Prometheus puppet manifest fails on Trusty instance deployment-zotero01: groupadd: failure while writing changes to /etc/group
Closed, Resolved (Public)

Description

On deployment-zotero01:

Error: Could not create group prometheus-node-exporter:
  Execution of '/usr/sbin/groupadd prometheus-node-exporter' returned 10:
    groupadd: failure while writing changes to /etc/group

Error: /Stage[main]/Prometheus::Node_exporter/Group[prometheus-node-exporter]/ensure:
  change from absent to present failed:
    Could not create group prometheus-node-exporter:
      Execution of '/usr/sbin/groupadd prometheus-node-exporter' returned 10:
        groupadd: failure while writing changes to /etc/group

Notice: /Stage[main]/Prometheus::Node_exporter/File[/var/lib/prometheus/node.d]:
  Dependency Group[prometheus-node-exporter] has failures: true

Warning: /Stage[main]/Prometheus::Node_exporter/File[/var/lib/prometheus/node.d]:
  Skipping because of failed dependencies

Notice: Finished catalog run in 15.98 seconds

Event Timeline

I remember seeing this before in T144492: Blocked /etc/passwd on sca100[1234] hosts; it is a kernel issue on trusty. The solution on the service cluster was to upgrade the kernel to 4.4 HWE by installing linux-image-virtual-lts-xenial, afaics. I'm going to do that here. Ideally zotero could run on jessie too, but I don't know how much work that would be.
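For anyone checking other trusty hosts, here is a minimal sketch of a pre-check. It assumes, based only on this thread, that kernels older than 4.4 are the ones worth upgrading; the exact affected range is not established here. The function name `kernel_older_than_4_4` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of any puppet module): exits 0 when the given
# kernel release string is older than 4.4, and non-zero otherwise.
# ASSUMPTION: 4.4 is used as the threshold only because the 4.4 HWE kernel
# resolved the groupadd failure in this thread.
kernel_older_than_4_4() {
    release="$1"
    major=${release%%.*}            # "3.13.0-83-generic" -> "3"
    rest=${release#*.}              # "3.13.0-83-generic" -> "13.0-83-generic"
    minor=${rest%%.*}               # "13.0-83-generic"   -> "13"
    minor=${minor%%[!0-9]*}         # drop any non-numeric suffix such as "-rc1"
    [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 4 ]; }
}

if kernel_older_than_4_4 "$(uname -r)"; then
    echo "kernel $(uname -r) predates 4.4; candidate for linux-image-virtual-lts-xenial plus a reboot"
else
    echo "kernel $(uname -r) is 4.4 or newer"
fi
```

On a host that reports an older kernel, the fix described above would be `apt-get install linux-image-virtual-lts-xenial` followed by a reboot.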

Ah, that instance runs 3.13.0-83-generic but 3.13.0.95.103. I am not sure what zotero01 is for, though.

Maybe it is safe to reboot it.

Mentioned in SAL (#wikimedia-releng) [2016-09-15T16:45:31Z] <godog> install xenial kernel on deployment-zotero01 and reboot T145793

fixed!

filippo@deployment-zotero01:~$ uname -a
Linux deployment-zotero01 4.4.0-36-generic #55~14.04.1-Ubuntu SMP Fri Aug 12 11:49:30 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
filippo@deployment-zotero01:~$ uptime
 16:49:11 up 1 min,  1 user,  load average: 0.84, 0.32, 0.12
filippo@deployment-zotero01:~$
fgiunchedi added a subscriber: akosiaris.

Reopening since we're seeing the same on sca2003 / sca2004. In beta this was fixed by installing linux-image-virtual-lts-xenial. cc @akosiaris

That part is fixed. Sorry, I forgot to upgrade to linux-image-virtual-lts-xenial when I installed those images. Puppet is still complaining with:

Error: Could not start Service[prometheus-node-exporter]: Execution of '/sbin/start prometheus-node-exporter' returned 1: 
Error: /Stage[main]/Prometheus::Node_exporter/Base::Service_unit[prometheus-node-exporter]/Service[prometheus-node-exporter]/ensure: change from stopped to running failed: Could not start Service[prometheus-node-exporter]: Execution of '/sbin/start prometheus-node-exporter' returned 1:

but this seems unrelated to the kernel issue.

All done! Thanks @akosiaris. I've manually reinstalled prometheus-node-exporter, which created the user/group, and puppet is happy.