Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins
Closed, Resolved (Public)

Description

Something (probably https://gerrit.wikimedia.org/r/c/operations/puppet/+/714975) stopped us from installing nrpe on cloud VMs by default.

That's potentially good, except that nearly every other Puppet class expects to install nrpe plugins. Without the top-level nrpe class there is no plugin directory (/usr/local/lib/nagios/plugins), so every file resource that targets it fails. Here's a sample:

Error: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_client_bucket20220517-5991-vfrg1p.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/profile/manifests/puppet/client_bucket.pp, line: 31)
Error: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_client_bucket20220517-5991-vfrg1p.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/profile/manifests/puppet/client_bucket.pp, line: 31)
Wrapped exception:
No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_client_bucket20220517-5991-vfrg1p.lock does not exist or is a dangling symbolic link
Error: /Stage[main]/Profile::Puppet::Client_bucket/File[/usr/local/lib/nagios/plugins/check_client_bucket]/ensure: change from 'absent' to 'file' failed: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_client_bucket20220517-5991-vfrg1p.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/profile/manifests/puppet/client_bucket.pp, line: 31)
Error: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_timedatectl20220517-5991-1xf46wc.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/profile/manifests/systemd/timesyncd.pp, line: 23)
Error: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_timedatectl20220517-5991-1xf46wc.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/profile/manifests/systemd/timesyncd.pp, line: 23)
Wrapped exception:
No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_timedatectl20220517-5991-1xf46wc.lock does not exist or is a dangling symbolic link
Error: /Stage[main]/Profile::Systemd::Timesyncd/File[/usr/local/lib/nagios/plugins/check_timedatectl]/ensure: change from 'absent' to 'file' failed: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_timedatectl20220517-5991-1xf46wc.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/profile/manifests/systemd/timesyncd.pp, line: 23)
Error: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_microcode20220517-5991-ammoc.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/base/manifests/kernel.pp, line: 104)
Error: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_microcode20220517-5991-ammoc.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/base/manifests/kernel.pp, line: 104)
Wrapped exception:
No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_microcode20220517-5991-ammoc.lock does not exist or is a dangling symbolic link
Error: /Stage[main]/Base::Kernel/File[/usr/local/lib/nagios/plugins/check_microcode]/ensure: change from 'absent' to 'file' failed: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_microcode20220517-5991-ammoc.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/base/manifests/kernel.pp, line: 104)
Notice: The LDAP client stack for this host is: sssd/sudo
Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: sssd/sudo'
Error: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_journal_pattern20220517-5991-8g6wcp.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/systemd/manifests/init.pp, line: 30)
Error: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_journal_pattern20220517-5991-8g6wcp.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/systemd/manifests/init.pp, line: 30)
Wrapped exception:
No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_journal_pattern20220517-5991-8g6wcp.lock does not exist or is a dangling symbolic link
Error: /Stage[main]/Systemd/File[/usr/local/lib/nagios/plugins/check_journal_pattern]/ensure: change from 'absent' to 'file' failed: Could not set 'file' on ensure: No such file or directory - A directory component in /usr/local/lib/nagios/plugins/check_journal_pattern20220517-5991-8g6wcp.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/systemd/manifests/init.pp, line: 30)

This problem didn't appear until we built a new base image for cloud-vps, since the needed directory was baked into the old base image.
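For triage, a log like the sample above can be reduced to the list of manifests that still write into the plugin directory. The following is only a sketch: the function name `failing_plugin_resources` is made up for illustration, and the regular expression assumes the resource-level error lines look exactly like the ones quoted in this task.

```python
import re

# Matches only the resource-level lines from the sample, i.e. those that start
# with "Error: /Stage[main]/.../File[...]/ensure:" and end with
# "(file: <manifest>, line: <n>)". The shorter "Error: Could not set ..." and
# "Wrapped exception:" lines are deliberately ignored.
RESOURCE_ERROR = re.compile(
    r"^Error: /Stage\[main\]/(?P<klass>.+?)/File\[(?P<path>[^\]]+)\]/ensure:"
    r".*\(file: (?P<manifest>[^,]+), line: (?P<line>\d+)\)\s*$"
)


def failing_plugin_resources(log_text):
    """Return sorted, de-duplicated (class, plugin path, manifest, line) tuples.

    Hypothetical helper: it assumes the log format shown in this task and
    makes no attempt to handle other Puppet error shapes.
    """
    found = set()
    for raw in log_text.splitlines():
        match = RESOURCE_ERROR.match(raw.strip())
        if match:
            found.add(
                (
                    match["klass"],
                    match["path"],
                    match["manifest"],
                    int(match["line"]),
                )
            )
    return sorted(found)


if __name__ == "__main__":
    import sys

    # Usage: puppet agent output piped on stdin.
    for klass, path, manifest, line in failing_plugin_resources(sys.stdin.read()):
        print(f"{klass}: {path} ({manifest}:{line})")
```

Run against the sample in this description, it should list the four resources from client_bucket.pp, timesyncd.pp, kernel.pp and systemd/init.pp once each.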

Event Timeline

Change 792701 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] profile::wmcs::instance: create nrpe plugin directory

https://gerrit.wikimedia.org/r/792701

Change 792700 had a related patch set uploaded (by Andrew Bogott; author: Majavah):

[operations/puppet@production] nrpe: add nrpe::plugin to only installs scripts to hosts with nrpe

https://gerrit.wikimedia.org/r/792700

My lousy (but likely effective) proposal is https://gerrit.wikimedia.org/r/c/operations/puppet/+/792701

Taavi's much more comprehensive fix is https://gerrit.wikimedia.org/r/c/operations/puppet/+/792700

Taavi's solution is clearly better but requires compliance throughout our codebase, now and forever; we might want to add a lint check if we go that way.
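As a rough idea of what such a lint check could look like, here is a standalone sketch. It is not an existing puppet-lint plugin or anything from the linked patches: the function names are hypothetical, the regular expression is only a heuristic for a `file { '/usr/local/lib/nagios/plugins/...':` resource title, and skipping any path component named `nrpe` assumes that is where the nrpe module itself lives.

```python
import re
from pathlib import Path

PLUGIN_DIR = "/usr/local/lib/nagios/plugins"

# Heuristic: a plain file resource whose title is a literal path under the
# plugin directory. Paths built from variables will not be caught.
FILE_RESOURCE = re.compile(
    r"""file\s*\{\s*['"]""" + re.escape(PLUGIN_DIR) + r"""/[^'"]+['"]\s*:"""
)


def find_direct_plugin_files(manifest_text):
    """Return 1-based line numbers where a plugin is declared as a plain file."""
    hits = []
    for match in FILE_RESOURCE.finditer(manifest_text):
        # Line number of the "file {" token that starts the resource.
        hits.append(manifest_text.count("\n", 0, match.start()) + 1)
    return hits


def lint_tree(root, skip_modules=("nrpe",)):
    """Scan *.pp files under root and return (path, line) pairs for each hit.

    skip_modules is an assumption: directories with these names are skipped
    so that the module allowed to manage the plugin directory is not flagged.
    """
    problems = []
    for path in sorted(Path(root).rglob("*.pp")):
        if any(part in skip_modules for part in path.parts):
            continue
        text = path.read_text(encoding="utf-8")
        for line in find_direct_plugin_files(text):
            problems.append((str(path), line))
    return problems


if __name__ == "__main__":
    import sys

    found = lint_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
    for manifest, line in found:
        print(f"{manifest}:{line}: plugin installed as a plain file resource")
    sys.exit(1 if found else 0)
```

A non-zero exit status on any hit would let a check like this gate CI, but whether that is worth the maintenance cost is a separate question.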

Change 792705 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] base::firewall: migrate to nrpe::plugin

https://gerrit.wikimedia.org/r/792705

Change 792721 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs-image-create.py: Inject a couple of nagios plugin dirs into our image

https://gerrit.wikimedia.org/r/792721

Change 792700 merged by Jbond:

[operations/puppet@production] nrpe: add nrpe::plugin to only installs scripts to hosts with nrpe

https://gerrit.wikimedia.org/r/792700

Change 792705 merged by David Caro:

[operations/puppet@production] base::firewall: migrate to nrpe::plugin

https://gerrit.wikimedia.org/r/792705

Change 792990 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] confd: Use nrpe::plugin

https://gerrit.wikimedia.org/r/792990

Change 792991 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] varnish: move to nagios::plugin

https://gerrit.wikimedia.org/r/792991

Change 792992 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] burrow: move to nrpe::plugin

https://gerrit.wikimedia.org/r/792992

Change 792990 merged by David Caro:

[operations/puppet@production] confd: Use nrpe::plugin

https://gerrit.wikimedia.org/r/792990

Change 792996 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] eventlogging: use nrpe::plugin

https://gerrit.wikimedia.org/r/792996

dcaro changed the task status from Open to In Progress. May 18 2022, 9:25 AM
dcaro claimed this task.
dcaro added a project: User-dcaro.
dcaro moved this task from To refine to Doing on the User-dcaro board.

Change 792998 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] redis: move to nagios::plugin

https://gerrit.wikimedia.org/r/792998

Change 793010 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] gdnsd: use nrpe::plugin

https://gerrit.wikimedia.org/r/793010

Change 793015 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] haproxy: use nrpe::plugin

https://gerrit.wikimedia.org/r/793015

Change 792996 merged by Jbond:

[operations/puppet@production] eventlogging: use nrpe::plugin

https://gerrit.wikimedia.org/r/792996

Change 793034 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] base: remove nrpe old plugin files

https://gerrit.wikimedia.org/r/793034

Change 793010 merged by Jbond:

[operations/puppet@production] gdnsd: use nrpe::plugin

https://gerrit.wikimedia.org/r/793010

Change 793015 merged by Jbond:

[operations/puppet@production] haproxy: use nrpe::plugin

https://gerrit.wikimedia.org/r/793015

Change 793034 merged by Jbond:

[operations/puppet@production] base: remove nrpe old plugin files

https://gerrit.wikimedia.org/r/793034

Change 793037 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] impi: update to use nrpe::plugin

https://gerrit.wikimedia.org/r/793037

Change 793041 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] pybal: migrate to nrpe::plugin

https://gerrit.wikimedia.org/r/793041

Change 793044 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] profile: migrate to nrpe::plugin

https://gerrit.wikimedia.org/r/793044

Change 793037 merged by Jbond:

[operations/puppet@production] impi: update to use nrpe::plugin

https://gerrit.wikimedia.org/r/793037

Change 793041 merged by Jbond:

[operations/puppet@production] pybal: migrate to nrpe::plugin

https://gerrit.wikimedia.org/r/793041

Change 793044 merged by Jbond:

[operations/puppet@production] profile: migrate to nrpe::plugin

https://gerrit.wikimedia.org/r/793044

Change 792991 merged by David Caro:

[operations/puppet@production] varnish: move to nrpe::plugin

https://gerrit.wikimedia.org/r/792991

Change 792721 abandoned by Andrew Bogott:

[operations/puppet@production] wmcs-image-create.py: Inject a couple of nagios plugin dirs into our image

Reason:

No longer needed since the real issue was resolved

https://gerrit.wikimedia.org/r/792721

Change 792701 abandoned by Andrew Bogott:

[operations/puppet@production] profile::wmcs::instance: create nrpe plugin directory

Reason:

fixed elsewhere

https://gerrit.wikimedia.org/r/792701

Change 793096 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] mariadb: convert to nrpe::plugin

https://gerrit.wikimedia.org/r/793096

Change 793099 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] monitoring: use nrpe::plugin

https://gerrit.wikimedia.org/r/793099

Change 793102 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] raid: use nrpe::plugin

https://gerrit.wikimedia.org/r/793102

Change 793421 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:wikidough: migrate to nrpe::plugin

https://gerrit.wikimedia.org/r/793421

Change 793102 merged by Jbond:

[operations/puppet@production] raid: use nrpe::plugin

https://gerrit.wikimedia.org/r/793102

Change 793099 merged by Filippo Giunchedi:

[operations/puppet@production] monitoring: use nrpe::plugin

https://gerrit.wikimedia.org/r/793099

Change 793096 merged by Jbond:

[operations/puppet@production] mariadb: convert to nrpe::plugin

https://gerrit.wikimedia.org/r/793096

Change 793421 merged by Jbond:

[operations/puppet@production] P:wikidough: migrate to nrpe::plugin

https://gerrit.wikimedia.org/r/793421

Change 793530 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] query_service: check_categories lives in /usr/local/lib now

https://gerrit.wikimedia.org/r/793530

Change 793530 merged by Ryan Kemper:

[operations/puppet@production] query_service: check_categories lives in /usr/local/lib now

https://gerrit.wikimedia.org/r/793530

Change 792992 merged by Jbond:

[operations/puppet@production] burrow: move to nrpe::plugin

https://gerrit.wikimedia.org/r/792992

Change 792998 merged by David Caro:

[operations/puppet@production] redis: move to nrpe::plugin

https://gerrit.wikimedia.org/r/792998