
Puppet broken on labstore1004
Closed, Resolved, Public

Description

labstore1004 is the current active primary NFS server for Tools and Cloud. Puppet runs on it fail because the runit provider is selected for every service:

Warning: Found multiple default providers for service: runit, debian; using runit

labstore1004.eqiad.wmnet
Warning: Setting configtimeout is deprecated.
   (at /usr/lib/ruby/vendor_ruby/puppet/settings.rb:1146:in `issue_deprecation_warning')
Info: Using configured environment 'future'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Warning: Found multiple default providers for service: runit, debian; using runit
Info: Caching catalog for labstore1004.eqiad.wmnet
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1513177572'
Error: /Stage[main]/Role::Labs::Nfs::Secondary/Service[drbd]: Could not evaluate: Could not get status for service Service[drbd]: Execution of '/usr/bin/sv status /etc/sv/drbd' returned 1: fail: /etc/sv/drbd: unable to change to service directory: file does not exist
Wrapped exception:
Execution of '/usr/bin/sv status /etc/sv/drbd' returned 1: fail: /etc/sv/drbd: unable to change to service directory: file does not exist
Error: /Stage[main]/Role::Labs::Nfs::Secondary/Service[nfs-kernel-server]: Could not evaluate: Could not get status for service Service[nfs-kernel-server]: Execution of '/usr/bin/sv status /etc/sv/nfs-kernel-server' returned 1: fail: /etc/sv/nfs-kernel-server: unable to change to service directory: file does not exist
Wrapped exception:
Execution of '/usr/bin/sv status /etc/sv/nfs-kernel-server' returned 1: fail: /etc/sv/nfs-kernel-server: unable to change to service directory: file does not exist
Error: /Stage[main]/Base::Puppet/Service[puppet]: Could not evaluate: Could not get status for service Service[puppet]: Execution of '/usr/bin/sv status /etc/sv/puppet' returned 1: fail: /etc/sv/puppet: unable to change to service directory: file does not exist
Wrapped exception:
Execution of '/usr/bin/sv status /etc/sv/puppet' returned 1: fail: /etc/sv/puppet: unable to change to service directory: file does not exist
Error: /Stage[main]/Ldap::Client::Nss/Service[nscd]: Could not evaluate: Could not get status for service Service[nscd]: Execution of '/usr/bin/sv status /etc/sv/nscd' returned 1: fail: /etc/sv/nscd: unable to change to service directory: file does not exist
Wrapped exception:
Execution of '/usr/bin/sv status /etc/sv/nscd' returned 1: fail: /etc/sv/nscd: unable to change to service directory: file does not exist
Error: /Stage[main]/Ldap::Client::Nss/Service[nslcd]: Could not evaluate: Could not get status for service Service[nslcd]: Execution of '/usr/bin/sv status /etc/sv/nslcd' returned 1: fail: /etc/sv/nslcd: unable to change to service directory: file does not exist
Wrapped exception:
Execution of '/usr/bin/sv status /etc/sv/nslcd' returned 1: fail: /etc/sv/nslcd: unable to change to service directory: file does not exist
Error: /Stage[main]/Ssh::Server/Service[ssh]: Could not evaluate: Could not get status for service Service[ssh]: Execution of '/usr/bin/sv status /etc/sv/ssh' returned 1: fail: /etc/sv/ssh: unable to change to service directory: file does not exist
Wrapped exception:
Execution of '/usr/bin/sv status /etc/sv/ssh' returned 1: fail: /etc/sv/ssh: unable to change to service directory: file does not exist
Error: /Stage[main]/Exim4/Service[exim4]: Could not evaluate: Could not get status for service Service[exim4]: Execution of '/usr/bin/sv status /etc/sv/exim4' returned 1: fail: /etc/sv/exim4: unable to change to service directory: file does not exist
Wrapped exception:
Execution of '/usr/bin/sv status /etc/sv/exim4' returned 1: fail: /etc/sv/exim4: unable to change to service directory: file does not exist
Error: /Stage[main]/Rsyslog/Service[rsyslog]: Could not evaluate: Could not get status for service Service[rsyslog]: Execution of '/usr/bin/sv status /etc/sv/rsyslog' returned 1: fail: /etc/sv/rsyslog: unable to change to service directory: file does not exist
Wrapped exception:
Execution of '/usr/bin/sv status /etc/sv/rsyslog' returned 1: fail: /etc/sv/rsyslog: unable to change to service directory: file does not exist
Error: /Stage[main]/Diamond/Service[diamond]: Could not evaluate: Could not get status for service Service[diamond]: Execution of '/usr/bin/sv status /etc/sv/diamond' returned 1: fail: /etc/sv/diamond: unable to change to service directory: file does not exist
Wrapped exception:
Execution of '/usr/bin/sv status /etc/sv/diamond' returned 1: fail: /etc/sv/diamond: unable to change to service directory: file does not exist
Notice: /Stage[main]/Labstore::Monitoring::Secondary/Nrpe::Monitor_systemd_unit_state[drbd]/Nrpe::Monitor_service[drbd-state]/Nrpe::Check[check_drbd-state]/File[/etc/nagios/nrpe.d/check_drbd-state.cfg]: Dependency Service[drbd] has failures: true
Notice: /Stage[main]/Nrpe/Base::Service_unit[nagios-nrpe-server]/Service[nagios-nrpe-server]: Dependency Service[drbd] has failures: true
Warning: /Stage[main]/Labstore::Monitoring::Secondary/Nrpe::Monitor_systemd_unit_state[drbd]/Nrpe::Monitor_service[drbd-state]/Nrpe::Check[check_drbd-state]/File[/etc/nagios/nrpe.d/check_drbd-state.cfg]: Skipping because of failed dependencies
Warning: /Stage[main]/Nrpe/Base::Service_unit[nagios-nrpe-server]/Service[nagios-nrpe-server]: Skipping because of failed dependencies
Notice: Applied catalog in 11.22 seconds
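
Every error in the run above has the same shape, so the affected services can be listed mechanically. The following is a minimal sketch, not something used on the task: the function name `failed_services` is hypothetical, and it assumes the log is plain text in the format shown above (no color codes).

```shell
#!/bin/sh
# Hypothetical helper: read a Puppet agent log on stdin and print the unique
# names of services whose status check failed. It matches lines of the form
#   Error: /Stage[main]/.../Service[NAME]: Could not evaluate: Could not get status ...
# and ignores everything else (warnings, notices, "Wrapped exception:" lines).
failed_services() {
    sed -n 's|^Error: [^ ]*/Service\[\([^]]*\)]: Could not evaluate: Could not get status.*|\1|p' |
        sort -u
}
```

Run against a saved copy of the log (for example `failed_services < puppet-run.log`, where the file name is an assumption), it would print one service name per line.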

Event Timeline

chasemp updated the task description.

root@labstore1004:~# aptitude why runit
i vblade-persist Depends runit (>= 1.8.0-2)

root@labstore1004:~# apt-get remove --purge vblade-persist
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libapr1 libconfuse-common libconfuse0 libpgm-5.1-0 libsodium13 libzmq3 python-dateutil python-jinja2
  python-m2crypto python-markupsafe python-zmq runit vblade
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  vblade-persist*
0 upgraded, 0 newly installed, 1 to remove and 54 not upgraded.
After this operation, 98.3 kB disk space will be freed.
Do you want to continue? [Y/n] Y
(Reading database ... 64253 files and directories currently installed.)
Removing vblade-persist (0.6-2) ...
Processing triggers for man-db (2.7.0.2-5) ...
root@labstore1004:~# apt-get remove --purge runit
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libapr1 libconfuse-common libconfuse0 libpgm-5.1-0 libsodium13 libzmq3 python-dateutil python-jinja2
  python-m2crypto python-markupsafe python-zmq vblade
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  runit*
0 upgraded, 0 newly installed, 1 to remove and 54 not upgraded.
After this operation, 393 kB disk space will be freed.
Do you want to continue? [Y/n] Y
(Reading database ... 64240 files and directories currently installed.)
Removing runit (2.1.2-3) ...
Removing SV inittab entry...
Purging configuration files for runit (2.1.2-3) ...
Processing triggers for man-db (2.7.0.2-5) ...
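
The Puppet errors all come from `sv status` being pointed at a directory under `/etc/sv` that does not exist. As a rough sketch of how that condition could be confirmed for a list of services, the hypothetical function `missing_sv_dirs` below prints the services that have no runit service directory under a given base path. It was not part of the fix described here.

```shell
#!/bin/sh
# Hypothetical helper: print each service that has no runit service
# directory under the given base path (for example /etc/sv).
# Usage: missing_sv_dirs BASE SERVICE...
missing_sv_dirs() {
    base=$1
    shift
    for svc in "$@"; do
        # runit's sv expects a directory per service; report the ones missing.
        [ -d "$base/$svc" ] || printf '%s\n' "$svc"
    done
}
```

For example, `missing_sv_dirs /etc/sv drbd nfs-kernel-server puppet ssh` would list whichever of those services lack a directory there.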

Mentioned in SAL (#wikimedia-operations) [2017-12-13T15:21:51Z] <chasemp> remove and purge vblade-persist and runit from labstore1004 T182781

fyi @madhuvishy

I don't understand why this behavior only started recently, since it appears runit has been installed since Sept 6 (thanks @akosiaris).

I'm very uncomfortable with this, but at the moment Puppet is healthy.

root@labstore1005:~# aptitude search runit
p   r-cran-runit                                  - GNU R package providing unit testing framework
p   runit                                         - system-wide service supervision
root@labstore1005:~# aptitude search vblade-persist
p   vblade-persist                                - create/manage supervised AoE exports

chasemp claimed this task.

Mentioned in SAL (#wikimedia-cloud) [2018-01-18T16:11:45Z] <arturo> aborrero@tools-clushmaster-01:~$ sudo aptitude purge vblade vblade-persist runit (for something similar to T182781)