
[Cloud VPS alert][cloudvirt-canary] Puppet failure on (
Closed, Resolved · Public



Email received with the alert:

Date: Mon, 26 Jul 2021 08:15:03 +0000
From: root <>
Subject: [Cloud VPS alert][cloudvirt-canary] Puppet failure on (

Puppet is having issues on the " (" instance in project
cloudvirt-canary in Wikimedia Cloud VPS.

Puppet is running with failures.

Working Puppet runs are needed to maintain instance security and logins.
As long as Puppet continues to fail, this system is in danger of becoming

You are receiving this email because you are listed as member for the
project that contains this instance.  Please take steps to repair
this instance or contact a Cloud VPS admin for assistance.

You might find some help here:

For further support, visit #wikimedia-cloud on or

Some extra info follows:
---- Last run summary:
changes: {total: 1}
events: {failure: 1, success: 1, total: 2}
resources: {changed: 1, corrective_change: 0, failed: 1, failed_to_restart: 0, out_of_sync: 2,
  restarted: 0, scheduled: 0, skipped: 0, total: 573}
time: {augeas: 0.016067946, catalog_application: 6.311372339725494, config_retrieval: 3.837283907458186,
  convert_catalog: 0.4087558565661311, exec: 0.430169381, fact_generation: 0.8493666732683778,
  file: 2.9797071339999985, file_line: 0.011917505, filebucket: 4.9002e-05, group: 0.000592379,
  host: 0.000401658, last_run: 1627285550, node_retrieval: 0.6049147974699736, notify: 0.005156779,
  package: 1.117646943, plugin_sync: 0.9399111736565828, schedule: 0.000252771, service: 0.7661277689999997,
  tidy: 0.000176439, total: 13.027807069, transaction_evaluation: 6.254771231673658,
  user: 0.000751797}
version: {config: '(32bd2ba79c) Bstorm - toolforge harbor: puppetize experimental
    base server for harbor', puppet: 5.5.22}

---- Exceptions that happened if any:
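The run summary quoted in the alert is Puppet's last_run_summary.yaml (by default under /var/lib/puppet/state/ on the agent). A hedged sketch of pulling the failed-resource count out of such a report; a sample shaped like the summary above is inlined so the snippet is self-contained:

```shell
# Sketch: extract the failed-resource count from a Puppet last_run_summary.yaml.
# The real file normally lives at /var/lib/puppet/state/last_run_summary.yaml;
# a sample matching the alert above is inlined here so the snippet runs anywhere.
summary=$(mktemp)
cat > "$summary" <<'EOF'
resources:
  changed: 1
  failed: 1
  out_of_sync: 2
  total: 573
EOF
failed=$(awk '$1 == "failed:" {print $2}' "$summary")
echo "failed resources: $failed"
rm -f "$summary"
```

The alerting script presumably does something similar; this is only an illustration of where the numbers in the email come from.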

Event Timeline

Manually ran puppet on the VM; it seems not to have enough memory to run puppet:

dcaro@canary1044-01:~$ sudo -i
root@canary1044-01:~# puppet agent --test
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for
Info: Applying configuration version '(9db8aeac15) Muehlenhoff - Fix permissions for /usr/sbin/policy-rc.d'
Notice: The LDAP client stack for this host is: sssd/sudo
Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: sssd/sudo'
Error: /Stage[main]/Ldap::Client::Sssd/Package[sudo-ldap]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Base::Standard_packages/Base::Service_auto_restart[systemd-journald]/Systemd::Timer::Job[wmf_auto_restart_systemd-journald]/Systemd::Timer[wmf_auto_restart_systemd-journald]/Systemd::Service[wmf_auto_restart_systemd-journald]/Service[wmf_auto_restart_systemd-journald.timer]: Could not evaluate: Cannot allocate memory - fork(2)
Info: Stage[main]: Unscheduling all events on Stage[main]
Notice: Applied catalog in 7.06 seconds
root@canary1044-01:~# free -m
              total        used        free      shared  buff/cache   available
Mem:            484         234         102           5         147         232
Swap:             0           0           0
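The free -m output shows only ~232 MB available (and no swap) on this 484 MB canary, which explains the fork(2) ENOMEM failures above. One hedged mitigation sketch is to guard the agent run on available memory; the 300 MB threshold below is an illustrative guess, not a measured requirement:

```shell
# Sketch: skip the Puppet run when MemAvailable is low, to avoid the
# "Cannot allocate memory - fork(2)" errors seen above. The 300 MB
# threshold is an assumption for illustration, not a tuned value.
run_puppet_if_memory_allows() {
    local avail_mb
    avail_mb=$(awk '$1 == "MemAvailable:" {print int($2 / 1024)}' /proc/meminfo)
    if [ "${avail_mb:-0}" -lt 300 ]; then
        echo "only ${avail_mb}MB available; skipping puppet run" >&2
        return 1
    fi
    puppet agent --test
}
```

Such a wrapper could sit in front of a cron- or timer-driven agent run; it is a sketch, not how Cloud VPS actually schedules its agents.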

Killing diamond seems to be enough; maybe we can try not to start it to begin with.
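Puppetized, the "not start it to begin with" idea could look like the sketch below, assuming the service unit is named diamond and no other profile already declares the resource:

```puppet
# Sketch only: stop diamond and keep it from starting at boot.
service { 'diamond':
  ensure => stopped,
  enable => false,
}
```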

Mentioned in SAL (#wikimedia-cloud) [2021-07-26T13:31:35Z] <dcaro> disabled diamond on the machines (T287350)