I saw this on IRC:
12:29 <+icinga-wm> PROBLEM - puppet last run on cloudvirtan1002 is CRITICAL: CRITICAL: Puppet has 6 failures. Last run 6 minutes ago with 6 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Service[rsyslog],Exec[x509-bundle labvirt-star.eqiad.wmnet-chained],Exec[x509-bundle labvirt-star.eqiad.wmnet-chain]
And checked:
aborrero@cloudvirtan1002:~ $ sudo puppet agent -t -v
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for cloudvirtan1002.eqiad.wmnet
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1550749226'
Error: /Stage[main]/Main/Node[__node_regexp__cloudvirtan1001-5.eqiad.wmnet]/Interface::Add_ip6_mapped[main]/Exec[eth0_v6_token]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Rsyslog/Service[rsyslog]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Openstack::Nova::Compute::Service/Sslcert::Certificate[labvirt-star.eqiad.wmnet]/Sslcert::Chainedcert[labvirt-star.eqiad.wmnet]/Exec[x509-bundle labvirt-star.eqiad.wmnet-chained]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Openstack::Nova::Compute::Service/Sslcert::Certificate[labvirt-star.eqiad.wmnet]/Sslcert::Chainedcert[labvirt-star.eqiad.wmnet]/Exec[x509-bundle labvirt-star.eqiad.wmnet-chain]: Could not evaluate: Cannot allocate memory - fork(2)
[...]
Error: /Stage[main]/Nrpe/Base::Service_unit[nagios-nrpe-server]/Service[nagios-nrpe-server]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Main/Node[__node_regexp__cloudvirtan1001-5.eqiad.wmnet]/Interface::Add_ip6_mapped[main]/Interface::Ip[main]/Exec[ip addr add 2620:0:861:118:10:64:20:45/64 dev eth0]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Admin/Admin::Groupmembers[absent]/Exec[absent_ensure_members]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Admin/Admin::Groupmembers[ops]/Exec[ops_ensure_members]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Admin/Admin::Groupmembers[wikidev]/Exec[wikidev_ensure_members]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Admin/Admin::Groupmembers[ops-adm-group]/Exec[adm_ensure_members]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Admin/Admin::Groupmembers[wmcs-roots]/Exec[wmcs-roots_ensure_members]: Could not evaluate: Cannot allocate memory - fork(2)
[...]
Notice: Applied catalog in 13.42 seconds
However, the server still reports almost 1GB of RAM free:
aborrero@cloudvirtan1002:~$ free -m
             total       used       free     shared    buffers     cached
Mem:        128847     128085        762       1825         17       1974
-/+ buffers/cache:     126092       2754
Swap:            0          0          0
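To put a number on the headroom, here is a minimal Python sketch that parses the old-style `free -m` output above and adds up free, buffers and cached memory. The function names (`parse_free_m`, `headroom_mib`) are made up for illustration, and the sample text is the output pasted in this report. It is a rough estimate only, not a statement about why fork(2) fails.

```python
def parse_free_m(text):
    """Parse the Mem: and Swap: rows of old-style `free -m` output (MiB)."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Mem:":
            keys = ("total", "used", "free", "shared", "buffers", "cached")
            rows["mem"] = dict(zip(keys, map(int, parts[1:7])))
        elif parts[0] == "Swap:":
            keys = ("total", "used", "free")
            rows["swap"] = dict(zip(keys, map(int, parts[1:4])))
    return rows


def headroom_mib(rows):
    """Free memory plus buffers and page cache, in MiB (a rough upper bound)."""
    mem = rows["mem"]
    return mem["free"] + mem["buffers"] + mem["cached"]


# Sample taken from the `free -m` output in this report.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:        128847     128085        762       1825         17       1974
-/+ buffers/cache:     126092       2754
Swap:            0          0          0
"""

if __name__ == "__main__":
    rows = parse_free_m(SAMPLE)
    total = rows["mem"]["total"]
    room = headroom_mib(rows)
    print(f"headroom: {room} MiB of {total} MiB ({100 * room / total:.1f}%)")
    print(f"swap total: {rows['swap']['total']} MiB")
```

The sum comes out at 2753 MiB, one less than the 2754 that `free` prints on the `-/+ buffers/cache` line, presumably because `free` rounds each column separately. Either way that is roughly 2% of the 128847 MiB total, with no swap configured.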
I didn't see anything relevant in dmesg or syslog.
If these hosts are running out of memory, we may want to consider enabling KSM (https://en.wikipedia.org/wiki/Kernel_same-page_merging), once the security implications have been checked.
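Before deciding anything, it may help to know whether KSM is already running on these hosts. A small sketch, assuming the standard Linux sysfs interface under /sys/kernel/mm/ksm (files `run`, `pages_shared`, `pages_sharing`) and a 4096-byte page size; the helper names are hypothetical and I have not run this on cloudvirtan1002:

```python
import os


def read_ksm_status(base="/sys/kernel/mm/ksm"):
    """Read a few KSM counters from sysfs; a value is None if it is unreadable."""
    status = {}
    for name in ("run", "pages_shared", "pages_sharing"):
        try:
            with open(os.path.join(base, name)) as fh:
                status[name] = int(fh.read().strip())
        except (OSError, ValueError):
            status[name] = None
    return status


def saved_mib(status, page_size=4096):
    """Approximate memory saved by KSM in MiB, derived from pages_sharing.

    page_size=4096 is an assumption; adjust it if the host uses another size.
    """
    if status.get("pages_sharing") is None:
        return None
    return status["pages_sharing"] * page_size / (1024 * 1024)


if __name__ == "__main__":
    status = read_ksm_status()
    print("KSM run flag:", status["run"])  # 1 means KSM is actively merging
    print("approx. saved MiB:", saved_mib(status))
```

A `run` value of 0 (or None, if the kernel was built without KSM) would mean nothing is being merged today, so any gain from enabling it would have to be measured after the fact.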