
[tools-sgeweblight-10-25] puppet throws segmentation fault
Closed, Resolved, Public

Description

There was an alert about Puppet failing on the host. After SSHing in and running it manually, I got:

root@tools-sgeweblight-10-25:~# run-puppet-agent
Ignoring stale puppet agent lock for pid 6146
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
/usr/lib/ruby/2.5.0/psych/visitors/emitter.rb:42: [BUG] Segmentation fault at 0x0000000000000000
ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux-gnu]

Event Timeline

dcaro triaged this task as High priority. Dec 5 2023, 10:59 AM
dcaro created this task.
dcaro updated the task description.

A second run fails with "failed to allocate memory" (a common issue):

root@tools-sgeweblight-10-25:~# puppet agent --test
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Error: Facter: error while resolving custom facts in /var/lib/puppet/lib/facter/spark_version.rb: Cannot allocate memory - /usr/bin/dpkg-query -W -f='${Version}' spark2 2>/dev/null | /usr/bin/awk -F '-' '{print $1}'
Error: Facter: error while resolving custom facts in /var/lib/puppet/lib/facter/nfscommon_version.rb: Cannot allocate memory - /usr/bin/dpkg-query -W -f='${Version}' nfs-common 2>/dev/null | /usr/bin/awk -F '-' '{print $1}'
Error: Facter: error while resolving custom fact "ipaddress": no implicit conversion of nil into String
Error: Facter: error while resolving custom fact "ipaddress6": undefined method `gsub' for nil:NilClass
Error: Facter: error while resolving custom fact "networking": 765: unexpected token at ''
Error: Facter: error while resolving custom fact "lldp": undefined method `[]' for nil:NilClass
Error: Facter: error while resolving custom fact "net_driver": undefined method `[]' for nil:NilClass
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Site (undefined) not found in cluster misc (file: /etc/puppet/modules/profile/manifests/base.pp, line: 34, column: 9) on node tools-sgeweblight-10-25.tools.eqiad1.wikimedia.cloud
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
[FATAL] failed to allocate memory
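The errors above are consistent with the host being out of memory: the custom facts cannot spawn `dpkg-query` ("Cannot allocate memory"), several other facts then fail with nil-related errors, and the run ends with "[FATAL] failed to allocate memory". Before rebooting, one way to confirm memory exhaustion is to check `MemAvailable` in `/proc/meminfo`. This is a minimal sketch, assuming a Linux kernel that exposes `MemAvailable` (3.14 or later); `parse_meminfo` and `enough_memory?` are hypothetical helpers, not part of Puppet or Facter, and any threshold passed in is illustrative.

```ruby
# Hypothetical helper (not part of Puppet/Facter): parse /proc/meminfo-style
# text into a Hash of field name => value in kB.
def parse_meminfo(text)
  text.each_line.each_with_object({}) do |line, fields|
    name, value = line.split(':', 2)
    next if value.nil?               # skip lines without a "Name:" prefix
    fields[name.strip] = value.strip.to_i  # "  150000 kB" -> 150000
  end
end

# Hypothetical check: true only if MemAvailable is present and at least
# min_kb kilobytes. The threshold is an assumption, not a Puppet requirement.
def enough_memory?(text, min_kb)
  available = parse_meminfo(text)['MemAvailable']
  !available.nil? && available >= min_kb
end
```

On the affected host, something like `enough_memory?(File.read('/proc/meminfo'), 512 * 1024)` would return false whenever less than roughly 512 MiB is available, which would point at memory exhaustion rather than a Puppet bug.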

Mentioned in SAL (#wikimedia-cloud) [2023-12-05T11:01:48Z] <dcaro> rebooting tools-sgeweblight-10-25 due to memory allocation issue (T352753)

The problem went away after rebooting, so I'm closing this as "memory exhaustion somehow made Puppet/Ruby throw a segmentation fault".