
Fix retcode in wmfpuppet Salt module
Closed, Declined, Public

Description

When the Puppet exit code is 2 (success with changes), wmfpuppet sets the retcode to 0 in the return dictionary, but the job retcode keeps the original value:

{'_stamp': '2016-09-09T10:49:36.413191',
 'cmd': '_return',
 'fun': 'wmfpuppet.run',
 'fun_args': [],
 'id': 'tin.eqiad.wmnet',
 'jid': '20160909104809177993',
 'retcode': 2,                 <-----------------------------------------
 'return': {'pid': 12850,
            'retcode': 0,      <-----------------------------------------
            'stderr': '',
            'stdout': "Info: Retrieving plugin\nInfo: Loading facts in /var/lib/puppet/lib/facter/initsystem.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/ganeti.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/puppet_config_dir.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/apt.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/lldp.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/physicalcorecount.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/raid.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/puppetmastername.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/root_home.rb\nInfo: Loading facts in /var/lib/puppet/lib/facter/labsprojectfrommetadata.rb\nInfo: Caching catalog for tin.eqiad.wmnet\nInfo: Applying configuration version '1473417904'\nNotice: /Stage[main]/Deployment::Deployment_server/Salt::Grain[deployment_server]/Exec[ensure_deployment_server_true]/returns: executed successfully\nInfo: Salt::Grain[deployment_server]: Scheduling refresh of Exec[deployment_server_sync_all]\nNotice: /Stage[main]/Deployment::Deployment_server/Exec[deployment_server_sync_all]: Triggered 'refresh' from 1 events\nNotice: /Stage[main]/Deployment::Deployment_server/Exec[eventual_consistency_deployment_server_init]/returns: executed successfully\nNotice: Finished catalog run in 62.45 seconds"},
 'success': True}
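
A minimal sketch of one way to keep the two values in sync, not the actual wmfpuppet code. It assumes the job-level retcode is read from a context dictionary that the module can write to (Salt execution modules usually do this through `__context__['retcode']`, but that should be verified against the Salt version in use). The helper name `normalize_puppet_retcode` and the `context` argument are hypothetical.

```python
# Hypothetical helper: names are illustrative and not taken from wmfpuppet.
PUPPET_SUCCESS_WITH_CHANGES = 2  # exit code described in this task


def normalize_puppet_retcode(result, context):
    """Map 'success with changes' to 0 in the return dict AND the job context."""
    normalized = dict(result)  # do not mutate the caller's dictionary
    if normalized.get("retcode") == PUPPET_SUCCESS_WITH_CHANGES:
        normalized["retcode"] = 0
    # Assumption: the job retcode is derived from this context entry, so it
    # has to be overwritten too; otherwise the original value leaks through.
    context["retcode"] = normalized["retcode"]
    return normalized


# Usage: a plain dict stands in for the module context here.
context = {"retcode": 2}
result = normalize_puppet_retcode(
    {"pid": 12850, "retcode": 2, "stderr": "", "stdout": ""}, context
)
```

With this approach both `result['retcode']` and `context['retcode']` end up as 0 for a run with changes, while any other exit code is passed through unchanged in both places.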

Event Timeline

When fixing this, it is probably also worth handling the case where Puppet was already running. That requires checking the stdout of the job, and possibly setting a specific exit code so that the caller can decide what to do. The suggested exit code is 99.
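
A rough sketch of that stdout check, under stated assumptions: the marker string below is a guess at what Puppet prints when another run holds the lock and must be checked against real agent output, and the function name `flag_already_running` is hypothetical. Only the exit code 99 comes from this task.

```python
ALREADY_RUNNING_RETCODE = 99  # exit code suggested in this task

# Assumption: substring of the message Puppet prints when a run is already
# in progress. Verify against real agent output before relying on it.
ALREADY_RUNNING_MARKER = "already in progress"


def flag_already_running(result):
    """Return a copy of the result with retcode 99 if Puppet was already running."""
    flagged = dict(result)  # leave the original result untouched
    if ALREADY_RUNNING_MARKER in flagged.get("stdout", ""):
        flagged["retcode"] = ALREADY_RUNNING_RETCODE
    return flagged


# Usage with a made-up stdout line (illustrative, not a captured log).
skipped = flag_already_running(
    {"retcode": 1, "stdout": "Notice: Run already in progress; skipping", "stderr": ""}
)
normal = flag_already_running(
    {"retcode": 0, "stdout": "Notice: Finished catalog run", "stderr": ""}
)
```

A dedicated exit code keeps the decision with the caller, which can retry later, wait, or treat the skipped run as a failure.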

Volans moved this task from Backlog to Done on the SRE-tools board.

Salt is now deprecated and we're using Cumin instead. We also have new tools to properly manage Puppet runs, such as run-puppet-agent.