salt-minion processes terminate on deployment sync
Closed, Duplicate (Public)

Description

Today, right after a deployment sync was triggered, the salt-minion processes on both deployment servers stopped:

mira:

1555 Dec 28 21:18:36 mira puppet-agent[2411]: (/Stage[main]/Deployment::Deployment_server/Exec[deployment_server_sync_all]) Triggered 'refresh' from 1 events
1556 Dec 28 21:18:38 mira kernel: [2926852.979534] init: salt-minion main process (17705) terminated with status 1

tin:

3395 Dec 28 21:07:18 tin puppet-agent[31334]: (/Stage[main]/Deployment::Deployment_server/Exec[deployment_server_sync_all]) Triggered 'refresh' from 1 events
3396 Dec 28 21:07:20 tin kernel: [28902389.700485] init: salt-minion main process (22731) terminated with status 1

resulting in:

13:15 < icinga-wm> PROBLEM - salt-minion processes on tin is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
13:26 < icinga-wm> PROBLEM - salt-minion processes on mira is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
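
For context, the alert text suggests the check counts processes whose argument string matches a regex and goes critical when the count is zero. A minimal Python sketch of that idea follows. The regex is copied from the alert; the function names and the threshold of one process are assumptions for illustration, not the actual Icinga plugin.

```python
import re

# Pattern copied verbatim from the alert text.
SALT_MINION_ARGS = re.compile(r"^/usr/bin/python /usr/bin/salt-minion")


def count_matching(arg_lines, pattern=SALT_MINION_ARGS):
    """Count command lines whose argument string matches the pattern."""
    return sum(1 for line in arg_lines if pattern.search(line.strip()))


def check_status(count, critical_below=1):
    """Map a process count to a state string.

    The threshold of 1 is an assumption inferred from the
    "0 processes" message, not taken from the real check config.
    """
    return "CRITICAL" if count < critical_below else "OK"
```

The input could be the lines printed by `ps -eo args`, one command line per process.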

When I saw the Icinga messages, I went to both servers and found the salt-minion status to be "stop/waiting", along with the log entries above.

I started the service on both servers, and Icinga reported recoveries.
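
To look for the same pattern on other hosts or dates, the correlation seen in the logs (a `deployment_server_sync_all` refresh followed within seconds by a salt-minion termination) could be scripted. The sketch below is a rough illustration only: the function name, the 10-second window, and the placeholder year are assumptions, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

SYNC_MARKER = "Exec[deployment_server_sync_all]"
TERMINATED = re.compile(
    r"init: salt-minion main process \((\d+)\) terminated with status (\d+)"
)
# Matches e.g. "Dec 28 21:18:36 mira" anywhere in the line.
TIMESTAMP = re.compile(r"([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+)")


def find_terminations_after_sync(lines, window_seconds=10, year=2000):
    """Return (host, pid, seconds_after_sync) for each salt-minion
    termination logged within window_seconds of a sync trigger on the
    same host. The year is a placeholder used only for parsing."""
    last_sync = {}  # host -> time of the most recent sync trigger
    hits = []
    for line in lines:
        m = TIMESTAMP.search(line)
        if not m:
            continue
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        host = m.group(2)
        if SYNC_MARKER in line:
            last_sync[host] = when
            continue
        t = TERMINATED.search(line)
        if t and host in last_sync:
            delta = (when - last_sync[host]).total_seconds()
            if 0 <= delta <= window_seconds:
                hits.append((host, int(t.group(1)), delta))
    return hits
```

Run against the four log lines quoted above, this reports one termination per host, each two seconds after the sync trigger.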

Event Timeline

Dzahn raised the priority of this task to Needs Triage.
Dzahn updated the task description.
Dzahn added projects: SRE, Deployments.
Dzahn subscribed.
ArielGlenn triaged this task as High priority.
ArielGlenn moved this task from Backlog to Up Next on the Salt board.

This is the same underlying issue as T124646 and as such I will merge them.