The systemd timer for regular_backups on cumin2001 no longer runs, as reported in T255132#6650632. It failed once, on the day the puppet code was changed (https://gerrit.wikimedia.org/r/c/operations/puppet/+/643223/4/modules/profile/manifests/mariadb/backup/transfer.pp), and it has not been executed (or even attempted, according to the logs) since then: there have been no regular backups (snapshots) on codfw since Tuesday.
Nov 25 16:41:48 cumin2001 systemd: regular_snapshot.service: Current command vanished from the unit file, execution of the command list won't be resumed.
Funnily enough, the same change applied to cumin1001 caused no issue. Is there something special about cumin2001?
Today I disabled and re-enabled the timer manually to see if it helps. I had already run systemctl daemon-reload several times before that. Is there a way to reset all of systemd without restarting?
A systemd timer that stops executing and silently refuses to run, without logging any errors, is a big blocker for us (the team taking backups). Luckily, we have monitoring based on the outcome of the commands, but that may not be the case for other commands. Either systemd timers have to be as reliable to execute as crons, or the puppet code needs changes to handle commands changing (and race conditions?).
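For commands that lack outcome-based monitoring, one possible stopgap is to check the timer itself instead of the job result. The following is only a rough sketch, not something deployed anywhere: it assumes the timer unit is named regular_snapshot.timer (guessed from the service name in the log above), and the function names and the set of "unset" values are my own assumptions, since the way systemctl show prints an unset timestamp seems to vary between systemd versions. It has not been verified that the stuck timer on cumin2001 would actually have shown up this way.

```python
import subprocess

# Strings that `systemctl show` may print for an unset timestamp.
# Assumption: the exact representation differs between systemd versions.
UNSET = {"", "0", "n/a", "infinity"}

PROPERTIES = "ActiveState,NextElapseUSecRealtime,NextElapseUSecMonotonic"


def parse_show(output: str) -> dict:
    """Parse the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def timer_looks_stalled(props: dict) -> bool:
    """Return True if the timer is not active or has no next run scheduled."""
    if props.get("ActiveState") != "active":
        return True
    next_realtime = props.get("NextElapseUSecRealtime", "")
    next_monotonic = props.get("NextElapseUSecMonotonic", "")
    return next_realtime in UNSET and next_monotonic in UNSET


def check_timer(unit: str) -> bool:
    """Query systemd for the given timer unit and report whether it looks stalled."""
    result = subprocess.run(
        ["systemctl", "show", unit, f"--property={PROPERTIES}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return timer_looks_stalled(parse_show(result.stdout))


if __name__ == "__main__":
    # Hypothetical unit name, inferred from regular_snapshot.service in the log.
    if check_timer("regular_snapshot.timer"):
        print("WARNING: timer is inactive or has no next run scheduled")
```

A check like this only catches a timer that is inactive or has nothing scheduled; it would not replace the existing monitoring on the outcome of the backup commands.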