I noticed the [[ https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/ | beta-scap-eqiad ]] Jenkins job and the update-db job borked overnight (around midnight UTC) because /var on deployment-bastion.eqiad.wmflabs was full. Clearing /var let both jobs resume.
@KartikMistry filed T95539: the deployment-salt puppet master had a stale copy of operations/puppet.git. It turns out some cherry-picks were made about 9 days ago and prevented the magic script from auto-rebasing the repository. I have fixed it.
Since then, the [[ https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/ | beta-scap-eqiad ]] job has been failing because scap can't ssh to the MediaWiki instances:
15:18:26 Started sync-apaches
['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n'] on deployment-videoscaler01.eqiad.wmflabs returned :
Warning: Permanently added 'deployment-videoscaler01.eqiad.wmflabs,10.68.16.211' (ECDSA) to the list of known hosts.
Permission denied (publickey).
This is probably related to the 9+ days of puppet changes now being applied to the instances :(
I attempted to debug it but could not find anything.