
beta-scap-eqiad no longer runs due to ssh Permission denied
Closed, Resolved (Public)

Description

I noticed that the beta-scap-eqiad Jenkins job and the update-db job broke overnight (around midnight UTC) because /var on deployment-bastion.eqiad.wmflabs was full. Clearing /var got them running again.
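A full /var can be caught before it breaks the jobs. The following is a minimal sketch, assuming Python 3 is available on the host; the function names and the 90% threshold are hypothetical choices for illustration, not part of any existing monitoring on the beta cluster.

```python
import shutil


def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=90.0):
    """True when usage is at or above `threshold` percent (hypothetical default)."""
    return percent_used(path) >= threshold


if __name__ == "__main__":
    # "/var" is the filesystem named in the report; adjust as needed.
    print("/var nearly full:", is_nearly_full("/var"))
```

Run from cron or a Jenkins pre-build step, a check like this would flag the condition ahead of the next scap run.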

@KartikMistry filed T95539: the deployment-salt puppet master had a stale copy of operations/puppet.git. It turns out some cherry-picks were made ~9 days ago and prevented the magic script from auto-rebasing the repository. I have fixed it.
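Stale local cherry-picks of this kind can be spotted by comparing the checkout with its upstream branch. This is a rough sketch only: it assumes the checkout has an upstream tracking branch configured, and the helper names are hypothetical. It relies on `git rev-list --left-right --count`, which prints the number of commits unique to each side, separated by whitespace.

```python
import subprocess


def parse_ahead_behind(output):
    """Parse 'AHEAD<TAB>BEHIND' as printed by git rev-list --left-right --count."""
    ahead, behind = output.split()
    return int(ahead), int(behind)


def ahead_behind(repo_path):
    """Return (local-only commits, upstream-only commits) for the checkout.

    Assumes HEAD has an upstream configured; git exits non-zero otherwise.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--left-right", "--count",
         "HEAD...@{upstream}"],
        check=True, capture_output=True, text=True,
    )
    return parse_ahead_behind(result.stdout)
```

A large "behind" count together with a non-zero "ahead" count would suggest that local commits are blocking the automatic rebase.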

Since then, the beta-scap-eqiad job has been failing because scap can't ssh to the MediaWiki instances:

15:18:26 Started sync-apaches
['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n'] on deployment-videoscaler01.eqiad.wmflabs returned [255]:
  Warning: Permanently added 'deployment-videoscaler01.eqiad.wmflabs,10.68.16.211' (ECDSA) to the list of known hosts.
  Permission denied (publickey).
...

This is probably related to 9+ days of puppet changes being applied to the instances :(

I attempted debugging it but could not find anything.
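To narrow this down, the failure can be reproduced outside of scap with a non-interactive ssh probe. The sketch below is an assumption about how one might do that, not something scap itself does. `BatchMode=yes` makes ssh fail instead of prompting, and ssh uses exit status 255 for its own errors, which matches the `returned [255]` in the log above. The function names are hypothetical, and the probe should be run as the same user the Jenkins job uses.

```python
import subprocess


def build_probe_command(host, timeout=10):
    """Build an ssh command that runs `true` remotely without prompting."""
    return ["ssh", "-o", "BatchMode=yes",
            "-o", f"ConnectTimeout={timeout}", host, "true"]


def classify(returncode, stderr):
    """Map an ssh exit status and stderr to a coarse failure category."""
    if returncode == 0:
        return "ok"
    if returncode == 255 and "Permission denied" in stderr:
        return "auth-failure"
    if returncode == 255:
        return "ssh-error"  # ssh itself failed: network, host key, ...
    return "remote-command-failed"


def probe(host):
    """Run the probe against `host` and classify the outcome."""
    result = subprocess.run(build_probe_command(host),
                            capture_output=True, text=True)
    return classify(result.returncode, result.stderr)
```

Calling `probe("deployment-videoscaler01.eqiad.wmflabs")` for each affected instance would show whether the key is rejected everywhere or only on some hosts.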

Event Timeline

hashar raised the priority of this task from to Needs Triage.
hashar updated the task description.
hashar added subscribers: hashar, KartikMistry.
greg triaged this task as Unbreak Now! priority. Apr 9 2015, 3:30 PM
greg set Security to None.