
beta-scap-eqiad fails due to ssh-add not finding mwdeploy ssh key
Closed, ResolvedPublic

Description

Since roughly 7:52am UTC, the beta-scap-eqiad job fails.

+ /usr/local/bin/wmf-beta-scap 'beta-scap-eqiad (build #38290)'
Starting ssh-agent
Agent pid 10854
Build step 'Execute shell' marked build as failure

Whereas a passing build has:

+ /usr/local/bin/wmf-beta-scap 'beta-scap-eqiad (build #38286)'
Starting ssh-agent
Agent pid 16027
Identity added: /var/lib/mwdeploy/.ssh/id_rsa (/var/lib/mwdeploy/.ssh/id_rsa)
Started scap: beta-scap-eqiad (build #38286)

The wmf-beta-scap wrapper runs: /usr/local/bin/sudo-withagent mwdeploy /usr/local/bin/scap

I ran it with shell tracing enabled, and it fails when invoking ssh-add, which looks for .ssh/id_rsa in the home directory.
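For illustration, here is a rough sketch of what an agent wrapper of this kind plausibly does, matching the "Starting ssh-agent" / "Agent pid" / "Identity added" sequence in the build output. The actual contents of sudo-withagent are not shown in this task, so `parse_agent_env` and `run_with_agent` are hypothetical names, and passing the key path explicitly is an assumption.

```python
import os
import re
import subprocess


def parse_agent_env(output: str) -> dict:
    """Extract SSH_AUTH_SOCK and SSH_AGENT_PID from `ssh-agent -s` output."""
    pattern = r"^(SSH_AUTH_SOCK|SSH_AGENT_PID)=([^;]+);"
    return dict(re.findall(pattern, output, re.MULTILINE))


def run_with_agent(key_path: str, command: list) -> int:
    """Start an agent, load one key, run `command`, then kill the agent.

    Hypothetical helper: the real wrapper may differ in every detail.
    """
    # `ssh-agent -s` prints Bourne-shell assignments for the new agent.
    out = subprocess.run(
        ["ssh-agent", "-s"], check=True, capture_output=True, text=True
    ).stdout
    env = {**os.environ, **parse_agent_env(out)}
    try:
        # ssh-add exits non-zero when the key file cannot be loaded;
        # check=True turns that into a visible CalledProcessError.
        subprocess.run(["ssh-add", key_path], check=True, env=env)
        return subprocess.run(command, env=env).returncode
    finally:
        # `ssh-agent -k` kills the agent named by SSH_AGENT_PID.
        subprocess.run(["ssh-agent", "-k"], env=env, check=False)
```

If the key is missing from the path the wrapper expects, the ssh-add step fails before scap ever starts, which is consistent with the failing build stopping right after "Agent pid".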

It seems a change pushed to Puppet moved mwdeploy's $HOME from /var/lib/mwdeploy to /home/mwdeploy, hence the failure.
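A quick way to confirm this kind of mismatch on a host is to compare the home directory recorded for the user with the locations where the key might live. The sketch below is only an illustration: `home_of` and `find_key` are hypothetical helpers, and the two candidate paths are the ones mentioned in this task.

```python
import pwd
from pathlib import Path


def home_of(user: str) -> Path:
    """Return the home directory recorded for `user` in the passwd database."""
    return Path(pwd.getpwnam(user).pw_dir)


def find_key(candidate_homes, key_name="id_rsa") -> dict:
    """Map each candidate home to whether <home>/.ssh/<key_name> is a file."""
    return {
        str(home): (Path(home) / ".ssh" / key_name).is_file()
        for home in candidate_homes
    }


if __name__ == "__main__":
    user = "mwdeploy"
    try:
        print("passwd home:", home_of(user))
    except KeyError:
        print("no such user:", user)
    candidates = ["/var/lib/mwdeploy", "/home/mwdeploy"]
    for home, present in find_key(candidates).items():
        print(home, "has key" if present else "no key")
```

Reading the key requires suitable permissions, so this would need to run as mwdeploy or root to give a meaningful answer.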

Event Timeline

hashar raised the priority of this task to Needs Triage.
hashar updated the task description.
hashar added subscribers: hashar, yuvipanda, mmodell.
greg triaged this task as Unbreak Now! priority. Jan 15 2015, 4:47 PM
greg added a project: Deployments.
greg subscribed.

Adding Deployments

@mmodell: can you take a look at this, please? @yuvipanda can probably help figure out what change broke it from ops side (if that's the case)

btw, I'm turning off the beta code update job because:
A) it's been spamming forever STILL FAILING
B) the wmf labs filesystem outage starts in 1 hour and it'd fail anyways during that

[09:57]  <    bd808>	$HOME for mwdeploy changed to /var/lib/mwdeploy quite a while ago I thought.
[09:58]  <    bd808>	oh the bug is that it changed back?
[09:58]  <    bd808>	I wonder if that has something to do with the ldap changes that Yuvi made?
[10:00]  <    bd808>	the quick hack for that is just to copy the ssh key on deployment-bastion
gerritbot subscribed.

Change 185217 had a related patch set uploaded (by Yuvipanda):
beta: Fix mwdeploy's ssh key path to point to correct path

https://gerrit.wikimedia.org/r/185217

Patch-For-Review

Change 185217 merged by Yuvipanda:
beta: Fix mwdeploy's ssh key path to point to correct path

https://gerrit.wikimedia.org/r/185217

yuvipanda claimed this task.

Unified into /home/mwdeploy now. Puppet is putting ssh keys there. It is shared across instances, but that's ok in this particular case.

I didn't notice this broke scap because I had wmf-insecte on ignore (fixed since), and I didn't notice that there was a .ssh in the homedir because I didn't check all hosts (only a couple) and didn't use -a to ls. Mea culpa.