Page MenuHomePhabricator

scap::target shouldn't allow users to redefine the user's key
Closed, ResolvedPublic

Description

Currently, scap::target accepts the public_key_source parameter with which a declaration of that resource can set an arbitrary key for the deployment user. Conceptually, that means that the user id and key are defined per service which shouldn't be allowed. All of the services running on SCA and SCB will use the same deployment user, which means that if one of them redefines the key for that user, no deploys can happen.

Instead, scap::target should be responsible for managing user keys. The idea is that when defining such a resource, one only specifies the deployment user, and then the key for it is automatically deduced. An obvious way to achieve that is to put user keys in Hiera, and then scap::target can simply look up, say scap::keys::<user-name>, and fail if it doesn't find a key. This would also allow us separate BetaCluster and production keys; for the former, keys can reside in ops/puppet, while we can put the key for the latter in ops/private.

Related Objects

Event Timeline

This might work for the services team but not everyone using scap is a member.

E.g. I deploy Phabricator with Scap but I am not a member of the services deployment group.

Also, we have already moved this configuration into hiera:
rOPUP Wikimedia Puppet/hieradata/common/scap/server.yaml

The keys should probably be moved there as well.

Change 284418 had a related patch set uploaded (by 20after4):
Add keyholder_key and keyholder_pubkey functions

https://gerrit.wikimedia.org/r/284418

Ping? I reported this problem over a month ago and it still hasn't been solved. As a direct consequence of this and the fact that https://gerrit.wikimedia.org/r/#/c/280403/ has been cherry-picked on beta for a long time (and my comments there completely ignored) I cannot deploy services in beta to deployment-sca0x.

At the very minimum, to unblock services deploys, I expect to see ORES moved out of deployment-sca0x ASAP.

Ping? I reported this problem over a month ago and it still hasn't been solved. As a direct consequence of this and the fact that https://gerrit.wikimedia.org/r/#/c/280403/ has been cherry-picked on beta for a long time (and my comments there completely ignored) I cannot deploy services in beta to deployment-sca0x.

We ended up getting side-tracked, trying to make keyholder/key-generation easier overall (https://gerrit.wikimedia.org/r/#/c/284418/) with a patch that ultimately wasn't workable for everyone involved. After the discussion in the Deployment Working Group Meeting on Monday (https://www.mediawiki.org/wiki/Deployment_tooling/Cabal/2016-05-09#Automation_of_key_generation) the path forward here is clearer and we should be able to make a mergable patch in short-order.

At the very minimum, to unblock services deploys, I expect to see ORES moved out of deployment-sca0x ASAP.

@Ladsgroup can you speak to why ORES uses deployment-sca01 as its target? We could setup a new beta instance for that so no one is stepping on anyone else's toes.

I thought that userkeys should be working since I cherry picked rOPUP1b9c846dc69f on beta. Is that incorrect?

Either way, I need to kill that patch and address the problem differently, and I have a better idea about that since we got some clarity during Monday's meeting.

So I intend to work on this today / tomorrow. Apologies for letting this lag.

scap::target no longer allows the key to be specified. It's now all-or-nothing, either it manages the user and the key or it does neither.

Change 284418 had a related patch set uploaded (by 20after4):
scap::target keyholder-managed ssh keys

https://gerrit.wikimedia.org/r/284418

Ping? I reported this problem over a month ago and it still hasn't been solved. As a direct consequence of this and the fact that https://gerrit.wikimedia.org/r/#/c/280403/ has been cherry-picked on beta for a long time (and my comments there completely ignored) I cannot deploy services in beta to deployment-sca0x.

We ended up getting side-tracked, trying to make keyholder/key-generation easier overall (https://gerrit.wikimedia.org/r/#/c/284418/) with a patch that ultimately wasn't workable for everyone involved. After the discussion in the Deployment Working Group Meeting on Monday (https://www.mediawiki.org/wiki/Deployment_tooling/Cabal/2016-05-09#Automation_of_key_generation) the path forward here is clearer and we should be able to make a mergable patch in short-order.

At the very minimum, to unblock services deploys, I expect to see ORES moved out of deployment-sca0x ASAP.

@Ladsgroup can you speak to why ORES uses deployment-sca01 as its target? We could setup a new beta instance for that so no one is stepping on anyone else's toes.

It's because ORES is going to be at SCB in prod so basically I wanted the beta environment as similar as possible to the prod environment. It was Ops' suggestion @akosiaris to be exact.

I'm really saddened to see instead of talking about the underlying issue (keyholder inconsistencies) people say "I expect to see ORES moved out of deployment-sca0x ASAP."

Ping? I reported this problem over a month ago and it still hasn't been solved. As a direct consequence of this and the fact that https://gerrit.wikimedia.org/r/#/c/280403/ has been cherry-picked on beta for a long time (and my comments there completely ignored) I cannot deploy services in beta to deployment-sca0x.

We ended up getting side-tracked, trying to make keyholder/key-generation easier overall (https://gerrit.wikimedia.org/r/#/c/284418/) with a patch that ultimately wasn't workable for everyone involved. After the discussion in the Deployment Working Group Meeting on Monday (https://www.mediawiki.org/wiki/Deployment_tooling/Cabal/2016-05-09#Automation_of_key_generation) the path forward here is clearer and we should be able to make a mergable patch in short-order.

At the very minimum, to unblock services deploys, I expect to see ORES moved out of deployment-sca0x ASAP.

@Ladsgroup can you speak to why ORES uses deployment-sca01 as its target? We could setup a new beta instance for that so no one is stepping on anyone else's toes.

It's because ORES is going to be at SCB in prod so basically I wanted the beta environment as similar as possible to the prod environment. It was Ops' suggestion @akosiaris to be exact.

It makes sense if that is a thing that is being worked on, but blocking other people is simply not fair.

I'm really saddened to see instead of talking about the underlying issue (keyholder inconsistencies) people say "I expect to see ORES moved out of deployment-sca0x ASAP."

And it really makes me angry that you are taking my words out of context. If you read my post carefully you will understand that my goal is not to get rid of ORES but to make progress on this issue and unblock the deployment of other services.

It's because ORES is going to be at SCB in prod so basically I wanted the beta environment as similar as possible to the prod environment. It was Ops' suggestion @akosiaris to be exact.

Thanks for the context. I wasn't aware it was going to be on SCB in prod. I wasn't clear whether using deployment-scb01 was to match prod, or whether we just hit our provisioning limits in beta.

I also need to mention that taking out ORES won't solve the issue, you can't login to sca-01 via keyholder when the key is not the key ORES defines (so basically ORES fixes the keyholder not breaking it). I might be wrong. please correct me if I am.

@Ladsgroup: There was a problem when two different scap::targets defined a key for the same user. The key would be defined by whichever scap::target was evaluated first by puppet, and the order is non-deterministic.

That should be resolved now. As of 40fb2dd5a8c6455b4c32694892713348952aafc0 you should get puppet errors from attempting to redefine the same resource instead of silently choosing one key or the other.

I removed the 'key_name' parameter from scap::target and made the key name resource use the same resource name as the deploy_user.

So the new behavior of scap::target is that it will either manage both the user and the ssh key or it will manage neither. This is based entirely on the manage_user parameter. If it does manage the user then it will define at most one key, which will always be the key that corresponds to the private key held by keyholder. This should be consistent and avoid conflicts.

@mmodell thanks for the explanation. My question is actually about this particular case. I had to redefine the ssh key because keyholder couldn't connect to the sca-01 without redefining the key (Check out logs of #wikimedia-releng) so basically I'm letting others to connect (and they should use the same key I defined in the puppet as well).

Change 284418 abandoned by 20after4:
scap::target keyholder-managed ssh keys

https://gerrit.wikimedia.org/r/284418

I had to redefine the ssh key because keyholder couldn't connect to the sca-01 without redefining the key (Check out logs of #wikimedia-releng) so basically I'm letting others to connect (and they should use the same key I defined in the puppet as well).

I definitely remember this being the case. We need to get deploy-service access to deployment-sca01 without redefining the deploy-service key as in the ORES patch.

I'm working on this now and hopefully the deploy-service user will be able to access needed beta targets shortly.

I've un-cherry-picked the ORES patch from beta puppetmaster for now since it has been the target of much contention. I've left a comment there about an unnecessary scap parameter to scap::target.

tl;dr: Right now deploy-service will work for no one (since I've uncherry-picked the ORES patch), but it will Soon™ work for everyone.

@thcipriani Thanks! I removed that line from my commit. Please re-cherry-pick it once you're done. Thanks.

I've reworked my previous keyholder patch, the whole thing is still a bit of a mess due to so many callers using scap::target differently. I wanted to make it a lot more consistent but this may be the best I can do for now. rOPUP8ab8c6c6fcdb / https://gerrit.wikimedia.org/r/289236

for background see: T133211: Automate the generation deployment keys (keyholder-managed ssh keys)

I've reworked my previous keyholder patch, the whole thing is still a bit of a mess due to so many callers using scap::target differently. I wanted to make it a lot more consistent but this may be the best I can do for now. rOPUP8ab8c6c6fcdb / https://gerrit.wikimedia.org/r/289236

for background see: T133211: Automate the generation deployment keys (keyholder-managed ssh keys)

reassigning task in light of the revived task (TASK HOT POTATO!)


@mobrovac the immediate problem of not being able to deploy on beta should be resolved temporarily until @mmodell's patch merges (when it will be solved fully).

@Ladsgroup I've re-cherry-picked a slightly modified version of your patch that should work for the time being.

For now the deploy-service user uses a key defined at hiera("service::deploy::scap::public_key_file") which is set to puppet:///private/ssh/tin/servicedeploy_rsa.pub. Access to deployment-sca0{1,2} is working for the deploy-service user, which should help keep the lights on in beta for the time being.

@mobrovac the immediate problem of not being able to deploy on beta should be resolved temporarily until @mmodell's patch merges (when it will be solved fully).

Thank you, @thcipriani. All of the services are now deployable in Beta. Very much appreciated.

@Ladsgroup I've re-cherry-picked a slightly modified version of your patch that should work for the time being.

Given all the commotion around scap::tagret here, I would suggest making the ORES patch dependent on this one so as to minimise surprises later on.

For now the deploy-service user uses a key defined at hiera("service::deploy::scap::public_key_file") which is set to puppet:///private/ssh/tin/servicedeploy_rsa.pub. Access to deployment-sca0{1,2} is working for the deploy-service user, which should help keep the lights on in beta for the time being.

FWIW, IMHO this is a the way to go in general since we have settled on the mid-term solution of having one deployment user whose key will be placed in ops/private. labs/private seems to be the natural alternative to it in Beta.

Now I can't deploy Mathoid in beta as it is asking me for a password:

mobrovac@deployment-tin:/srv/deployment/mathoid/deploy$ scap deploy -fv
07:05:00 Started Deploy: mathoid/deploy
07:05:00 Deploying Rev: 81f4ba2620fb05161334385e759cf68fb374d650
07:05:00 Update DEPLOY_HEAD
07:05:00 Creating /mnt/srv/deployment/mathoid/deploy/.git/DEPLOY_HEAD
07:05:00 Update server info
Entering 'src'
07:05:00 
== DEFAULT ==
:* deployment-mathoid.deployment-prep.eqiad.wmflabs
07:05:00 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'mathoid/deploy', '--force', '-g', 'default', 'fetch']
07:05:02 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'mathoid/deploy', '--force', '-g', 'default', 'fetch'] on deployment-mathoid.deployment-prep.eqiad.wmflabs returned [255]: Permission denied (publickey,keyboard-interactive).

mathoid/deploy: fetch stage(s): 100% (ok: 0; fail: 1; left: 0)                  
07:05:02 1 targets had deploy errors
Stage 'fetch' failed on group 'default'. Perform rollback? [y]: n
07:05:09 Finished Deploy: mathoid/deploy (duration: 00m 09s)
mobrovac@deployment-tin:/srv/deployment/mathoid/deploy$ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -oBatchMode=yes -l deploy-service deployment-mathoid.deployment-prep.eqiad.wmflabs
Permission denied (publickey,keyboard-interactive).
mobrovac@deployment-tin:/srv/deployment/mathoid/deploy$ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service deployment-mathoid.deployment-prep.eqiad.wmflabs
Password:

Any ideas?

Any ideas?

There was a problem with the keys in the secrets module after https://gerrit.wikimedia.org/r/#/c/289236/ was cherry-picked. Should be fixed now.

thcipriani@deployment-tin:~$ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service deployment-mathoid.deployment-prep.eqiad.wmflabs
Linux deployment-mathoid 3.19.0-2-amd64 #1 SMP Debian 3.19.3-9 (2016-01-04) x86_64
Debian GNU/Linux 8.4 (jessie)
deployment-mathoid is a Puppet client of deployment-puppetmaster.deployment-prep.eqiad.wmflabs (puppetclient)
deployment-mathoid is a mathoid server (role::mathoid)
The last Puppet run was at Wed May 25 15:41:30 UTC 2016 (10 minutes ago). 
Last login: Wed May 25 15:50:35 2016 from deployment-tin.deployment-prep.eqiad.wmflabs
deploy-service@deployment-mathoid:~$