
Deploy analytics-refinery with scap3
Closed, Resolved. Public. 13 Estimated Story Points.


Event Timeline

Not sure if this blocks T109926; analytics/refinery is not a Services team owned thang. But, we should still do this!

Milimetric triaged this task as Medium priority. Mar 14 2016, 4:16 PM
Milimetric moved this task from Incoming to Analytics Query Service on the Analytics board.
Milimetric moved this task from Backlog (Later) to Dashiki on the Analytics board. Jun 2 2016, 5:18 PM
Milimetric moved this task from Dashiki to Operational Excellence Future on the Analytics board.
Milimetric moved this task from Operational Excellence Future to Dashiki on the Analytics board.
Milimetric added a subscriber: Milimetric.

@greg does this block you all for this quarter? We want to get it done but we can wait until next quarter if it's just up to us.

greg added a comment. Jun 2 2016, 5:36 PM

Our quarterly goal was to get all services migrated this quarter; I know it's a tough thing that means work for others to meet our goal.

We can help, of course, and would prefer sooner rather than later, but no one will die if it slips.

FWIW, the migration process has become more streamlined than it was when the previous analytics service moved, thanks in no small part to @Ottomata's puppet work (which we've also built on).

Much of the guide here is likely applicable: https://wikitech.wikimedia.org/wiki/Services/Scap_Migration

The puppet tl;dr is: add a scap::target definition to your targets, and add your service to hieradata/common/scap/server.yaml.

The more difficult part is creating the scap.cfg that is tailored to your needs, but RelEng is available to help with that where possible (either on IRC or on this ticket). We definitely have feature parity(++) with trebuchet, so if you're using that for deployment, migration should be fairly simple.

Ottomata assigned this task to elukey. Jun 7 2016, 3:51 PM
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.
elukey added a comment. Jul 8 2016, 1:03 PM

Asking some trivial questions since I am a bit ignorant about scap :)

The analytics refinery (https://phabricator.wikimedia.org/diffusion/ANRE/browse/master/) is not a service but a simple git repo that needs git-fat to be deployed, so probably I'd only need to configure the "scap" directory in the repo itself?

Moreover, in modules/role/manifests/analytics_cluster/refinery.pp I can see the following snippet:

# analytics/refinery will deployed to this node.
package { 'analytics/refinery':
    provider => 'trebuchet',
}

That can probably be replaced with

package { 'git-fat':
    ensure => present,
}

Any suggestion? Thanks!

> The analytics refinery (https://phabricator.wikimedia.org/diffusion/ANRE/browse/master/) is not a service but a simple git repo that needs git-fat to be deployed, so probably I'd only need to configure the "scap" directory in the repo itself?

That seems correct to me.

> Moreover, in modules/role/manifests/analytics_cluster/refinery.pp I can see the following snippet:
>
> # analytics/refinery will deployed to this node.
> package { 'analytics/refinery':
>     provider => 'trebuchet',
> }
>
> That can probably be replaced with
>
> package { 'git-fat':
>     ensure => present,
> }
>
> Any suggestion? Thanks!

You will likely also need to add a scap::target definition. It will be something like:

scap::target { 'analytics/refinery':
    deploy_user  => 'deploy-service',
}

scap::target should contain most of the logic needed to move a service to using the scap provider.

There are a few other puppet tweaks you'll need to make as well, those are mostly outlined on the Wikitech migration page: https://wikitech.wikimedia.org/wiki/Scap3/Migration_Guide

Change 298967 had a related patch set uploaded (by Elukey):
Add the deploy-analytics user/group to the refinery role

https://gerrit.wikimedia.org/r/298967

Change 298967 merged by Elukey:
Add the deploy-analytics user/group to the refinery role

https://gerrit.wikimedia.org/r/298967

@thcipriani thanks! I have another question: we'd like to use the deploy-analytics user to perform the deployment, but I am not sure what the procedure is to allow ssh connections between tin and the target host (in this case analytics1027.eqiad.wmnet). Do we need to manually add ssh keys to the keyholder?

elukey added a comment. Edited Jul 15 2016, 3:09 PM

Just generated and added the pub/priv key to the private repo.

Suggestion: it might be good to have a reference in https://doc.wikimedia.org/mw-tools-scap/scap3/ssh-access.html or https://wikitech.wikimedia.org/wiki/Scap3/Migration_Guide since it is not that easy to find people who know where to put the ssh keys :)

> Suggestion: it might be good to have a reference in https://doc.wikimedia.org/mw-tools-scap/scap3/ssh-access.html or https://wikitech.wikimedia.org/wiki/Scap3/Migration_Guide since it is not that easy to find people who know where to put the ssh keys :)

That is a great suggestion. I'll make that update.

I've been OOO for the past couple days so I missed your patch. One note is that scap::target can also manage/create the deploy-analytics user/group/key for you: https://github.com/wikimedia/operations-puppet/blob/production/modules/scap/manifests/target.pp#L62-L88

This is likely also something I'll put in the docs.

Change 299713 had a related patch set uploaded (by Elukey):
Remove analytics-deploy user/group since they will be created by scap:target.

https://gerrit.wikimedia.org/r/299713

Change 299713 merged by Elukey:
Remove analytics-deploy user/group since they will be created by scap:target.

https://gerrit.wikimedia.org/r/299713

Change 299714 had a related patch set uploaded (by Elukey):
Initial basic configuration for the Refinery Scap repository.

https://gerrit.wikimedia.org/r/299714

Change 299719 had a related patch set uploaded (by Elukey):
Move the Analytics Refinery role to scap3

https://gerrit.wikimedia.org/r/299719

Summary:

  1. External scap repository config: https://gerrit.wikimedia.org/r/299714
  2. puppet changes: https://gerrit.wikimedia.org/r/#/c/299719

@thcipriani would you mind checking whether they make sense?

Deleted the keys from the keyholder private repo; I need to passphrase-protect them properly, as described in https://wikitech.wikimedia.org/wiki/Keyholder

> Summary:
>
>   1. External scap repository config: https://gerrit.wikimedia.org/r/299714
>   2. puppet changes: https://gerrit.wikimedia.org/r/#/c/299719
>
> @thcipriani would you mind checking whether they make sense?

Changes look good to me and seem to make sense in the context of the analytics/refinery repo.

I also updated the docs on Wikitech https://wikitech.wikimedia.org/wiki/Scap3/Migration_Guide with many of your suggestions from this task. Thanks for pointing out where docs were lacking :)

elukey added a subscriber: Dzahn. Jul 20 2016, 2:50 PM

>> Summary:
>>
>>   1. External scap repository config: https://gerrit.wikimedia.org/r/299714
>>   2. puppet changes: https://gerrit.wikimedia.org/r/#/c/299719
>>
>> @thcipriani would you mind checking whether they make sense?
>
> Changes look good to me and seem to make sense in the context of the analytics/refinery repo.

Thanks for the review!

> I also updated the docs on Wikitech https://wikitech.wikimedia.org/wiki/Scap3/Migration_Guide with many of your suggestions from this task. Thanks for pointing out where docs were lacking :)

Updates look awesome! One more suggestion if I may: I found the option of a separate scap repository very handy, since it gives a good separation of concerns between deployment config and code. It would be nice to suggest this approach to people as it seems the cleanest one, but I don't know all the use cases well enough to have an authoritative opinion :)

I tried to add a password to the pwstore but I am blocked since there seems to be an issue with expired keys, so I'll follow up with @Dzahn to figure out what is happening.

Ottomata added a subscriber: Joe. Jul 21 2016, 1:43 PM

The problem with pwstore is the same one I had here: https://phabricator.wikimedia.org/T132177#2341946

The last I checked up on it, @Joe's key was expired.

Nuria set the point value for this task to 13. Jul 21 2016, 4:56 PM
Nuria removed a project: Analytics.

I can see other keys expired with gpg --list-keys.

@mark, @yuvipanda, @chasemp: would you mind double-checking your gpg keys to see whether it is the same problem as https://phabricator.wikimedia.org/T132177#2341946 ? Not a huge priority, but we can't add more secrets to the pwstore at the moment (at least this is my understanding).

Cc: @MoritzMuehlenhoff

Thanks!
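
For anyone checking their own keyring, the expired-key check mentioned above can be narrowed using gpg's machine-readable output. This is only a sketch: the helper name expired_keys is made up for illustration, and it assumes the command is run against whichever keyring pwstore actually uses, which this task does not specify. In --with-colons output, field 2 of a pub record is the validity (e means expired), field 5 is the key ID and field 7 is the expiry date.

```shell
#!/bin/sh
# Hypothetical helper (not part of pwstore or gpg): reads the output of
# `gpg --list-keys --with-colons` on stdin and prints the key ID and
# expiry date of every public key that gpg marks as expired.
expired_keys() {
    awk -F: '$1 == "pub" && $2 == "e" { print $5, $7 }'
}

# Usage:
#   gpg --list-keys --with-colons | expired_keys
```

Parsing the colon-delimited listing rather than the human-readable one keeps the check independent of locale and gpg version formatting.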

elukey moved this task from In Progress to Paused on the Analytics-Kanban board. Jul 26 2016, 1:17 PM

Change 299714 merged by Elukey:
Initial basic configuration for the Refinery Scap repository.

https://gerrit.wikimedia.org/r/299714

@thcipriani sorry to bother you again, but we were wondering what the best way to migrate the repo would be. As far as I understand, merging my puppet patch will trigger changes on tin and on the scap targets' repos. If a failure happens, we are stuck in prod with a misconfiguration. I thought about testing this change in Labs but it seems a bit overkill, so I was wondering if you had any best practices to follow.

> @thcipriani sorry to bother you again, but we were wondering what the best way to migrate the repo would be. As far as I understand, merging my puppet patch will trigger changes on tin and on the scap targets' repos. If a failure happens, we are stuck in prod with a misconfiguration. I thought about testing this change in Labs but it seems a bit overkill, so I was wondering if you had any best practices to follow.

No bother at all :)

What we've done in the past (when possible) is to:

  1. Merge the puppet patches
  2. Run puppet on tin. No actual changes happen on tin; the puppet changes here mostly make scap3, rather than trebuchet, responsible for the repo under /srv/deployment/
  3. Depool the canary host if possible so that any errors do not impact production traffic.
  4. Run scap deploy --init from /srv/deployment/analytics/refinery on tin (this step shouldn't be hyper-critical—it generates .git/DEPLOY_HEAD which is the configuration that is used for remote hosts. This file is generated whenever you run scap deploy as well, but pre-target-puppet-run it's probably a nice-to-have thing)
  5. Run puppet on the targets. The main thing that changes here is that the ownership of /srv/deployment/analytics/refinery changes to your deployment user. It also sets up your ssh user public keys.
  6. Make sure to have a terminal window open other than the one used for deployment on tin and run: scap deploy-log -v from /srv/deployment/analytics/refinery in that window. That will give you the most verbose logging from any remote host on which a deployment is run.
  7. Running scap deploy from tin will first deploy only to the canary and will then prompt you to continue, so it should be a relatively safe procedure. That is, it won't deploy to the other hosts until you type 'Y'.

If deployment fails, it should be obvious why (scap deploy-log -v produces fairly verbose log output), so the problem can hopefully be fixed quickly. If it does fail, you can Ctrl-C the scap process and only the canary will have been deployed to.
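
The tin-side commands from the list above can be collected as a dry-run sketch. The scap subcommands and the repository path are the ones named in the list; the run_in_repo helper, the DRY_RUN and REPO_DIR variables and the function names are hypothetical. Merging patches, puppet runs and depooling are deliberately left out because this task does not say how they are triggered.

```shell
#!/bin/sh
# Dry-run sketch: by default each command is only printed.
# Set DRY_RUN=0 to actually execute it (assumes scap is installed on tin).
REPO_DIR="${REPO_DIR:-/srv/deployment/analytics/refinery}"

# Hypothetical helper: run a command from inside the repository checkout.
run_in_repo() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        ( cd "$REPO_DIR" && "$@" )
    else
        printf 'would run in %s: %s\n' "$REPO_DIR" "$*"
    fi
}

# Step 4: generate .git/DEPLOY_HEAD before puppet runs on the targets.
init_deploy() { run_in_repo scap deploy --init; }

# Step 6: verbose log from the remote hosts; run this in a second terminal.
watch_log() { run_in_repo scap deploy-log -v; }

# Step 7: deploy to the canary first; scap then prompts before continuing.
deploy() { run_in_repo scap deploy; }
```

Defaulting to dry-run means the sequence can be reviewed on any machine before it is tried on tin.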

To revert back to using trebuchet, it should be as simple as changing the ownership recursively of /srv/deployment/analytics on the canary, removing /srv/deployment/analytics/refinery-cache and /srv/deployment/analytics/refinery, reverting the puppet patches then running:

sudo salt-call deploy.fetch 'analytics/refinery'
sudo salt-call deploy.checkout 'analytics/refinery'

on the canary.
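
The rollback just described, written out as a dry-run sketch for the canary. The paths and the two salt-call commands come from the text above; the run helper, the DRY_RUN variable and the function name are hypothetical. The owner to restore is not named in this task, so it is a required argument rather than a guessed default.

```shell
#!/bin/sh
# Dry-run sketch: by default each command is only printed.
# Set DRY_RUN=0 to actually execute (destructive: removes directories).
BASE_DIR="/srv/deployment/analytics"

run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf 'would run: %s\n' "$*"
    fi
}

# $1: owner to restore on BASE_DIR (assumption: supplied by the operator).
revert_to_trebuchet() {
    owner="${1:?usage: revert_to_trebuchet OWNER}"
    run sudo chown -R "$owner" "$BASE_DIR"
    run sudo rm -rf "$BASE_DIR/refinery-cache" "$BASE_DIR/refinery"
    # The puppet patches must be reverted before the next two commands;
    # that step is not scripted here.
    run sudo salt-call deploy.fetch 'analytics/refinery'
    run sudo salt-call deploy.checkout 'analytics/refinery'
}
```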

Sorry for the wall of text, but it is to say: you can get back to your previous state fairly easily, and the deploys in which I have taken part are fairly low-risk and/or the risks can be easily mitigated so as not to cause production impact.

> Sorry for the wall of text, but it is to say: you can get back to your previous state fairly easily, and the deploys in which I have taken part are fairly low-risk and/or the risks can be easily mitigated so as not to cause production impact.

I have to say thank you 100 times, this is exactly what I needed, thanks for the very precise list!

> 5. Run puppet on the targets. The main thing that changes here is that the ownership of /srv/deployment/analytics/refinery changes to your deployment user. It also sets up your ssh user public keys.

Hm, will this fix all of the permissions and ownership recursively? It seems to me like it might be safer to move/delete this directory on the target, and then deploy with scap.

> Hm, will this fix all of the permissions and ownership recursively? It seems to me like it might be safer to move/delete this directory on the target, and then deploy with scap.

It does change permissions recursively if the target dir is owned by root (euid of puppet-agent): https://github.com/wikimedia/operations-puppet/blob/production/modules/scap/manifests/target.pp#L124-L133

You could move/remove the directory on the targets pre-scap run, but in that case puppet will run scap for you using the scap3 provider: https://github.com/wikimedia/operations-puppet/blob/production/modules/scap/lib/puppet/provider/package/scap3.rb

elukey added a comment. Aug 1 2016, 3:03 PM

Update: this task is blocked until the pwstore vault is usable again to store the new keyholder passphrase (I hope it will be fixed this week). I could go ahead anyway and store the password locally as a temporary solution, but that wouldn't be clean imo, so since we are not in a hurry I'd like to wait a bit and follow the right path :)

Dzahn removed a subscriber: Dzahn. Aug 1 2016, 5:36 PM

@elukey: I've dropped Yuvi's expired key from pwstore, so new entries can be added now.

elukey added a comment. Aug 2 2016, 2:35 PM

Created the keys in the private repo and encrypted them with the pass stored in pwstore under analytics-deployment-key-passphrase. The next step is to arm the keyholder (I'd like to do it with someone like @thcipriani to be sure that I don't make a mess) and deploy.

Change 299719 merged by Elukey:
Move the Analytics Refinery role to scap3

https://gerrit.wikimedia.org/r/299719

elukey added a comment. Aug 5 2016, 8:38 AM

Updated https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery

We have some issues currently with the fact that the Refinery is a big repo (thanks to git-fat) and Scap creates a new copy after each deployment. We'll need to be careful with disk space consumption but in my opinion the migration is done!
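
To keep an eye on the disk space concern above, something like the following could list how much each deployed copy takes. It is a sketch under assumptions: the rev_sizes name is made up, and the revs subdirectory in the usage comment is an assumption about where scap3 keeps per-deployment copies on a target, so check the actual layout first.

```shell
#!/bin/sh
# Hypothetical helper: print the size in KiB of each immediate
# subdirectory of the given directory, largest first.
rev_sizes() {
    dir="${1:?usage: rev_sizes DIR}"
    du -sk "$dir"/*/ 2>/dev/null | sort -rn
}

# Usage (path is an assumption about the scap3 layout on a target):
#   rev_sizes /srv/deployment/analytics/refinery-cache/revs
```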

elukey moved this task from Paused to Done on the Analytics-Kanban board. Aug 5 2016, 3:04 PM

\o/ thanks everyone, refinery is quite painless to deploy this way.

Nuria closed this task as Resolved. Aug 8 2016, 3:20 PM