
Automate kerberos credential creation and management to ease the creation of testing infrastructure
Open, Low, Public

Description

The main goal for this task is to allow the creation of testing infrastructures via Pontoon in which a full kerberos stack is bootstrapped. The current setup doesn't allow this, since multiple manual steps are needed:

  • Every new user gets a krb principal, and one SRE needs to run a script like the following on krb1001 to create it: sudo manage_principals.py create batman --email_address=etc..@wikimedia.org.
  • Every daemon that needs to authenticate via kerberos needs a keytab, that is generated on krb1001 via generate_keytabs.py and rsynced manually to the puppet private repository (and committed to it).
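To make the first step concrete, automating principal creation could start with a thin wrapper around the existing script. This is a hypothetical sketch (the wrapper function names are invented, only the `manage_principals.py` invocation itself comes from the manual step above):

```python
import subprocess

def build_principal_command(username: str, email: str) -> list[str]:
    """Build the manage_principals.py invocation used on krb1001.

    The command shape is taken from the manual step described in this
    task; this wrapper itself is hypothetical, not an existing tool.
    """
    return [
        "sudo", "manage_principals.py", "create", username,
        f"--email_address={email}",
    ]

def create_principal(username: str, email: str, dry_run: bool = True) -> list[str]:
    """Create a kerberos principal, or just return the planned command."""
    cmd = build_principal_command(username, email)
    if not dry_run:
        # Would need to run on krb1001 itself (or via a remote executor).
        subprocess.run(cmd, check=True)
    return cmd
```

Keeping command construction separate from execution makes the logic testable without touching a real KDC.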

The more we automate the better :)

Event Timeline

odimitrijevic moved this task from Incoming to Operational Excellence on the Analytics board.

Change 736753 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P::kerberos: automate principal management

https://gerrit.wikimedia.org/r/736753

Reporting some notes from https://gerrit.wikimedia.org/r/c/operations/puppet/+/736753 and IRC conversations with @Majavah (thanks a lot for the work!). We could proceed in multiple ways:

  1. Extend the current python scripts in puppet without creating libs (easier but potentially messy).
  2. Create a separate Python library/scripts package and ship it as a .deb under the operations gerrit namespace. We could deploy it on the krb* nodes and have puppet use it.
  3. Create a puppet custom resource (https://puppet.com/docs/puppet/6/custom_resources.html) or a provider for the user resource. Examples in modules/docker/lib/puppet/provider/package/docker.rb or modules/scap/lib/puppet/{type,provider}
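For option 2, the library's public surface might look something like this. The module layout, class, and method names are entirely hypothetical, just to make the idea concrete; a real implementation would shell out to kadmin rather than track state in memory:

```python
from dataclasses import dataclass

@dataclass
class Principal:
    """A kerberos principal with the metadata we manage."""
    name: str
    email: str

class KerberosAdmin:
    """Thin wrapper around kadmin operations, intended to be packaged
    as a .deb, deployed on the krb* nodes, and called by puppet.
    (Hypothetical API sketch; none of these names exist today.)"""

    def __init__(self, realm: str = "WIKIMEDIA") -> None:
        self.realm = realm
        self._principals: dict[str, Principal] = {}

    def create_principal(self, name: str, email: str) -> Principal:
        # A real implementation would invoke kadmin.local here; this
        # sketch only records the request so the API is testable.
        principal = Principal(name=name, email=email)
        self._principals[name] = principal
        return principal

    def principal_exists(self, name: str) -> bool:
        return name in self._principals
```

An API like this could be shared between the principal-creation and (eventually) keytab tooling, which is the main advantage of option 2 over extending the scripts in place.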

I have never done 3) and I don't love Ruby, but it could be interesting to see if a provider is doable; it would be way more integrated into puppet. Thoughts?

Option 3 is certainly an interesting idea, but I'd rather go with 1 or 2 here. Mostly because I don't think we can use the same model for host keytabs and the other options would let us share some code with our existing and future kerberos tools.

In T292389#7530393, @Majavah wrote:

Option 3 is certainly an interesting idea, but I'd rather go with 1 or 2 here. Mostly because I don't think we can use the same model for host keytabs and the other options would let us share some code with our existing and future kerberos tools.

I'd separate the two use cases; at the moment the procedure to create a keytab is much more complex:

  1. use the related script on krb1001 with the right settings (at the moment, an input text file describing how the keytabs should look).
  2. rsync the keytab to the puppet private repo on puppetmaster1001 (done manually, with a dedicated user/password).
  3. add the file to git and commit it.
  4. in the "public" puppet repo, add a specific hiera configuration to deploy the keytab under /etc/...
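The four steps above could be sketched as a dry-run plan before any of them is wired into automation. Everything here is illustrative: the private repo path, keytab filename, and flags are assumptions, and step 4 is a puppet/hiera change rather than a shell command, so it isn't represented:

```python
def keytab_deploy_plan(keytab: str, spec_file: str) -> list[list[str]]:
    """Return the command sequence for the manual keytab procedure
    described in this task, as a dry-run plan (nothing is executed).
    Paths and arguments are illustrative assumptions only."""
    private_repo = "/srv/private"  # hypothetical repo path on puppetmaster1001
    return [
        # step 1: generate the keytab on krb1001 from a spec file
        ["generate_keytabs.py", spec_file],
        # step 2: copy it into the puppet private repo
        ["rsync", keytab, f"puppetmaster1001:{private_repo}/"],
        # step 3: commit it to the private repo
        ["git", "-C", private_repo, "add", keytab],
        ["git", "-C", private_repo, "commit", "-m", f"Add keytab {keytab}"],
        # step 4 (hiera config in the public puppet repo) is a code
        # change, not a command, so it has no entry here.
    ]
```

Rendering the procedure as data like this makes it easy to review, log, or gate behind confirmation before the steps are ever run for real.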

Principal creation is much simpler and entirely different; I wouldn't really consider the two use cases related.

Fair point (although I still want to have dreams about automating that too)!

Change 751100 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] kerberos: manage users with custom puppet type

https://gerrit.wikimedia.org/r/751100

@elukey @Majavah Following up on this task, is the merging of the patch blocked? Should this be deprioritized for now or is there anything that others in the DE team can do to help complete the work?


There are two independent patches attached to this task in review. Someone needs to pick one of them as a better approach over the other (or come up with an even better one) and review that.

BTullis subscribed.

@Majavah - Apologies for the delay on this. I think that given the current workload we're unlikely to find time to implement the bigger project (T292388 ) at the moment, so if you don't mind I'd like to kick this into the long grass for a little while.
We still have manual processes that work, and the toil isn't costing us much effort, but I value your work on both patches and hope to come back to them shortly.

JArguello-WMF lowered the priority of this task from High to Low. Jun 30 2023, 2:34 PM
JArguello-WMF moved this task from Ops Week to Radar (External Teams) on the Data-Engineering board.