
Make the Kerberos infrastructure production ready
Open, High, Public

Description

In T212257 a simple KDC + kadmin service was set up on kerberos1001, with minimal puppet automation to:

  1. create principals and keytabs
  2. copy them securely to the puppetmaster's private puppet repo and deploy them via puppet when requested (by hiera variables)

The above unblocked testing Kerberos in the Hadoop test cluster, but it is not enough for production. The following still needs to be done:

  • order hardware for the two hosts that will run Kerberos KDC(s) and kadmin daemons (two misc nodes)
  • add puppet automation to bootstrap a KDC service from scratch on a node (caveat: this might mean only partial automation since currently the kdc packages, when installing, require manual inputs)
  • add puppet automation to allow a proper KDC/kadmin failover in case the primary kerberos node goes down.
  • puppetise basic config properties like a default password policy
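As an illustration of the last item, a default password policy in MIT Kerberos can be created with kadmin's `add_policy` command; a policy named `default` is applied to new principals that do not specify one. The sketch below only builds the command line that puppet (or an operator) would run on the kadmin host. The helper name and the policy values (minimum length 10, 3 character classes, history of 5) are assumptions for illustration, not values agreed in this task.

```python
from typing import Dict, List

# Hypothetical values, chosen only for illustration.
DEFAULT_POLICY: Dict[str, str] = {
    "minlength": "10",   # minimum password length
    "minclasses": "3",   # minimum number of character classes
    "history": "5",      # number of past keys remembered per principal
}


def add_policy_argv(name: str, settings: Dict[str, str]) -> List[str]:
    """Build the argv for creating a kadmin password policy via kadmin.local."""
    query = ["add_policy"]
    for flag, value in settings.items():
        query += [f"-{flag}", value]
    query.append(name)
    # kadmin.local takes the whole query as a single -q argument.
    return ["kadmin.local", "-q", " ".join(query)]
```

Calling `add_policy_argv("default", DEFAULT_POLICY)` returns an argv list that could be passed to `subprocess.run` as root on the kadmin host, or rendered into a puppet `exec`.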

Event Timeline

elukey created this task. Jun 19 2019, 10:37 AM
Restricted Application added a subscriber: Aklapper. Jun 19 2019, 10:37 AM

add puppet automation to bootstrap a KDC service from scratch on a node (caveat: this might mean only partial automation since currently the kdc packages, when installing, require manual inputs)

https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/494242/ has some of the (untested) debconf::set config lines I mentioned at the SRE Summit.

Milimetric triaged this task as High priority. Jun 20 2019, 4:28 PM
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.
elukey moved this task from Backlog to In Progress on the User-Elukey board. Jul 4 2019, 2:51 PM
elukey moved this task from In Progress to Backlog on the User-Elukey board.
elukey moved this task from Backlog to Kerberos on the User-Elukey board. Jul 5 2019, 6:57 AM
elukey added a comment. Wed, Aug 7, 9:41 AM

Very interesting reading: https://www.tldp.org/HOWTO/Kerberos-Infrastructure-HOWTO/server-replication.html

My understanding is that:

  • kdb5_util dump could be used to periodically dump the state of the master KDC's database to a file. Maybe that could be saved in Bacula or similar?
  • kprop can be used to propagate a dump of the master database to the slave KDCs. It is a good use case for a systemd timer with icinga alarms, to monitor whether things fail.
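A minimal sketch of the dump-then-propagate job described above, assuming MIT Kerberos' `kdb5_util dump <file>` and `kprop -f <file> <host>` are the tools used. The dump path, the function names and the injectable `run` parameter are assumptions made for illustration; the actual puppet change may look quite different.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical location for the dump that gets propagated.
DUMP_FILE = "/srv/kerberos/replica_datatrans"


def dump_cmd(dump_file: str) -> List[str]:
    """argv that dumps the KDC database to dump_file."""
    return ["kdb5_util", "dump", dump_file]


def kprop_cmd(dump_file: str, replica: str) -> List[str]:
    """argv that pushes dump_file to the kpropd daemon on one replica."""
    return ["kprop", "-f", dump_file, replica]


def propagate(
    replicas: Sequence[str],
    dump_file: str = DUMP_FILE,
    run: Callable = subprocess.run,
) -> List[str]:
    """Dump once, push to every replica, and return the replicas that failed."""
    # A failed dump raises CalledProcessError and aborts the whole run.
    run(dump_cmd(dump_file), check=True)
    failed = []
    for host in replicas:
        try:
            run(kprop_cmd(dump_file, host), check=True)
        except subprocess.CalledProcessError:
            # Keep going so one broken replica does not block the others.
            failed.append(host)
    return failed
```

A systemd timer could run a small wrapper that calls `propagate([...])` and exits non-zero when the returned list is not empty, so the unit is marked as failed and an icinga check on the unit state can alert.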

In https://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-install/Switching-Master-and-Slave-KDCs.html there is a simple procedure to swap master and slave in case one fails. It needs to be expanded, though.

I tried kdb5_util dump on kerberos1001 and the resulting file was 24K. It might be worth skipping Bacula and having a simple rsync on the KDC slave copy the dumps periodically. As far as I understand, replicating from master to slave via kprop is not sufficient on its own: if the master's database gets corrupted or inconsistent, the problem might be propagated before an admin can act. Keeping periodic dumps of the database gives us a (hopefully working) backup to restore from.
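The backup idea could be sketched as dated dump files on the master, an rsync pull from the slave, and a retention check that lists dumps old enough to delete. The file naming scheme, the 30-day retention and all function names below are assumptions for illustration only.

```python
import datetime as dt
from pathlib import Path
from typing import List


def dump_name(day: dt.date) -> str:
    """File name for the dump taken on the given day (hypothetical scheme)."""
    return f"kdc_database_{day:%Y%m%d}.dump"


def rsync_cmd(src_host: str, src_dir: str, dest_dir: str) -> List[str]:
    """argv for pulling all dumps from the master, run on the slave."""
    return ["rsync", "-a", f"{src_host}:{src_dir}/", f"{dest_dir}/"]


def expired(dump_dir: Path, today: dt.date, keep_days: int = 30) -> List[Path]:
    """Return dumps in dump_dir whose date stamp is older than keep_days."""
    cutoff = today - dt.timedelta(days=keep_days)
    old = []
    for path in sorted(dump_dir.glob("kdc_database_*.dump")):
        stamp = path.stem.rsplit("_", 1)[-1]
        try:
            day = dt.datetime.strptime(stamp, "%Y%m%d").date()
        except ValueError:
            # Ignore files that do not follow the naming scheme.
            continue
        if day < cutoff:
            old.append(path)
    return old
```

Because each dump is only a few tens of kilobytes, keeping a month of daily files on the slave costs very little, and an older dump remains available if a corrupted database has already been propagated.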

Change 528775 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::kdc: add daily backup for the KDC database

https://gerrit.wikimedia.org/r/528775

elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 528775 merged by Elukey:
[operations/puppet@production] profile::kerberos::kdc: add daily backup for the KDC database

https://gerrit.wikimedia.org/r/528775

Change 529733 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::kerberos::kdc: add support for replication

https://gerrit.wikimedia.org/r/529733

Change 529786 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::kdc: add debconf settings

https://gerrit.wikimedia.org/r/529786

Change 529733 merged by Elukey:
[operations/puppet@production] profile::kerberos::kadminserver: add support for replication

https://gerrit.wikimedia.org/r/529733