Make the Kerberos infrastructure production ready
Open, High, Public, 21 Story Points

Description

In T212257 a simple KDC + kadmin service was set up on kerberos1001, with minimal puppet automation to:

  1. create principals and keytabs
  2. copy them securely to the puppetmaster's private puppet repo and deploy them via puppet when requested (by hiera variables)

The above unblocked testing Kerberos in the Hadoop test cluster, but it is surely not enough. A few things need to be done:

  • order hardware for the two hosts that will run Kerberos KDC(s) and kadmin daemons (two misc nodes)
  • add puppet automation to bootstrap a KDC service from scratch on a node (caveat: this might mean only partial automation since currently the kdc packages, when installing, require manual inputs)
  • add puppet automation to allow a proper KDC/kadmin failover in case the primary kerberos node goes down.
  • puppetise basic config properties like a default password policy

Event Timeline

Milimetric triaged this task as High priority.Jun 20 2019, 4:29 PM
elukey moved this task from Backlog to In Progress on the User-Elukey board.Jul 4 2019, 2:51 PM
elukey moved this task from In Progress to Backlog on the User-Elukey board.
elukey moved this task from Backlog to Kerberos on the User-Elukey board.Jul 5 2019, 6:57 AM
elukey added a comment.Aug 7 2019, 9:41 AM

Very interesting reading: https://www.tldp.org/HOWTO/Kerberos-Infrastructure-HOWTO/server-replication.html

My understanding is that:

  • kdb5_util dump could be used to periodically dump the state of the master KDC's database to a file. Maybe that could be saved in Bacula or similar?
  • kprop can be used to propagate a dump of the master database to the slave KDCs. It is a good use case for a systemd timer with Icinga alarms, to detect failures.

In https://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-install/Switching-Master-and-Slave-KDCs.html there is a simple procedure to swap master/slave in case one fails. It needs to be expanded, though.

I tried kdb5_util dump on kerberos1001, and the resulting file was 24K. It might be worth avoiding Bacula and having a simple rsync on the KDC slave that copies dumps periodically. As far as I understand, replicating from master to slave via kprop is not sufficient: if the master's database gets corrupted or inconsistent, the problem might be propagated before an admin can act. Keeping dumps of the database provides a periodic (hopefully working) backup.
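The dump-then-propagate idea above can be sketched as a pair of command builders. This is only an illustration, not the script that was later puppetised: the dump directory, the file naming scheme and the replica host name are hypothetical, while `kdb5_util dump <file>` and `kprop -f <file> <replica_host>` are the standard MIT Kerberos invocations.

```python
import datetime
import shlex


def dump_command(dump_dir, day):
    """Build the argv for a dated dump of the KDC database.

    The directory and the krb5kdc-YYYYMMDD.dump naming are assumptions
    made for this sketch.
    """
    stamp = day.strftime("%Y%m%d")
    return ["kdb5_util", "dump", f"{dump_dir}/krb5kdc-{stamp}.dump"]


def kprop_command(dump_file, replica_host):
    """Build the argv that pushes an existing dump file to one replica."""
    return ["kprop", "-f", dump_file, replica_host]


if __name__ == "__main__":
    # Print the commands instead of running them (dry run).
    dump = dump_command("/srv/backup", datetime.date(2019, 8, 7))
    print(shlex.join(dump))
    print(shlex.join(kprop_command(dump[-1], "replica.example.org")))
```

Keeping dated dump files separate from the file handed to kprop is what protects against a corrupted master database being propagated: an older dump remains available even after a bad one has been pushed.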

Change 528775 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::kdc: add daily backup for the KDC database

https://gerrit.wikimedia.org/r/528775

elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 528775 merged by Elukey:
[operations/puppet@production] profile::kerberos::kdc: add daily backup for the KDC database

https://gerrit.wikimedia.org/r/528775

Change 529733 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::kerberos::kdc: add support for replication

https://gerrit.wikimedia.org/r/529733

Change 529786 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::kdc: add debconf settings

https://gerrit.wikimedia.org/r/529786

Change 529733 merged by Elukey:
[operations/puppet@production] profile::kerberos::kadminserver: add support for replication

https://gerrit.wikimedia.org/r/529733

Change 529786 merged by Elukey:
[operations/puppet@production] profile::kerberos::kdc: add debconf settings

https://gerrit.wikimedia.org/r/529786

Change 539338 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add role::kerberos::kdc to krb1001

https://gerrit.wikimedia.org/r/539338

Change 539338 merged by Elukey:
[operations/puppet@production] Add role::kerberos::kdc to krb1001

https://gerrit.wikimedia.org/r/539338

Change 539524 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] site.pp: add role::kerberos::kdc to kdc2001

https://gerrit.wikimedia.org/r/539524

Change 539524 merged by Elukey:
[operations/puppet@production] site.pp: add role::kerberos::kdc to kdc2001

https://gerrit.wikimedia.org/r/539524

Change 539546 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Enable kerberos replication on krb[12]001

https://gerrit.wikimedia.org/r/539546

Change 539546 merged by Elukey:
[operations/puppet@production] Enable kerberos replication on krb[12]001

https://gerrit.wikimedia.org/r/539546

Change 539580 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::replication: fix replicate_krb_database script

https://gerrit.wikimedia.org/r/539580

Change 539580 merged by Elukey:
[operations/puppet@production] profile::kerberos::replication: fix replicate_krb_database script

https://gerrit.wikimedia.org/r/539580

elukey added a comment.Tue, Oct 1, 3:25 PM

Summary of progress:

Left to do:

  • add bacula backups to every host of the kerberos cluster to save the snapshots of the database
  • move the Hadoop test cluster to the new cluster and test failover (kdc on 1001 going down, etc..)

Change 540610 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Switch the Hadoop test cluster to krb1001/krb2001

https://gerrit.wikimedia.org/r/540610

Change 540610 merged by Elukey:
[operations/puppet@production] Switch the Hadoop test cluster to krb1001/krb2001

https://gerrit.wikimedia.org/r/540610

elukey added a subscriber: ayounsi.EditedThu, Oct 3, 3:41 PM
elukey@re0.cr1-eqiad# show | compare
[edit firewall family inet filter analytics-in4 term kerberos from destination-address]
         10.64.0.182/32 { ... }
+        /* krb1001 */
+        10.64.0.112/32;
+        /* krb2001 */
+        10.192.48.135/32;
[edit firewall family inet6 filter analytics-in6]
       term scap { ... }
+      term kerberos {
+          from {
+              destination-address {
+                  /* krb1001 */
+                  2620::861:101:10:64:0:112/128;
+                  /* krb2001 */
+                  2620::860:104:10:192:48:135/128;
+              }
+              next-header [ tcp udp ];
+              destination-port [ 88 464 ];
+          }
+          then accept;
+      }
       term default { ... }

elukey@re0.cr2-eqiad# show | compare
[edit firewall family inet filter analytics-in4 term kerberos from destination-address]
         10.64.0.182/32 { ... }
+        /* krb1001 */
+        10.64.0.112/32;
+        /* krb2001 */
+        10.192.48.135/32;
[edit firewall family inet6 filter analytics-in6]
       term scap { ... }
+      term kerberos {
+          from {
+              destination-address {
+                  /* krb1001 */
+                  2620::861:101:10:64:0:112/128;
+                  /* krb2001 */
+                  2620::860:104:10:192:48:135/128;
+              }
+              next-header [ udp tcp ];
+              destination-port [ 88 464 ];
+          }
+          then accept;
+      }
       term default { ... }
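As a rough way to reason about what the new kerberos term accepts, the following sketch models it with Python's `ipaddress` module. It only covers the addresses added in the diffs above (the pre-existing 10.64.0.182/32 entry is left out), and the function and variable names are made up for the example. Ports 88 and 464 are the Kerberos KDC and kpasswd ports.

```python
import ipaddress

# Destinations added by the diff: krb1001 and krb2001, IPv4 and IPv6.
KERBEROS_DESTINATIONS = [
    ipaddress.ip_network("10.64.0.112/32"),
    ipaddress.ip_network("10.192.48.135/32"),
    ipaddress.ip_network("2620::861:101:10:64:0:112/128"),
    ipaddress.ip_network("2620::860:104:10:192:48:135/128"),
]
KERBEROS_PROTOCOLS = {"tcp", "udp"}
KERBEROS_PORTS = {88, 464}


def term_accepts(destination, protocol, port):
    """Return True if a packet would match the modelled kerberos term."""
    address = ipaddress.ip_address(destination)
    in_destinations = any(
        address.version == net.version and address in net
        for net in KERBEROS_DESTINATIONS
    )
    return (
        in_destinations
        and protocol in KERBEROS_PROTOCOLS
        and port in KERBEROS_PORTS
    )
```

For example, `term_accepts("10.64.0.112", "udp", 88)` is true, while the same destination on TCP port 749 does not match this term and falls through to the later terms of the filter.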

CC: @ayounsi

elukey added a comment.Fri, Oct 4, 7:36 AM

Updates:

  • re-created all principals and keytabs for the Hadoop test cluster and moved it to krb1001/krb2001
  • verified that replication works between krb1001 and krb2001 (kadmin.local on krb2001 -> getprincs)
  • decommissioned kerberos1001
  • added network firewall rules in the Analytics VLAN to allow IPv4/IPv6 addresses of krb1001 and krb2001

Remaining to do:

  • test failover of the KDCs and the Hadoop cluster
  • add bacula backups for the kerberos database
elukey added a comment.Fri, Oct 4, 8:56 AM

Did a quick test:

  • kdestroy on an-tool1006
  • stop kdc on krb1001
  • kinit on an-tool1006
  • check kdc logs for my username on krb2001

And everything worked smoothly without any client error. Next step is to test Hadoop daemons.
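The behaviour exercised by this test can be modelled in a few lines: a client holding an ordered list of KDCs falls back to the next one when the first does not answer. This is a simplified sketch of that idea, not the actual krb5 client logic; the function name and the reachability callback are hypothetical.

```python
def first_reachable_kdc(kdcs, is_reachable):
    """Return the first KDC in the configured order that answers, or None."""
    for kdc in kdcs:
        if is_reachable(kdc):
            return kdc
    return None


if __name__ == "__main__":
    # Simulate the test above: the KDC on krb1001 is stopped.
    stopped = {"krb1001"}
    chosen = first_reachable_kdc(
        ["krb1001", "krb2001"], lambda host: host not in stopped
    )
    print(chosen)
```

With krb1001 marked as stopped the sketch selects krb2001, which matches what the KDC logs on krb2001 showed after the kinit.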

Change 540832 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::kdc: add support for bacula backups

https://gerrit.wikimedia.org/r/540832

Change 541370 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/homer/public@master] Add kerberos hosts to analytics-in4 + add kerberos to analytics-in6

https://gerrit.wikimedia.org/r/541370

Change 541370 merged by Ayounsi:
[operations/homer/public@master] Add kerberos hosts to analytics-in4 + add kerberos to analytics-in6

https://gerrit.wikimedia.org/r/541370

Change 540832 merged by Elukey:
[operations/puppet@production] profile::kerberos::kdc: add support for bacula backups

https://gerrit.wikimedia.org/r/540832

I have stopped the kdc on krb1001 to simulate a host down scenario. I am able to renew my krb ticket but I want to leave it down for hours to see what happens to hadoop daemons (hopefully nothing).

elukey set the point value for this task to 21.Wed, Oct 9, 9:48 AM

Another thing we need to do: Add a new flag to data.yaml to annotate that a user is kerberos-enabled (as we need to ensure to also drop Kerberos user principals when offboarding users).

One other thing (not necessarily now) is to add a monitoring check, e.g. https://exchange.nagios.org/directory/Plugins/Security/check_krb5

My idea was to add initially only a nagios process count check, and then think about something like check_krb5. Would it be reasonable?

Makes sense, let's split this to a separate task.

Interesting:

Oct 10 03:00:01 krb2001 kpropd[26599]: Connection from krb1001.eqiad.wmnet
Oct 10 03:26:24 krb2001 systemd[1]: Stopping Kerberos 5 slave KDC update server...
Oct 10 03:26:24 krb2001 systemd[1]: krb5-kpropd.service: Main process exited, code=killed, status=15/TERM
Oct 10 03:26:24 krb2001 systemd[1]: krb5-kpropd.service: Succeeded.
Oct 10 03:26:24 krb2001 systemd[1]: Stopped Kerberos 5 slave KDC update server.
Oct 10 03:26:25 krb2001 systemd[1]: Started Kerberos 5 slave KDC update server.
Oct 10 03:26:25 krb2001 kpropd[30814]: ready
Oct 10 03:56:19 krb2001 systemd[1]: Stopping Kerberos 5 slave KDC update server...
Oct 10 03:56:19 krb2001 systemd[1]: krb5-kpropd.service: Main process exited, code=killed, status=15/TERM
Oct 10 03:56:19 krb2001 systemd[1]: krb5-kpropd.service: Succeeded.
Oct 10 03:56:19 krb2001 systemd[1]: Stopped Kerberos 5 slave KDC update server.
Oct 10 03:56:20 krb2001 systemd[1]: Started Kerberos 5 slave KDC update server.
Oct 10 03:56:20 krb2001 kpropd[3704]: ready
Oct 10 04:00:01 krb2001 kpropd[4837]: Connection from krb1001.eqiad.wmnet
Oct 10 04:26:38 krb2001 systemd[1]: Stopping Kerberos 5 slave KDC update server...
Oct 10 04:26:38 krb2001 systemd[1]: krb5-kpropd.service: Main process exited, code=killed, status=15/TERM
Oct 10 04:26:38 krb2001 systemd[1]: krb5-kpropd.service: Succeeded.
Oct 10 04:26:38 krb2001 systemd[1]: Stopped Kerberos 5 slave KDC update server.
Oct 10 04:26:39 krb2001 systemd[1]: Started Kerberos 5 slave KDC update server.
Oct 10 04:26:39 krb2001 kpropd[9058]: ready
Oct 10 04:56:55 krb2001 systemd[1]: Stopping Kerberos 5 slave KDC update server...
Oct 10 04:56:55 krb2001 systemd[1]: krb5-kpropd.service: Main process exited, code=killed, status=15/TERM
Oct 10 04:56:55 krb2001 systemd[1]: krb5-kpropd.service: Succeeded.
Oct 10 04:56:55 krb2001 systemd[1]: Stopped Kerberos 5 slave KDC update server.
Oct 10 04:56:55 krb2001 systemd[1]: Started Kerberos 5 slave KDC update server.
Oct 10 04:56:55 krb2001 kpropd[14269]: ready
Oct 10 05:00:01 krb2001 kpropd[15324]: Connection from krb1001.eqiad.wmnet
Oct 10 05:27:07 krb2001 systemd[1]: Stopping Kerberos 5 slave KDC update server...
Oct 10 05:27:07 krb2001 systemd[1]: krb5-kpropd.service: Main process exited, code=killed, status=15/TERM
Oct 10 05:27:07 krb2001 systemd[1]: krb5-kpropd.service: Succeeded.
Oct 10 05:27:07 krb2001 systemd[1]: Stopped Kerberos 5 slave KDC update server.
Oct 10 05:27:07 krb2001 systemd[1]: Started Kerberos 5 slave KDC update server.
Oct 10 05:27:07 krb2001 kpropd[19125]: ready
Oct 10 05:56:54 krb2001 systemd[1]: Stopping Kerberos 5 slave KDC update server...
Oct 10 05:56:54 krb2001 systemd[1]: krb5-kpropd.service: Main process exited, code=killed, status=15/TERM
Oct 10 05:56:54 krb2001 systemd[1]: krb5-kpropd.service: Succeeded.
Oct 10 05:56:54 krb2001 systemd[1]: Stopped Kerberos 5 slave KDC update server.
Oct 10 05:56:54 krb2001 systemd[1]: Started Kerberos 5 slave KDC update server.
Oct 10 05:56:54 krb2001 kpropd[23310]: ready
Oct 10 06:00:01 krb2001 kpropd[24370]: Connection from krb1001.eqiad.wmnet
Oct 10 06:26:48 krb2001 systemd[1]: Stopping Kerberos 5 slave KDC update server...
Oct 10 06:26:48 krb2001 systemd[1]: krb5-kpropd.service: Main process exited, code=killed, status=15/TERM
Oct 10 06:26:48 krb2001 systemd[1]: krb5-kpropd.service: Succeeded.
Oct 10 06:26:48 krb2001 systemd[1]: Stopped Kerberos 5 slave KDC update server.
Oct 10 06:26:48 krb2001 systemd[1]: Started Kerberos 5 slave KDC update server.
Oct 10 06:26:49 krb2001 kpropd[28830]: ready

Puppet restarts kadmind every 30 minutes, but why does kpropd shut down?
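One way to confirm that the kpropd stops follow the puppet run cadence is to measure the gaps between the "Stopping" lines in the journal. The sketch below does that for syslog-style lines like the ones pasted above. The year is an assumption, since these lines carry none, and month parsing relies on English month abbreviations.

```python
import re
from datetime import datetime

# Matches e.g. "Oct 10 03:26:24 krb2001 systemd[1]: Stopping <description>..."
STOP_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ systemd\[\d+\]: Stopping (.+)\.\.\.$"
)


def stop_times(lines, year=2019):
    """Extract the timestamps of systemd 'Stopping ...' messages."""
    times = []
    for line in lines:
        match = STOP_RE.match(line.strip())
        if match:
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_minutes(times):
    """Return the gaps between consecutive timestamps, rounded to minutes."""
    return [
        round((later - earlier).total_seconds() / 60)
        for earlier, later in zip(times, times[1:])
    ]
```

Feeding the log above into `gaps_minutes(stop_times(lines))` gives gaps of about 30 minutes between stops, consistent with a periodic puppet run being the trigger.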

Change 542014 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] kerberos: add nagios process monitors to kdc/kadmind daemons

https://gerrit.wikimedia.org/r/542014

elukey added a comment.EditedThu, Oct 10, 7:05 AM
elukey@krb2001:~$ sudo systemctl cat krb5-kpropd.service
# /lib/systemd/system/krb5-kpropd.service
[Unit]
Description=Kerberos 5 slave KDC update server
Conflicts=krb5-admin-server.service

[Service]
ExecReload=/bin/kill -HUP $MAINPID
EnvironmentFile=-/etc/default/krb5-kpropd
ExecStart=/usr/sbin/kpropd -D $DAEMON_ARGS
InaccessibleDirectories=-/etc/ssh -/etc/ssl/private  /root
ReadOnlyDirectories=/
ReadWriteDirectories=/var/tmp /tmp /var/lib/krb5kdc /var/run /run
CapabilityBoundingSet=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target
elukey@krb2001:~$ cat /etc/default/krb5-kpropd
cat: /etc/default/krb5-kpropd: No such file or directory
elukey@krb2001:~$ sudo less /usr/sbin/kpropd
elukey@krb2001:~$ sudo /usr/sbin/kpropd --help
/usr/sbin/kpropd: unrecognized option '--help'

Usage: /usr/sbin/kpropd [-r realm] [-s srvtab] [-dS] [-f replica_file]
	[-F kerberos_db_file ] [-p kdb5_util_pathname]
	[-x db_args]* [-P port] [-a acl_file]
	[-A admin_server] [--pid-file=pid_file]

kpropd can also run as a standalone daemon, backgrounding itself and waiting for connections on port 754 (or the port specified with the -P option if given). Standalone mode is required for incremental propagation. Starting in release 1.11, kpropd automatically detects whether it was run from inetd and runs in standalone mode if it is not. Prior to release 1.11, the -S option is required to run kpropd in standalone mode; this option is now accepted for backward compatibility but does nothing.

As a test, I just added the -P 754 option and restarted kpropd.

Change 542014 merged by Elukey:
[operations/puppet@production] kerberos: add nagios process monitors to kdc/kadmind daemons

https://gerrit.wikimedia.org/r/542014

elukey added a comment.EditedThu, Oct 10, 7:58 AM

Nope, it seems that puppet is causing the stop/start of kpropd, rsync and kadmind on 2001:

Notice: /Stage[main]/Profile::Kerberos::Kadminserver/Service[krb5-admin-server]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Profile::Kerberos::Kadminserver/Service[krb5-admin-server]: Unscheduling refresh on Service[krb5-admin-server]
Notice: /Stage[main]/Profile::Kerberos::Replication/Service[krb5-kpropd]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Profile::Kerberos::Replication/Service[krb5-kpropd]: Unscheduling refresh on Service[krb5-kpropd]
Notice: /Stage[main]/Rsync::Server/Service[rsync]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Rsync::Server/Service[rsync]: Unscheduling refresh on Service[rsync]

Two daemons make sense:

elukey@krb2001:~$ sudo systemctl status rsync
● rsync.service - fast remote file copy program daemon
   Loaded: loaded (/lib/systemd/system/rsync.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Thu 2019-10-10 07:57:38 UTC; 1min 54s ago
           └─ ConditionPathExists=/etc/rsyncd.conf was not met
     Docs: man:rsync(1)
           man:rsyncd.conf(5)

Oct 10 05:56:56 krb2001 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Oct 10 06:26:51 krb2001 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Oct 10 06:57:06 krb2001 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Oct 10 07:09:21 krb2001 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Oct 10 07:27:10 krb2001 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Oct 10 07:51:32 krb2001 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Oct 10 07:52:11 krb2001 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Oct 10 07:52:57 krb2001 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Oct 10 07:56:20 krb2001 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Oct 10 07:57:38 krb2001 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.

elukey@krb2001:~$ sudo systemctl status krb5-admin-server.service
● krb5-admin-server.service - Kerberos 5 Admin Server
   Loaded: loaded (/lib/systemd/system/krb5-admin-server.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2019-10-10 07:57:36 UTC; 2min 40s ago
 Main PID: 16372 (code=exited, status=0/SUCCESS)

Oct 10 07:57:36 krb2001 systemd[1]: Stopping Kerberos 5 Admin Server...
Oct 10 07:57:36 krb2001 kadmind[16372]: finished, exiting
Oct 10 07:57:36 krb2001 kadmind[16372]: closing down fd 13
Oct 10 07:57:36 krb2001 kadmind[16372]: closing down fd 12
Oct 10 07:57:36 krb2001 kadmind[16372]: closing down fd 11
Oct 10 07:57:36 krb2001 kadmind[16372]: closing down fd 10
Oct 10 07:57:36 krb2001 kadmind[16372]: closing down fd 9
Oct 10 07:57:36 krb2001 kadmind[16372]: closing down fd 8
Oct 10 07:57:36 krb2001 systemd[1]: krb5-admin-server.service: Succeeded.
Oct 10 07:57:36 krb2001 systemd[1]: Stopped Kerberos 5 Admin Server.

The kadmin server seems to be stopping by itself, but kadmin.local works on 2001.

ayounsi removed a subscriber: ayounsi.Thu, Oct 10, 8:03 AM

Self answer:

kadmin and kadmin.local are command-line interfaces to the Kerberos V5 administration system. They provide nearly identical functionalities; the difference is that kadmin.local directly accesses the KDC database, while kadmin performs operations using kadmind.

Change 542062 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] kerberos: ensure kadmind and rsync only on the master node

https://gerrit.wikimedia.org/r/542062

Change 542062 merged by Elukey:
[operations/puppet@production] kerberos: ensure kadmind and rsync only on the master node

https://gerrit.wikimedia.org/r/542062

Seems fixed now. I believe the culprit was:

elukey@krb2001:~$ sudo systemctl cat krb5-kpropd.service
# /lib/systemd/system/krb5-kpropd.service
[Unit]
Description=Kerberos 5 slave KDC update server
Conflicts=krb5-admin-server.service   <==================
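The effect of that Conflicts= line can be illustrated with a small sketch: in systemd, starting one of two conflicting units stops the other, in both directions. The code below reads the Conflicts= value out of unit file text and works out which units a start would stop. The helper names are invented for this example, and the parsing is deliberately naive (Python's `configparser` rather than systemd's own parser).

```python
import configparser


def unit_conflicts(unit_text):
    """Return the units listed in Conflicts= of the [Unit] section."""
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.optionxform = str  # keep key case, e.g. "Conflicts"
    parser.read_string(unit_text)
    return parser.get("Unit", "Conflicts", fallback="").split()


def stopped_by_start(starting, conflicts_by_unit):
    """List the units that would be stopped when `starting` is started."""
    stopped = set(conflicts_by_unit.get(starting, []))
    for unit, conflicts in conflicts_by_unit.items():
        if starting in conflicts:
            stopped.add(unit)  # Conflicts= applies in both directions
    stopped.discard(starting)
    return sorted(stopped)
```

With the kpropd unit above as input, starting krb5-admin-server.service yields krb5-kpropd.service as a unit to stop. That matches the observed loop: puppet ensured kadmind was running on the replica, which stopped kpropd, and puppet then started kpropd again.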

Change 542067 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] kerberos: add nagios process monitoring for kpropd

https://gerrit.wikimedia.org/r/542067

Change 542067 merged by Elukey:
[operations/puppet@production] kerberos: add nagios process monitoring for kpropd

https://gerrit.wikimedia.org/r/542067

Change 542092 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] kerberos: test kadmin failover/swap

https://gerrit.wikimedia.org/r/542092

Change 542092 merged by Elukey:
[operations/puppet@production] kerberos: test kadmin failover/swap

https://gerrit.wikimedia.org/r/542092

And we should also have an Icinga check to ensure the replica is up-to-date.

Change 542102 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] kerberos: ensure resources that might change during failover

https://gerrit.wikimedia.org/r/542102

Change 542102 merged by Elukey:
[operations/puppet@production] kerberos: ensure resources that might change during failover

https://gerrit.wikimedia.org/r/542102

In theory this should be ensured by the replication script exiting with a zero return code, no?

Probably, but it needs a closer look. I was wondering about a case where it runs but garbles data or copies an outdated dump. OTOH, we can still retrieve backups, so maybe that's not much of a concern in practice.
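A possible shape for such a check, as a sketch only: compare the principal names known to the master with those on the replica (for instance the output of getprincs in kadmin.local on each host) and map the result to the usual Nagios/Icinga exit codes. The function names are hypothetical, and comparing names alone would not catch changed keys or otherwise garbled records.

```python
OK, CRITICAL = 0, 2  # conventional Nagios/Icinga plugin exit codes


def principal_diff(master_principals, replica_principals):
    """Return principals missing from or unexpected on the replica."""
    master = set(master_principals)
    replica = set(replica_principals)
    return {
        "missing_on_replica": sorted(master - replica),
        "extra_on_replica": sorted(replica - master),
    }


def check_replica(master_principals, replica_principals):
    """Return (exit_code, message) for a monitoring check."""
    diff = principal_diff(master_principals, replica_principals)
    if diff["missing_on_replica"] or diff["extra_on_replica"]:
        return CRITICAL, f"CRITICAL: replica differs: {diff}"
    return OK, "OK: replica has the same principals as the master"
```

A check like this complements the replication script's return code: the script reports whether the last run succeeded, while the comparison reports whether the replica actually matches the master.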

Change 542112 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] kerberos: test failover (part 2)

https://gerrit.wikimedia.org/r/542112

Change 542112 merged by Elukey:
[operations/puppet@production] kerberos: test failover (part 2)

https://gerrit.wikimedia.org/r/542112

Change 542133 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] kerberos: enable monitoring

https://gerrit.wikimedia.org/r/542133

Change 542133 merged by Elukey:
[operations/puppet@production] kerberos: enable monitoring

https://gerrit.wikimedia.org/r/542133

Tested the failover and improved the puppet code to clean up properly when failing back to the original state. Tested a password change; all worked as expected.

Finally added monitoring.

elukey moved this task from In Progress to Done on the Analytics-Kanban board.Thu, Oct 10, 3:15 PM

Change 542159 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::kadminserver: fix typo in monitoring

https://gerrit.wikimedia.org/r/542159

Change 542159 merged by Elukey:
[operations/puppet@production] profile::kerberos::kadminserver: fix typo in monitoring

https://gerrit.wikimedia.org/r/542159

elukey moved this task from Kerberos to Done on the User-Elukey board.Fri, Oct 11, 7:04 AM