== Overview
We recently bootstrapped a new Ceph cluster in T330149.
Doing so highlighted a couple of problems with the way that puppet behaves when bootstrapping monitor nodes.
This ticket records our investigation of these problems and any fixes for them.
It would be useful to carry out this work on the new Ceph cluster before it goes into production.
However, we must be mindful that the ceph module is also in use by the //cloudceph// clusters that are in production and are managed by the #cloud-services team.
Whilst the puppet module is shared, the cloudceph clusters currently use their own puppet //profiles// and a different version of Ceph: 15 (Octopus) rather than the 17 (Quincy) used by the new cluster.
Any one or more of these factors might cause the bootstrapping of a cluster, a monitor daemon, or a manager daemon to behave differently between the two cases.
== Observations
=== 1: Bootstrapping a new monitor fails with a named monitor key
When bootstrapping a monitor server, whether for a new cluster or an existing one, puppet runs the following command on each monitor:
```
/usr/bin/ceph-mon --mkfs -i ${::hostname} --fsid ${fsid} --keyring ${temp_keyring}
```
This exec is defined [[https://github.com/wikimedia/operations-puppet/blob/production/modules/ceph/manifests/mon.pp#L50-L55|here]].
Note that we are using the [[https://docs.ceph.com/en/quincy/dev/mon-bootstrap/#expanding-with-initial-members|expanding-with-initial-members]] method from the Ceph documentation, since we already have an `/etc/ceph/ceph.conf` present which contains the `mon initial members` option.
The `$temp_keyring` (`/var/lib/ceph/tmp/ceph.mon.keyring`) is a single text file containing two concatenated keys:
* The monitor authentication key, in a section named `mon.${::hostname}`
* The `client.admin` key
The following screenshot shows this temporary file, with the key data redacted. It also highlights the named section of the monitor key.
{F36925521}
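To check which entity names a keyring actually holds without opening it in an editor, its section headers can be listed. The sketch below is illustrative only: `keyring_sections` is a hypothetical helper that parses the bracketed header lines, and the demo keyring contains placeholder text, not real key data.

```shell
#!/bin/sh
# Print the entity name of each section in a Ceph keyring file, one per line.
# Keyrings are INI-style: section headers look like "[mon.]" or
# "[client.admin]", with "key = ..." lines beneath them.
keyring_sections() {
    # Keep only lines that are a bracketed header, and strip the brackets.
    sed -n 's/^[[:space:]]*\[\([^]]*\)][[:space:]]*$/\1/p' "$1"
}

# Demo: a throwaway keyring laid out like /var/lib/ceph/tmp/ceph.mon.keyring,
# using placeholder (non-secret) key data.
demo=$(mktemp)
cat > "$demo" <<'EOF'
[mon.cephosd1001]
    key = PLACEHOLDER
    caps mon = "allow *"
[client.admin]
    key = PLACEHOLDER
EOF
keyring_sections "$demo"   # prints mon.cephosd1001, then client.admin
rm -f "$demo"
```

Run against the real temporary keyring, a first line of `mon.<hostname>` rather than `mon.` indicates the named monitor key described in this observation.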
When this command is executed on a mon node running Ceph Quincy, the file `/var/lib/ceph/mon/ceph-$hostname/keyring` is not created, and attempting to start the mon service results in the following errors:
```
auth: error reading file: /var/lib/ceph/mon/ceph-cephosd1001/keyring: can't open /var/lib/ceph/mon/ceph-cephosd1001/keyring: (2) No such file or directory
mon.cephosd1001@-1(???) e0 unable to load initial keyring /var/lib/ceph/mon/ceph-cephosd1001/keyring
```
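Because `ceph-mon --mkfs` did not leave a keyring behind, the failure only surfaced later when the mon service was started. A minimal sketch of a sanity check that could be run straight after the mkfs step is shown below; `check_mon_keyring` is a hypothetical helper (not part of the puppet module), and its default path assumes the `/var/lib/ceph/mon/ceph-<short hostname>` layout seen in the log above.

```shell
#!/bin/sh
# Fail loudly if the mon data directory has no (or an empty) keyring file,
# i.e. the state that makes the mon unable to load its initial keyring.
# $1 optionally overrides the data directory (assumed default layout below).
check_mon_keyring() {
    data_dir=${1:-/var/lib/ceph/mon/ceph-$(hostname -s)}
    if [ -s "$data_dir/keyring" ]; then
        echo "ok: $data_dir/keyring present"
        return 0
    fi
    echo "error: $data_dir/keyring missing or empty; mon will not start" >&2
    return 1
}
```

For example, `check_mon_keyring /var/lib/ceph/mon/ceph-cephosd1001` would have reported the missing file on the host from the log before any attempt to start the service.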
The [[https://docs.ceph.com/en/quincy/dev/mon-bootstrap/#secret-keys|Ceph documentation]] refers to a secret key named `mon.` rather than one named `mon.$hostname`:
> The mon. secret key is stored a keyring file in the mon data directory.
In order to bootstrap the cluster, I manually modified `/var/lib/ceph/tmp/ceph.mon.keyring` on each of the servers, removing the hostname from the section name of the monitor key and leaving `[mon.]` in its place. I then executed the `ceph-mon --mkfs` command manually.
Upon doing so, the keyring was created in the correct location and the mon process started successfully.
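The manual edit above can be sketched as a small script, assuming the temporary keyring layout described earlier. `rename_mon_section` is a hypothetical helper for a one-off repair on a host; it is not a proposed change to the puppet module, since whether the module should emit `[mon.]` directly is part of what this ticket is investigating.

```shell
#!/bin/sh
# Rewrite a "[mon.<name>]" section header to "[mon.]" in a keyring file.
# Other sections (e.g. [client.admin]) and all key lines are left untouched,
# and an existing "[mon.]" header is not changed. A .bak copy is kept.
rename_mon_section() {
    keyring=$1
    cp -p "$keyring" "$keyring.bak" || return 1
    # Redirecting into the existing file keeps its permissions (keyrings
    # should stay readable only by their owner).
    sed 's/^\([[:space:]]*\)\[mon\.[^]]\{1,\}]/\1[mon.]/' "$keyring.bak" > "$keyring"
}
```

On a monitor host this would be run as `rename_mon_section /var/lib/ceph/tmp/ceph.mon.keyring`, followed by the same `ceph-mon --mkfs` command puppet executes.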
=== 2: Timeouts running puppet (ceph auth) before the cluster is bootstrapped
TODO explain observation
=== 3: The mgr keys were created instead of being imported, so the key data did not match
TODO explain observation