(spicerack) andrew@buster:~$ cookbook -c ~/.config/spicerack/cookbook_config.yaml wmcs.ceph.osd.bootstrap_and_add --new-osd-fqdn cloudcephosd1018.eqiad.wmnet --controlling-node-fqdn cloudcephmon1001.eqiad.wmnet
START - Cookbook wmcs.ceph.osd.bootstrap_and_add
Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster
----- OUTPUT of 'sudo -i ceph osd set norebalance' -----
norebalance is set
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i ceph osd set norebalance'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1)
Running Puppet with args on 1 hosts: cloudcephosd1018.eqiad.wmnet
----- OUTPUT of 'sudo -i run-puppet-agent ' -----
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for cloudcephosd1018.eqiad.wmnet
Info: Applying configuration version '(f21b24e85c) Jbond - wmflib::role_hosts: fix typos'
Notice: Applied catalog in 13.80 seconds
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i run-puppet-agent '.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Rebooting node cloudcephosd1018.eqiad.wmnet
----- OUTPUT of 'sudo -i ceph status -f json' -----
{"fsid":"5917e6d9-06a0-4928-827a-f489384975b1","health":{"status":"HEALTH_WARN","checks":{"OSDMAP_FLAGS":{"severity":"HEALTH_WARN","summary":{"message":"norebalance flag(s) set","count":11},"muted":false}},"mutes":[]},"election_epoch":502,"quorum":[0,1,2],"quorum_names":["cloudcephmon1003","cloudcephmon1002","cloudcephmon1001"],"quorum_age":2298457,"monmap":{"epoch":8,"min_mon_release_name":"octopus","num_mons":3},"osdmap":{"epoch":3788672,"num_osds":152,"num_up_osds":152,"osd_up_since":1629214787,"num_in_osds":152,"osd_in_since":1625220736,"num_remapped_pgs":0},"pgmap":{"pgs_by_state":[{"state_name":"active+clean","count":6144},{"state_name":"active+clean+scrubbing+deep","count":1}],"num_pgs":6145,"num_pools":4,"num_objects":13679379,"data_bytes":57827911614345,"bytes_used":174059778916352,"bytes_avail":117774390476800,"bytes_total":291834169393152,"read_bytes_sec":931326635,"write_bytes_sec":623458551,"read_op_per_sec":3604,"write_op_per_sec":7687},"fsmap":{"epoch":1,"by_rank":[],"up:standby":0},"mgrmap":{"available":true,"num_standbys":2,"modules":["dashboard","iostat","pg_autoscaler","prometheus","restful"],"services":{"prometheus":"http://cloudcephmon1002.eqiad.wmnet:9283/"}},"servicemap":{"epoch":404644,"modified":"2021-08-25T14:40:12.526032+0000","services":{}},"progress_events":{}}
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i ceph status -f json'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
----- OUTPUT of 'sudo -i grep -P ...cinga/icinga.cfg' -----
command_file=/var/lib/icinga/rw/icinga.cmd
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i grep -P ...cinga/icinga.cfg'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
----- OUTPUT of 'sudo -i /usr/loc...loudcephosd1018"' -----
{"cloudcephosd1018": {"name": "cloudcephosd1018", "state": "UP", "optimal": true, "downtimed": false, "notifications_enabled": true, "failed_services": []}}
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i /usr/loc...loudcephosd1018"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Scheduling downtime on Icinga server alert1001.wikimedia.org for hosts: cloudcephosd1018
----- OUTPUT of 'sudo -i bash -c .../rw/icinga.cmd '' -----
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i bash -c .../rw/icinga.cmd ''.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
----- OUTPUT of 'sudo -i bash -c .../rw/icinga.cmd '' -----
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i bash -c .../rw/icinga.cmd ''.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Rebooting 1 hosts in batches of 1 with 0.0s of sleep in between: cloudcephosd1018.eqiad.wmnet
----- OUTPUT of 'sudo -i reboot-host' -----
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i reboot-host'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
----- OUTPUT of 'sudo -i cat /proc/uptime' -----
channel 0: open failed: connect failed: Connection refused
stdio forwarding failed
ssh_exchange_identification: Connection closed by remote host
================
100.0% (1/1) of nodes failed to execute command 'sudo -i cat /proc/uptime': cloudcephosd1018.eqiad.wmnet
0.0% (0/1) success ratio (< 100.0% threshold) for command: 'sudo -i cat /proc/uptime'. Aborting.
0.0% (0/1) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
[1/360, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Cumin execution failed (exit_code=2)
----- OUTPUT of 'sudo -i cat /proc/uptime' -----
================
100.0% (1/1) of nodes timeout to execute command 'sudo -i cat /proc/uptime': cloudcephosd1018.eqiad.wmnet
0.0% (0/1) success ratio (< 100.0% threshold) for command: 'sudo -i cat /proc/uptime'. Aborting.
0.0% (0/1) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
[... attempts 2/360 through 7/360 elided: identical timeout output while the host finished rebooting ...]
[8/360, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Cumin execution failed (exit_code=2)
----- OUTPUT of 'sudo -i cat /proc/uptime' -----
26.83 1197.54
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i cat /proc/uptime'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Found reboot since 2021-08-25 14:43:06.066544 for hosts cloudcephosd1018.eqiad.wmnet
Rebooted node cloudcephosd1018.eqiad.wmnet, waiting for cluster to stabilize...
----- OUTPUT of 'sudo -i ceph status -f json' -----
{"fsid":"5917e6d9-06a0-4928-827a-f489384975b1","health":{"status":"HEALTH_WARN","checks":{"OSDMAP_FLAGS":{"severity":"HEALTH_WARN","summary":{"message":"norebalance flag(s) set","count":11},"muted":false}},"mutes":[]},"election_epoch":502,"quorum":[0,1,2],"quorum_names":["cloudcephmon1003","cloudcephmon1002","cloudcephmon1001"],"quorum_age":2298619,"monmap":{"epoch":8,"min_mon_release_name":"octopus","num_mons":3},"osdmap":{"epoch":3788749,"num_osds":152,"num_up_osds":152,"osd_up_since":1629214787,"num_in_osds":152,"osd_in_since":1625220736,"num_remapped_pgs":0},"pgmap":{"pgs_by_state":[{"state_name":"active+clean","count":6144},{"state_name":"active+clean+scrubbing+deep","count":1}],"num_pgs":6145,"num_pools":4,"num_objects":13677410,"data_bytes":57825754645973,"bytes_used":174052726198272,"bytes_avail":117781443194880,"bytes_total":291834169393152,"read_bytes_sec":740958192,"write_bytes_sec":260664495,"read_op_per_sec":2116,"write_op_per_sec":3856},"fsmap":{"epoch":1,"by_rank":[],"up:standby":0},"mgrmap":{"available":true,"num_standbys":2,"modules":["dashboard","iostat","pg_autoscaler","prometheus","restful"],"services":{"prometheus":"http://cloudcephmon1002.eqiad.wmnet:9283/"}},"servicemap":{"epoch":404646,"modified":"2021-08-25T14:44:29.495150+0000","services":{}},"progress_events":{}}
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i ceph status -f json'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Cluster stable, continuing
----- OUTPUT of 'sudo -i bash -c .../rw/icinga.cmd '' -----
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i bash -c .../rw/icinga.cmd ''.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Finished rebooting node cloudcephosd1018.eqiad.wmnet
----- OUTPUT of 'sudo -i lsblk --json' -----
{
  "blockdevices": [
    {"name":"sda", "maj:min":"8:0", "rm":false, "size":"223.6G", "ro":false, "type":"disk", "mountpoint":null,
      "children": [
        {"name":"sda1", "maj:min":"8:1", "rm":false, "size":"285M", "ro":false, "type":"part", "mountpoint":null},
        {"name":"sda2", "maj:min":"8:2", "rm":false, "size":"223.3G", "ro":false, "type":"part", "mountpoint":null,
          "children": [
            {"name":"md0", "maj:min":"9:0", "rm":false, "size":"223.2G", "ro":false, "type":"raid1", "mountpoint":null,
              "children": [
                {"name":"vg0-root", "maj:min":"253:0", "rm":false, "size":"74.5G", "ro":false, "type":"lvm", "mountpoint":"/"},
                {"name":"vg0-swap", "maj:min":"253:1", "rm":false, "size":"976M", "ro":false, "type":"lvm", "mountpoint":"[SWAP]"},
                {"name":"vg0-srv", "maj:min":"253:2", "rm":false, "size":"103.1G", "ro":false, "type":"lvm", "mountpoint":"/srv"}
              ]
            }
          ]
        }
      ]
    },
    {"name":"sdb", "maj:min":"8:16", "rm":false, "size":"223.6G", "ro":false, "type":"disk", "mountpoint":null,
      "children": [
        {"name":"sdb1", "maj:min":"8:17", "rm":false, "size":"285M", "ro":false, "type":"part", "mountpoint":null},
        {"name":"sdb2", "maj:min":"8:18", "rm":false, "size":"223.3G", "ro":false, "type":"part", "mountpoint":null,
          "children": [
            {"name":"md0", "maj:min":"9:0", "rm":false, "size":"223.2G", "ro":false, "type":"raid1", "mountpoint":null,
              "children": [
                {"name":"vg0-root", "maj:min":"253:0", "rm":false, "size":"74.5G", "ro":false, "type":"lvm", "mountpoint":"/"},
                {"name":"vg0-swap", "maj:min":"253:1", "rm":false, "size":"976M", "ro":false, "type":"lvm", "mountpoint":"[SWAP]"},
                {"name":"vg0-srv", "maj:min":"253:2", "rm":false, "size":"103.1G", "ro":false, "type":"lvm", "mountpoint":"/srv"}
              ]
            }
          ]
        }
      ]
    },
    {"name":"sdc", "maj:min":"8:32", "rm":false, "size":"1.8T", "ro":false, "type":"disk", "mountpoint":null},
    {"name":"sdd", "maj:min":"8:48", "rm":false, "size":"1.8T", "ro":false, "type":"disk", "mountpoint":null},
    {"name":"sde", "maj:min":"8:64", "rm":false, "size":"1.8T", "ro":false, "type":"disk", "mountpoint":null},
    {"name":"sdf", "maj:min":"8:80", "rm":false, "size":"1.8T", "ro":false, "type":"disk", "mountpoint":null},
    {"name":"sdg", "maj:min":"8:96", "rm":false, "size":"1.8T", "ro":false, "type":"disk", "mountpoint":null},
    {"name":"sdh", "maj:min":"8:112", "rm":false, "size":"1.8T", "ro":false, "type":"disk", "mountpoint":null},
    {"name":"sdi", "maj:min":"8:128", "rm":false, "size":"1.8T", "ro":false, "type":"disk", "mountpoint":null},
    {"name":"sdj", "maj:min":"8:144", "rm":false, "size":"1.8T", "ro":false, "type":"disk", "mountpoint":null}
  ]
}
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i lsblk --json'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
>>> I'm going to destroy and create a new OSD on cloudcephosd1018.eqiad.wmnet:/dev/sdc.
Type "go" to proceed or "abort" to interrupt the execution
> go
----- OUTPUT of 'sudo -i ceph-vol...lvm zap /dev/sdc' -----
--> Zapping: /dev/sdc
--> --destroy was not specified, but zapping a whole device will remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0286299 s, 366 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>
================
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i ceph-vol...lvm zap /dev/sdc'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
----- OUTPUT of 'sudo -i ceph-vol... --data /dev/sdc' -----
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new f23684e6-db0e-4fa7-bfe9-3a3516e0a852
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
--> RuntimeError: Unable to create a new OSD id
================
100.0% (1/1) of nodes failed to execute command 'sudo -i ceph-vol... --data /dev/sdc': cloudcephosd1018.eqiad.wmnet
0.0% (0/1) success ratio (< 100.0% threshold) for command: 'sudo -i ceph-vol... --data /dev/sdc'. Aborting.
0.0% (0/1) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
Exception raised while executing cookbook wmcs.ceph.osd.bootstrap_and_add:
Traceback (most recent call last):
  File "/home/andrew/.virtualenvs/spicerack/lib/python3.7/site-packages/spicerack/_menu.py", line 234, in run
    raw_ret = runner.run()
  File "/home/andrew/cookbooks/cookbooks/wmcs/ceph/osd/bootstrap_and_add.py", line 159, in run
    interactive=(not self.yes_i_know)
  File "/home/andrew/cookbooks/cookbooks/wmcs/__init__.py", line 547, in add_all_available_devices
    self.initialize_and_start_osd(device_path=device_path)
  File "/home/andrew/cookbooks/cookbooks/wmcs/__init__.py", line 538, in initialize_and_start_osd
    self._node.run_sync(f"ceph-volume lvm create --bluestore --data {device_path}")
  File "/home/andrew/.virtualenvs/spicerack/lib/python3.7/site-packages/spicerack/remote.py", line 477, in run_sync
    is_safe=is_safe,
  File "/home/andrew/.virtualenvs/spicerack/lib/python3.7/site-packages/spicerack/remote.py", line 668, in _execute
    raise RemoteExecutionError(ret, "Cumin execution failed")
spicerack.remote.RemoteExecutionError: Cumin execution failed (exit_code=2)
END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99)
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
ceph: fix keyring race condition | operations/puppet | production | +4 -2
Event Timeline
for some reason the keyring is empty:
root@cloudcephosd1018:~# wc /var/lib/ceph/bootstrap-osd/ceph.keyring
0 0 0 /var/lib/ceph/bootstrap-osd/ceph.keyring
Indeed there's something going on with puppet and the creation of the keyring.
When running puppet with --test --debug --verbose --tag /var/lib/ceph/bootstrap-osd/ceph.keyring, you can see
that it skips populating the keyring because the file already exists:
Debug: /Stage[main]/Profile::Ceph::Osd/Ceph::Keyring[client.bootstrap-osd]/Exec[ceph-keyring-client.bootstrap-osd]: '/usr/bin/ceph-authtool --create-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -n client.bootstrap-osd --add-key=... ' won't be executed because of failed check 'creates'
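The racy pattern looks roughly like this (a simplified sketch; the parameters and the secret are illustrative, not the actual operations/puppet manifest). There is no ordering between the file resource and the exec, so Puppet is free to apply the file first:

```puppet
# Sketch of the racy pattern: file and exec are unordered, so if the
# file resource is applied first it creates an empty keyring, the
# exec's 'creates' guard then passes, and the key is never written.
file { '/var/lib/ceph/bootstrap-osd/ceph.keyring':
  ensure => file,
  owner  => 'ceph',
  group  => 'ceph',
  mode   => '0600',
}

exec { 'ceph-keyring-client.bootstrap-osd':
  command => '/usr/bin/ceph-authtool --create-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -n client.bootstrap-osd --add-key=<secret>',
  creates => '/var/lib/ceph/bootstrap-osd/ceph.keyring',
}
```

With resource application order effectively arbitrary here, any run where the file resource wins the race leaves a zero-byte keyring behind, which matches the `wc` output above.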
Tried removing the keyring file and rerunning; this time the exec did run, but the file it created was empty.
Running the same command manually worked and the file had contents, so it seems to be related to the
environment that puppet runs the command in. Looking...
Hmm... yep, I think I know what's going on: when creating the keyring we don't declare a dependency between the
exec that populates the file and the file resource itself. In this case the file resource ends up creating an
empty file first, and when the exec comes in, it sees the file already there and just skips.
Will send a patch.
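A minimal sketch of that kind of fix (illustrative only; the actual change landed in change 715455): make the file resource depend on the exec, so the keyring is populated before Puppet manages the file's ownership and mode:

```puppet
exec { 'ceph-keyring-client.bootstrap-osd':
  command => '/usr/bin/ceph-authtool --create-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -n client.bootstrap-osd --add-key=<secret>',
  creates => '/var/lib/ceph/bootstrap-osd/ceph.keyring',
}

file { '/var/lib/ceph/bootstrap-osd/ceph.keyring':
  ensure  => file,
  owner   => 'ceph',
  group   => 'ceph',
  mode    => '0600',
  # Ordering is the fix: the exec must have written the key before
  # the file resource touches the path.
  require => Exec['ceph-keyring-client.bootstrap-osd'],
}
```

An equivalent alternative is `before => File[...]` on the exec; either way the 'creates' guard then only ever sees a populated file.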
Change 715455 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] ceph: fix keyring race condition
Change 715455 merged by Andrew Bogott:
[operations/puppet@production] ceph: fix keyring race condition