
(Need By: 2020-06-20) rack/setup/install cloudcephosd10[04-15].wikimedia.org
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of cloudcephosd10[04-15].wikimedia.org.

Hostname / Racking / Installation Details

Hostnames: cloudcephosd10[04-15].wikimedia.org
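For clarity, the bracket shorthand `cloudcephosd10[04-15]` covers twelve hostnames, cloudcephosd1004 through cloudcephosd1015. A quick Python expansion (illustrative only):

```python
# Expand the cloudcephosd10[04-15] shorthand into the twelve FQDNs.
hosts = [f"cloudcephosd{n}.wikimedia.org" for n in range(1004, 1016)]

print(len(hosts))   # 12
print(hosts[0])     # cloudcephosd1004.wikimedia.org
print(hosts[-1])    # cloudcephosd1015.wikimedia.org
```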

Racking Proposal: All hosts in Row B, all in 10G racks, ideally 4 per rack to spread the cluster across PDU & switch failure domains

Networking/Subnet/VLAN/IP: 2 x 10G ports per server (12 x 2 = 24 ports). Each host should have one 10G Ethernet connection to the public subnet (wikimedia.org) and one to the private, internal subnet (eqiad.wmnet).

Partitioning/Raid: RAID 10 on OS drive pair, no RAID (JBOD Only) for data drives.
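A minimal sketch of what this layout implies for usable capacity, with made-up drive counts and sizes (the task does not state them): mirroring the OS pair yields the capacity of one member, while JBOD data drives each stand alone as Ceph OSDs and contribute their full size.

```python
# Hypothetical drive inventory; sizes in GB are illustrative, not from the task.
os_pair = [480, 480]      # two OS drives, mirrored (RAID 10 on a pair behaves like RAID 1)
data_drives = [1920] * 8  # JBOD: each data drive is exposed individually to Ceph

usable_os = min(os_pair)        # a mirror's usable size is one member's size
usable_data = sum(data_drives)  # JBOD: no redundancy at the RAID layer

print(usable_os, usable_data)   # 480 15360
```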

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

cloudcephosd1004:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1005:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1006:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1007:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1008:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1009:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1010:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1011:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1012:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1013:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1014:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephosd1015:

  • - receive in system on procurement task T242133
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Related Objects

Event Timeline

Jclark-ctr updated the task description. May 28 2020, 1:58 PM
wiki_willy renamed this task from (Need By: TBD) rack/setup/install cloudcephosd10[04-15].wikimedia.org to (Need By: 2020-06-20) rack/setup/install cloudcephosd10[04-15].wikimedia.org. Jun 8 2020, 8:23 PM
Jclark-ctr updated the task description. Jun 29 2020, 4:40 PM

host              rack  switch port  asset tag
cloudcephosd1004  C8    22           WMF5103
cloudcephosd1005  C8    23           WMF5104
cloudcephosd1006  C8    24           WMF4831
cloudcephosd1007  C8    25           WMF4830
cloudcephosd1008  C8    26           WMF4829
cloudcephosd1009  C8    27           WMF4828
cloudcephosd1010  D5    6            WMF4827
cloudcephosd1011  D5    7            WMF4826
cloudcephosd1012  D5    8            WMF4825
cloudcephosd1013  D5    9            WMF4824
cloudcephosd1014  D5    10           WMF4823
cloudcephosd1015  D5    11           WMF4822
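The placement table above can be captured as data to sanity-check the rack spread (a convenience transcription of this comment, not an authoritative source):

```python
from collections import Counter

# Transcribed from the rack/switch-port/asset-tag table in this task.
placement = {
    "cloudcephosd1004": ("C8", 22, "WMF5103"),
    "cloudcephosd1005": ("C8", 23, "WMF5104"),
    "cloudcephosd1006": ("C8", 24, "WMF4831"),
    "cloudcephosd1007": ("C8", 25, "WMF4830"),
    "cloudcephosd1008": ("C8", 26, "WMF4829"),
    "cloudcephosd1009": ("C8", 27, "WMF4828"),
    "cloudcephosd1010": ("D5", 6, "WMF4827"),
    "cloudcephosd1011": ("D5", 7, "WMF4826"),
    "cloudcephosd1012": ("D5", 8, "WMF4825"),
    "cloudcephosd1013": ("D5", 9, "WMF4824"),
    "cloudcephosd1014": ("D5", 10, "WMF4823"),
    "cloudcephosd1015": ("D5", 11, "WMF4822"),
}

# Count hosts per rack.
racks = Counter(rack for rack, _port, _tag in placement.values())
print(dict(racks))  # {'C8': 6, 'D5': 6}
```

Note that the as-built spread is six hosts in each of two racks, rather than the four-per-rack across three racks suggested in the racking proposal.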

Jclark-ctr added a subscriber: Jclark-ctr.
Cmjohnson updated the task description. Jul 6 2020, 8:29 PM

The network switches still need to be connected to the network; in the meantime, everything else will be completed so the hosts can be imaged.

I have updated the switch port descriptions but have not set any vlans.

Change 613333 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding production dns for cloudcephosd1004-1015

https://gerrit.wikimedia.org/r/613333

Change 613333 merged by Cmjohnson:
[operations/dns@master] Adding production dns for cloudcephosd1004-1015

https://gerrit.wikimedia.org/r/613333

Change 615182 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding new cloudceph servers to site.pp

https://gerrit.wikimedia.org/r/615182

Change 615182 merged by Cmjohnson:
[operations/puppet@production] Adding new cloudceph servers to site.pp

https://gerrit.wikimedia.org/r/615182

Change 615183 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Add cloudcephosd mac addressess to dhcpd file

https://gerrit.wikimedia.org/r/615183

Change 615183 merged by Cmjohnson:
[operations/puppet@production] Add cloudcephosd mac addressess to dhcpd file

https://gerrit.wikimedia.org/r/615183

There has been some confusion and some informal IRC discussion about how best to cable and vlan those hosts.

The initial PoC ceph hosts had their interfaces in public-b and private-b, but those are not ideal:

  • public-b was chosen to cross row boundaries, which is not needed, as both ceph and virt hosts will live in the same vlan (cloud-hosts) and do not need to be publicly reachable
  • private-b was chosen as it is a vlan different from public-b, and not for the benefits of the private vlan itself (routable and secure)

That's why I suggested (in T251632#6292589, though it would have been better here) to configure them this way:

  • eth0: cloud-hosts1-eqiad (main vlan)
  • eth1: cloud-storage1-eqiad (unrouted private vlan)

Both present in row B, cloudsw-c8 and cloudsw-d5.

This also means that the PoC hosts will need to be renumbered to those final vlans.
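As a sketch only, the proposed per-host layout could look like this in Debian ifupdown terms. Interface names, addresses, and the gateway below are made-up placeholders, not values from this task; the one substantive point is that the storage vlan deliberately gets no gateway, since it is unrouted.

```text
# /etc/network/interfaces fragment -- illustrative only
auto eno1
iface eno1 inet static
    # cloud-hosts1-eqiad: main, routed vlan
    address 10.64.20.10/24      # placeholder address
    gateway 10.64.20.1          # placeholder gateway

auto eno2
iface eno2 inet static
    # cloud-storage1-eqiad: unrouted private vlan (no gateway on purpose)
    address 192.168.4.10/24     # placeholder address
```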

Change 615513 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding cloudcephosd servers to private vlan

https://gerrit.wikimedia.org/r/615513

Change 615513 merged by Cmjohnson:
[operations/dns@master] Adding cloudcephosd servers to private vlan

https://gerrit.wikimedia.org/r/615513

Andrew added a subscriber: Andrew. (Edited) Jul 22 2020, 4:28 PM

@ayounsi, the POC hosts are currently hosting a small amount of user workload. Will renumbering them cause a service interruption?

Edited to add: nevermind, Brooke is stating this more clearly

Bstorm added a subscriber: Bstorm. (Edited) Jul 22 2020, 4:41 PM

@ayounsi I agree about the choice of public-b, I think. We don't need to cross rows, as long as we can still reach the initial PoC cluster, because it is part of this cluster. These hosts need to talk to the cephmon hosts for that cluster and to all three existing cephosd hosts; it is a hard requirement that they be able to reach the initial three cephosd hosts. Those are on the private-b network because it is secure, and that traffic should be reasonably secure because Ceph trusts it. I'm not sure about the idea that it doesn't need to be routed? It depends on what that implies here. That network (in Ceph it is called the private network, which is confusing for these discussions) carries normal traffic between hosts, not crossover or anything like that. It is the busier of the two networks for an OSD server and, as currently set up, has to be able to see all other OSDs in the cluster.

I guess my strongest concern here is that the PoC cluster is actually the seed for the rest of the build-out, not a separate cluster. These hosts need to be in touch with those hosts in the end. I worry that the requirement to renumber them could block this, given @Andrew's comment; it would help if you could walk us through the implications, etc. :)

NOTE: For any spectators, we are setting up a meeting to make sure we are all synced up on this ASAP (likely tomorrow).

Change 615765 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Addig cloudcephosd to cloud-host vlan

https://gerrit.wikimedia.org/r/615765

Change 615765 merged by Cmjohnson:
[operations/dns@master] Addig cloudcephosd to cloud-host vlan

https://gerrit.wikimedia.org/r/615765

Current status on the switch side is that the vlans (cloud-hosts + cloud-storage) are configured, but the Ceph hosts are powered off or not cabled (ports admin-up, link down):

xe-0/0/22       up    down cloudcephosd1004:en0
xe-0/0/23       up    down cloudcephosd1005:en0
xe-0/0/24       up    down cloudcephosd1006:en0
xe-0/0/25       up    down cloudcephosd1007:en0
xe-0/0/26       up    down cloudcephosd1008:en0
xe-0/0/27       up    down cloudcephosd1009:en0
xe-0/0/42       up    down cloudcephosd1004:en1
xe-0/0/43       up    down cloudcephosd1005:en1
xe-0/0/44       up    down cloudcephosd1006:en1
xe-0/0/45       up    down cloudcephosd1007:en1
xe-0/0/46       up    down cloudcephosd1008:en1
xe-0/0/47       up    down cloudcephosd1009:en1

I first thought they were connected, as some servers show up in LLDP:

xe-0/0/37          -                   bc:97:e1:4a:12:d2   NIC 1/10Gb Unknown Broadcom Adv. Dual 10Gb Ethernet fw_version:AFW_214.4.6.0
xe-0/0/8           -                   bc:97:e1:4a:12:d3   NIC 1/10Gb Unknown Broadcom Adv. Dual 10Gb Ethernet fw_version:AFW_214.4.6.0
xe-0/0/38          -                   bc:97:e1:4a:37:2c   NIC 1/10Gb Unknown Broadcom Adv. Dual 10Gb Ethernet fw_version:AFW_214.4.6.0
xe-0/0/9           -                   bc:97:e1:4a:37:2d   NIC 1/10Gb Unknown Broadcom Adv. Dual 10Gb Ethernet fw_version:AFW_214.4.6.0
xe-0/0/40          -                   bc:97:e1:4a:68:52   NIC 1/10Gb Unknown Broadcom Adv. Dual 10Gb Ethernet fw_version:AFW_214.4.6.0
xe-0/0/11          -                   bc:97:e1:4a:68:53   NIC 1/10Gb Unknown Broadcom Adv. Dual 10Gb Ethernet fw_version:AFW_214.4.6.0
xe-0/0/39          -                   bc:97:e1:4a:6d:50   NIC 1/10Gb Unknown Broadcom Adv. Dual 10Gb Ethernet fw_version:AFW_214.4.6.0
xe-0/0/10          -                   bc:97:e1:4a:6d:51   NIC 1/10Gb Unknown Broadcom Adv. Dual 10Gb Ethernet fw_version:AFW_214.4.6.0
xe-0/0/36          -                   bc:97:e1:4a:7b:ba   NIC 1/10Gb Unknown Broadcom Adv. Dual 10Gb Ethernet fw_version:AFW_214.4.6.0
xe-0/0/7           -                   bc:97:e1:4a:7b:bb   NIC 1/10Gb Unknown Broadcom Adv. Dual 10Gb Ethernet fw_version:AFW_214.4.6.0

Not sure what they are, maybe hypervisors?

Change 615790 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding the mgmt dns entries created by netbox to dns file (not yet automated)

https://gerrit.wikimedia.org/r/615790

Change 615790 merged by Cmjohnson:
[operations/dns@master] Adding the mgmt dns entries created by netbox to dns file (not yet automated)

https://gerrit.wikimedia.org/r/615790

Change 615828 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Rename cloudcephosd1004 through 1015.

https://gerrit.wikimedia.org/r/615828

Change 615828 merged by Andrew Bogott:
[operations/puppet@production] Rename cloudcephosd1004 through 1015.

https://gerrit.wikimedia.org/r/615828

Change 615832 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] WMCS Ceph: add address entries for new OSD nodes

https://gerrit.wikimedia.org/r/615832

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1005.eqiad.wmnet', 'cloudcephosd1006.eqiad.wmnet', 'cloudcephosd1007.eqiad.wmnet', 'cloudcephosd1008.eqiad.wmnet', 'cloudcephosd1009.eqiad.wmnet', 'cloudcephosd1010.eqiad.wmnet', 'cloudcephosd1011.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007232043_andrew_3929.log.

Completed auto-reimage of hosts:

['cloudcephosd1008.eqiad.wmnet', 'cloudcephosd1009.eqiad.wmnet', 'cloudcephosd1010.eqiad.wmnet', 'cloudcephosd1006.eqiad.wmnet', 'cloudcephosd1005.eqiad.wmnet']

Of which those FAILED:

['cloudcephosd1011.eqiad.wmnet', 'cloudcephosd1007.eqiad.wmnet']

Change 615838 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudcephosd nodes: Experiment with using a hw raid for the / volume

https://gerrit.wikimedia.org/r/615838

Change 615838 merged by Andrew Bogott:
[operations/puppet@production] cloudcephosd nodes: Experiment with using a hw raid for the / volume

https://gerrit.wikimedia.org/r/615838

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1004.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007232112_andrew_15882.log.

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1004.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007232130_andrew_19718.log.

Completed auto-reimage of hosts:

['cloudcephosd1004.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1004.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007232204_andrew_27467.log.

Completed auto-reimage of hosts:

['cloudcephosd1004.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1011.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007232236_andrew_4224.log.

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1005.eqiad.wmnet', 'cloudcephosd1006.eqiad.wmnet', 'cloudcephosd1007.eqiad.wmnet', 'cloudcephosd1008.eqiad.wmnet', 'cloudcephosd1009.eqiad.wmnet', 'cloudcephosd1010.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007232236_andrew_4118.log.

Completed auto-reimage of hosts:

['cloudcephosd1011.eqiad.wmnet']

Of which those FAILED:

['cloudcephosd1011.eqiad.wmnet']

Completed auto-reimage of hosts:

['cloudcephosd1005.eqiad.wmnet', 'cloudcephosd1009.eqiad.wmnet', 'cloudcephosd1007.eqiad.wmnet', 'cloudcephosd1010.eqiad.wmnet', 'cloudcephosd1006.eqiad.wmnet']

Of which those FAILED:

['cloudcephosd1008.eqiad.wmnet']

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

cloudcephosd1011.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202007232316_andrew_17911_cloudcephosd1011_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1005.eqiad.wmnet', 'cloudcephosd1006.eqiad.wmnet', 'cloudcephosd1007.eqiad.wmnet', 'cloudcephosd1008.eqiad.wmnet', 'cloudcephosd1009.eqiad.wmnet', 'cloudcephosd1010.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007232315_andrew_17849.log.

Completed auto-reimage of hosts:

['cloudcephosd1011.eqiad.wmnet']

Of which those FAILED:

['cloudcephosd1011.eqiad.wmnet']

Completed auto-reimage of hosts:

['cloudcephosd1007.eqiad.wmnet', 'cloudcephosd1008.eqiad.wmnet', 'cloudcephosd1009.eqiad.wmnet', 'cloudcephosd1006.eqiad.wmnet']

Of which those FAILED:

['cloudcephosd1005.eqiad.wmnet', 'cloudcephosd1010.eqiad.wmnet']

cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: cloudcephosd1010.eqiad.wmnet

  • cloudcephosd1010.eqiad.wmnet (FAIL)
    • Downtimed host on Icinga
    • Found physical host
    • Downtimed management interface on Icinga
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

cloudcephosd1010.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202007240016_andrew_1222_cloudcephosd1010_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['cloudcephosd1010.eqiad.wmnet']

and were ALL successful.

cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: cloudcephosd1009.eqiad.wmnet

  • cloudcephosd1009.eqiad.wmnet (FAIL)
    • Failed downtime host on Icinga (likely already removed)
    • Found physical host
    • Skipped downtime management interface on Icinga (likely already removed)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: cloudcephosd1008.eqiad.wmnet

  • cloudcephosd1008.eqiad.wmnet (FAIL)
    • Failed downtime host on Icinga (likely already removed)
    • Found physical host
    • Skipped downtime management interface on Icinga (likely already removed)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: cloudcephosd1006.eqiad.wmnet

  • cloudcephosd1006.eqiad.wmnet (FAIL)
    • Failed downtime host on Icinga (likely already removed)
    • Found physical host
    • Skipped downtime management interface on Icinga (likely already removed)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: cloudcephosd1007.eqiad.wmnet

  • cloudcephosd1007.eqiad.wmnet (FAIL)
    • Failed downtime host on Icinga (likely already removed)
    • Found physical host
    • Skipped downtime management interface on Icinga (likely already removed)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: cloudcephosd1004.eqiad.wmnet

  • cloudcephosd1004.eqiad.wmnet (FAIL)
    • Failed downtime host on Icinga (likely already removed)
    • Found physical host
    • Skipped downtime management interface on Icinga (likely already removed)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: cloudcephosd1005.eqiad.wmnet

  • cloudcephosd1005.eqiad.wmnet (FAIL)
    • Failed downtime host on Icinga (likely already removed)
    • Found physical host
    • Skipped downtime management interface on Icinga (likely already removed)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1005.eqiad.wmnet', 'cloudcephosd1006.eqiad.wmnet', 'cloudcephosd1007.eqiad.wmnet', 'cloudcephosd1008.eqiad.wmnet', 'cloudcephosd1009.eqiad.wmnet', 'cloudcephosd1004.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007240048_andrew_8107.log.

Completed auto-reimage of hosts:

['cloudcephosd1005.eqiad.wmnet', 'cloudcephosd1008.eqiad.wmnet', 'cloudcephosd1006.eqiad.wmnet', 'cloudcephosd1007.eqiad.wmnet', 'cloudcephosd1004.eqiad.wmnet']

Of which those FAILED:

['cloudcephosd1009.eqiad.wmnet']

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1012.eqiad.wmnet', 'cloudcephosd1013.eqiad.wmnet', 'cloudcephosd1014.eqiad.wmnet', 'cloudcephosd1015.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007241524_andrew_30738.log.

Change 615832 merged by Andrew Bogott:
[operations/puppet@production] WMCS Ceph: add address entries for new OSD nodes

https://gerrit.wikimedia.org/r/615832

Change 616115 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Make cloudcephosd1005.eqiad.wmnet a ceph node

https://gerrit.wikimedia.org/r/616115

Change 616115 merged by Andrew Bogott:
[operations/puppet@production] Make cloudcephosd1005.eqiad.wmnet a ceph node

https://gerrit.wikimedia.org/r/616115

Change 616119 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Fix naming of cloudcephosd1004 in site.pp

https://gerrit.wikimedia.org/r/616119

Change 616119 merged by Andrew Bogott:
[operations/puppet@production] Fix naming of cloudcephosd1004 in site.pp

https://gerrit.wikimedia.org/r/616119

Completed auto-reimage of hosts:

['cloudcephosd1013.eqiad.wmnet', 'cloudcephosd1014.eqiad.wmnet', 'cloudcephosd1015.eqiad.wmnet', 'cloudcephosd1012.eqiad.wmnet']

Of which those FAILED:

['cloudcephosd1013.eqiad.wmnet', 'cloudcephosd1014.eqiad.wmnet', 'cloudcephosd1015.eqiad.wmnet', 'cloudcephosd1012.eqiad.wmnet']

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1012.eqiad.wmnet', 'cloudcephosd1013.eqiad.wmnet', 'cloudcephosd1014.eqiad.wmnet', 'cloudcephosd1015.eqiad.wmnet', 'cloudcephosd1011.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007241952_andrew_5592.log.

Completed auto-reimage of hosts:

['cloudcephosd1011.eqiad.wmnet']

Of which those FAILED:

['cloudcephosd1012.eqiad.wmnet', 'cloudcephosd1013.eqiad.wmnet', 'cloudcephosd1014.eqiad.wmnet', 'cloudcephosd1015.eqiad.wmnet']

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1012.eqiad.wmnet', 'cloudcephosd1013.eqiad.wmnet', 'cloudcephosd1014.eqiad.wmnet', 'cloudcephosd1015.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007242013_andrew_26387.log.

Completed auto-reimage of hosts:

['cloudcephosd1012.eqiad.wmnet', 'cloudcephosd1014.eqiad.wmnet', 'cloudcephosd1013.eqiad.wmnet', 'cloudcephosd1015.eqiad.wmnet']

Of which those FAILED:

['cloudcephosd1012.eqiad.wmnet', 'cloudcephosd1014.eqiad.wmnet', 'cloudcephosd1013.eqiad.wmnet', 'cloudcephosd1015.eqiad.wmnet']

Change 616847 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Make the rest of the cloudcephosd hosts into osd nodes

https://gerrit.wikimedia.org/r/616847

Change 616847 merged by Andrew Bogott:
[operations/puppet@production] Make the rest of the cloudcephosd hosts into osd nodes

https://gerrit.wikimedia.org/r/616847

Change 616855 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Ceph osd nodes: install bootstrap keyring

https://gerrit.wikimedia.org/r/616855

Change 616855 merged by Andrew Bogott:
[operations/puppet@production] Ceph osd nodes: install bootstrap keyring

https://gerrit.wikimedia.org/r/616855

These hosts are in service now. @Cmjohnson, can this be closed?

Cmjohnson closed this task as Resolved. Jul 29 2020, 3:25 PM

Thanks @Andrew for the assist with these! Resolved

Change 619503 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] updating mgmt ip to reflect correct asset tag cloudcephosd host

https://gerrit.wikimedia.org/r/619503

Change 619503 abandoned by Cmjohnson:
[operations/dns@master] updating mgmt ip to reflect correct asset tag cloudcephosd host

Reason:

https://gerrit.wikimedia.org/r/619503

ayounsi removed a subscriber: ayounsi. Oct 30 2020, 12:46 PM