Arzhel and the WMCS team decided to make this move a few days ago; the full plan is documented here:
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Ceph#Network
Change 616150 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/dns@master] Add ips in cloud-hosts1-b-eqiad for cloudcephmon nodes
Change 616151 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/dns@master] Remove public IP addresses for cloudcephmons nodes
Change 616152 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/dns@master] Remove eth1 addresses for cloudcephmon hosts
Change 616153 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Move cloudcephmon hosts from .wikimedia.org to .eqiad.wmnet
Change 616150 merged by Andrew Bogott:
[operations/dns@master] Add ips in cloud-hosts1-b-eqiad for cloudcephmon nodes
Change 616156 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Move cloudcephmon1002 from .wikimedia.org to .eqiad.wmnet
Change 616157 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Move cloudcephmon1001 from .wikimedia.org to .eqiad.wmnet
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
cloudcephmon1003.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/202007242007_andrew_19879_cloudcephmon1003_eqiad_wmnet.log.
Completed auto-reimage of hosts:
['cloudcephmon1003.eqiad.wmnet']
Of which those FAILED:
['cloudcephmon1003.eqiad.wmnet']
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
cloudcephmon1003.wikimedia.org
The log can be found in /var/log/wmf-auto-reimage/202007242008_andrew_20298_cloudcephmon1003_wikimedia_org.log.
Change 616153 merged by Andrew Bogott:
[operations/puppet@production] Move cloudcephmon1003 from .wikimedia.org to .eqiad.wmnet
Completed auto-reimage of hosts:
['cloudcephmon1003.wikimedia.org']
Of which those FAILED:
['cloudcephmon1003.wikimedia.org']
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
cloudcephmon1003.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/202007242024_andrew_2972_cloudcephmon1003_eqiad_wmnet.log.
Completed auto-reimage of hosts:
['cloudcephmon1003.eqiad.wmnet']
Of which those FAILED:
['cloudcephmon1003.eqiad.wmnet']
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
cloudcephmon1003.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/202007242024_andrew_3206_cloudcephmon1003_eqiad_wmnet.log.
Completed auto-reimage of hosts:
['cloudcephmon1003.eqiad.wmnet']
Of which those FAILED:
['cloudcephmon1003.eqiad.wmnet']
Change 616168 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Add site.pp entries for cloudcephmon1001-1003.eqiad.wmnet
Change 616168 merged by Andrew Bogott:
[operations/puppet@production] Add site.pp entries for cloudcephmon1001-1003.eqiad.wmnet
Presumably this move requires a switch config change. To avoid split-brain I can only move one host at a time, so I will need to coordinate with @ayounsi or another network engineer.
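For reference, the one-host-at-a-time constraint follows from majority quorum: Ceph monitors need a strict majority of the cluster up to keep operating. The sketch below is only illustrative arithmetic; the function names are hypothetical and nothing here talks to a real cluster.

```python
def has_quorum(total_monitors: int, monitors_up: int) -> bool:
    """Return True while a strict majority of monitors is up."""
    return monitors_up > total_monitors // 2


def max_hosts_offline(total_monitors: int) -> int:
    """Largest number of monitors that can be down with quorum preserved."""
    return (total_monitors - 1) // 2


if __name__ == "__main__":
    # Three monitors (cloudcephmon1001-1003) in this task.
    total = 3
    print("monitors that may be offline at once:", max_hosts_offline(total))
    print("quorum with 2 of 3 up:", has_quorum(total, 2))
    print("quorum with 1 of 3 up:", has_quorum(total, 1))
```

With three monitors only one can be out of service at any time, which is why each reimage has to finish and rejoin before the next one starts.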
Change 616172 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Move cloudcephmon1003 from .wikimedia.org to .eqiad.wmnet
Change 616172 merged by Andrew Bogott:
[operations/puppet@production] Move cloudcephmon1003 from .wikimedia.org to .eqiad.wmnet
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
cloudcephmon1003.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/202007271247_andrew_27892_cloudcephmon1003_eqiad_wmnet.log.
Change 616519 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Ceph: temporarily hack the public network to include both old and new
Change 616519 merged by Andrew Bogott:
[operations/puppet@production] Ceph: temporarily hack the public network to include both old and new
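A rough illustration of what "both old and new" means in practice: Ceph's public network setting accepts a comma-separated list of subnets, so during the move a monitor address is acceptable if it falls in either range. The subnets and addresses below are placeholders made up for this sketch, not the real values from the patch.

```python
import ipaddress

# Placeholder ranges for illustration only; the real subnets are in the patch.
OLD_AND_NEW = "198.51.100.0/24, 10.0.0.0/24"


def parse_networks(value):
    """Split a comma-separated subnet list into network objects."""
    return [ipaddress.ip_network(part.strip())
            for part in value.split(",") if part.strip()]


def in_public_network(addr, networks):
    """Return True if addr falls inside any of the listed networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


if __name__ == "__main__":
    nets = parse_networks(OLD_AND_NEW)
    for addr in ("198.51.100.7", "10.0.0.7", "192.0.2.7"):
        print(addr, in_public_network(addr, nets))
```

Once all monitors are on the new range, the old subnet can be dropped from the list again.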
Completed auto-reimage of hosts:
['cloudcephmon1003.eqiad.wmnet']
and were ALL successful.
Change 616520 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Ceph: update ip for cloudcephmon1003
Change 616520 merged by Andrew Bogott:
[operations/puppet@production] Ceph: update ip for cloudcephmon1003
Change 616156 merged by Andrew Bogott:
[operations/puppet@production] Move cloudcephmon1002 from .wikimedia.org to .eqiad.wmnet
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
cloudcephmon1002.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/202007271405_andrew_6394_cloudcephmon1002_eqiad_wmnet.log.
Completed auto-reimage of hosts:
['cloudcephmon1002.eqiad.wmnet']
and were ALL successful.
Change 616157 merged by Andrew Bogott:
[operations/puppet@production] Move cloudcephmon1001 from .wikimedia.org to .eqiad.wmnet
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
cloudcephmon1001.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/202007271441_andrew_6728_cloudcephmon1001_eqiad_wmnet.log.
Completed auto-reimage of hosts:
['cloudcephmon1001.eqiad.wmnet']
and were ALL successful.
Change 616151 merged by Andrew Bogott:
[operations/dns@master] Remove public IP addresses for cloudcephmons nodes
Change 616152 merged by Andrew Bogott:
[operations/dns@master] Remove eth1 addresses for cloudcephmon hosts
There is a new wrinkle here -- LVS!
These hosts are an LVS pool, which means they need to be able to talk to e.g. lvs1015. Currently lvs1015 can't even ping them. Is this a simple ACL change, or a huge wrench in our network plans?
Looking at the LVS config, it seems like it's only configured for Prometheus monitoring. As it's a bit of a surprising setup, is it a hard requirement? Is it possible to know more about it?
If we want to stretch our LVS to a new VLAN or realm we would need Traffic's approval, and I'm not sure we should, as it extends the fate sharing for the LVS.
Change 616817 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/dns@master] Move cloudceph.svc.eqiad.wmnet service name to cloudceph.eqiad.wmnet
Change 616818 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudceph: don't use lvs for prometheus monitoring
I've added a couple of patches that move this out from behind LVS. A side effect is that the service name cloudceph.svc.eqiad.wmnet will move to cloudceph.eqiad.wmnet. That means we need to update whatever is currently monitoring the old .svc name. Grafana dashboards, probably?
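One way to hunt for consumers of the old name is a plain text search over exported configs or dashboard definitions. This is a minimal sketch; the sample lines are invented for illustration and the helper names are hypothetical.

```python
OLD_NAME = "cloudceph.svc.eqiad.wmnet"
NEW_NAME = "cloudceph.eqiad.wmnet"


def find_stale_references(lines, old_name=OLD_NAME):
    """Return (line_number, line) pairs that still mention the old name."""
    return [(number, line)
            for number, line in enumerate(lines, start=1)
            if old_name in line]


def rewrite_reference(line, old_name=OLD_NAME, new_name=NEW_NAME):
    """Return the line with the old service name swapped for the new one."""
    return line.replace(old_name, new_name)


if __name__ == "__main__":
    # Invented sample content; real input would be an exported config file.
    sample = [
        "targets: ['cloudceph.svc.eqiad.wmnet']",
        "targets: ['cloudceph.eqiad.wmnet']",
        "# unrelated line",
    ]
    for number, line in find_stale_references(sample):
        print(number, line, "->", rewrite_reference(line))
```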
Change 616818 merged by Andrew Bogott:
[operations/puppet@production] cloudceph: don't use lvs for prometheus monitoring
Change 616852 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Remove lvs from cloudcephmon nodes
Change 616852 merged by Andrew Bogott:
[operations/puppet@production] Remove lvs from cloudcephmon nodes
Change 616817 merged by Andrew Bogott:
[operations/dns@master] Remove cloudceph.svc.eqiad.wmnet service name
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
cloudcephmon1003.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/202007281534_andrew_27220_cloudcephmon1003_eqiad_wmnet.log.
Completed auto-reimage of hosts:
['cloudcephmon1003.eqiad.wmnet']
and were ALL successful.
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
cloudcephmon1002.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/202007281609_andrew_31018_cloudcephmon1002_eqiad_wmnet.log.
Completed auto-reimage of hosts:
['cloudcephmon1002.eqiad.wmnet']
and were ALL successful.
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
cloudcephmon1001.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/202007281633_andrew_19753_cloudcephmon1001_eqiad_wmnet.log.
Completed auto-reimage of hosts:
['cloudcephmon1001.eqiad.wmnet']
and were ALL successful.
cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: cloudcephosd1001.wikimedia.org
ERROR: some step on some host failed, check the bolded items above
cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: cloudcephosd1002.wikimedia.org
ERROR: some step on some host failed, check the bolded items above
cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: cloudcephosd1003.wikimedia.org
ERROR: some step on some host failed, check the bolded items above
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
['cloudcephosd1001.eqiad.wmnet', 'cloudcephosd1002.eqiad.wmnet', 'cloudcephosd1003.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/202007281934_andrew_18746.log.
Completed auto-reimage of hosts:
['cloudcephosd1001.eqiad.wmnet', 'cloudcephosd1003.eqiad.wmnet', 'cloudcephosd1002.eqiad.wmnet']
and were ALL successful.
@Andrew is the DNS record:
10.in-addr.arpa:51 1H IN PTR cloudceph.svc.eqiad.wmnet.
a leftover that can be removed?
Change 623843 had a related patch set uploaded (by Volans; owner: Volans):
[operations/dns@master] Cleanup leftover record cloudceph.svc.eqiad.wmnet
Change 623843 merged by Volans:
[operations/dns@master] Cleanup leftover record cloudceph.svc.eqiad.wmnet