Migrate existing ceph workload off of cloudcephosd100[1-3] and rebuild with new vlan config
Closed, Resolved · Public

Description

The current plan is:

  • On the three new OSD nodes, add the old public-network IP range (actually the production private network) as an additional cluster network, so that the nodes can talk on both the new cluster network and the old one. This is a local hack that puppet will later revert (see the config sketch after this list).
  • Add the new OSD nodes to the existing pool.
  • Remove the old OSDs one by one, waiting for the cluster to rebuild between removals (see the drain sketch after this list).
  • Rebuild the new OSDs with modern settings.
  • Remove the public-network IP from the cluster network config on the three new OSD nodes one at a time, since the change restarts the OSD daemons.
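
As a rough illustration of the dual-network hack in the first bullet, the temporary ceph.conf might look like the sketch below. The subnets are placeholders, not the real ranges, which are not recorded in this task. Ceph accepts a comma-separated list of CIDRs for the cluster network, and OSD daemons only pick the setting up on restart, which is why the final step has to go host by host.

    # /etc/ceph/ceph.conf on cloudcephosd100[4-6] (local hack, reverted by puppet)
    [global]
        # Placeholder subnets: old production-private range plus the new cloud vlan
        cluster network = 10.64.20.0/24, 192.168.4.0/24
        public network  = 10.64.20.0/24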
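
And a sketch of the per-OSD drain step from the third bullet, using standard Ceph commands; the OSD id is a placeholder, and in practice the health checks are watched by hand until placement groups are active+clean before moving on.

    # Drain one old OSD at a time (osd.12 is a placeholder id)
    ceph osd out 12                            # stop new data landing here; triggers backfill
    ceph -s                                    # watch until PGs are active+clean again
    ceph osd safe-to-destroy osd.12            # confirm removal would lose no data
    systemctl stop ceph-osd@12                 # run on the host that carries the OSD
    ceph osd purge 12 --yes-i-really-mean-it   # remove from CRUSH, auth, and the OSD map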

Event Timeline

Change 616576 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Ceph: temporarily hack the cluster network to include both old and new

https://gerrit.wikimedia.org/r/616576

Change 616582 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Ceph: make cloudcephosd1004, 5 and 6 osd nodes

https://gerrit.wikimedia.org/r/616582

Change 616582 merged by Andrew Bogott:
[operations/puppet@production] Ceph: make cloudcephosd1004, 5 and 6 osd nodes

https://gerrit.wikimedia.org/r/616582

Change 616584 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Ceph: make cloudcephosd1004, 1005, 1006 into OSD nodes

https://gerrit.wikimedia.org/r/616584

Change 616584 merged by Andrew Bogott:
[operations/puppet@production] Ceph: make cloudcephosd1004, 1005, 1006 into OSD nodes

https://gerrit.wikimedia.org/r/616584

Change 616576 merged by Andrew Bogott:
[operations/puppet@production] Ceph: update network settings

https://gerrit.wikimedia.org/r/616576

This went poorly and existing data was lost. The old pool has since been deleted and the hosts renamed, so this is 'resolved', if a rather defeated kind of resolved.