Maniphest T203964

Create a spicerack cookbook to empty a ganeti node from VMs
Closed, Resolved (Public)

Assigned To

	MoritzMuehlenhoff

Authored By

	akosiaris
	Sep 10 2018, 2:49 PM

Description

After successfully completing T203963, we could build on the work done for that task to create a cookbook that empties a node for maintenance. The cookbook would do the following:

Connects to the ganeti cluster master (figures it out via gnt-cluster getmaster if necessary)
Live migrates all running VMs on said node (gnt-node evacuate -p $node should be good)
Verifies the above has completed successfully
Fails over VMs for which the host is primary but are not running (gnt-node failover)
Moves secondary VMs (if requested) from the host (gnt-node evacuate -s)
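
The steps above could be sketched as an ordered command plan. This is only an illustration, not the cookbook that was eventually merged: the function name `build_drain_commands` and the `move_secondaries` flag are hypothetical, the commands are the ones named in the description, and the verification step is left out because the description does not say how it would be done.

```python
from typing import List


def build_drain_commands(node: str, move_secondaries: bool = False) -> List[List[str]]:
    """Return the commands from the task description, in the order listed."""
    commands = [
        # Find the cluster master, where the remaining commands are run.
        ["gnt-cluster", "getmaster"],
        # Move the running primary instances off the node.
        ["gnt-node", "evacuate", "-p", node],
        # Fail over primary instances that are not running.
        ["gnt-node", "failover", node],
    ]
    if move_secondaries:
        # Optionally move the secondary instances as well.
        commands.append(["gnt-node", "evacuate", "-s", node])
    return commands


if __name__ == "__main__":
    # The hostname is a placeholder, not a real node.
    for command in build_drain_commands("ganeti-node.example.org", move_secondaries=True):
        print(" ".join(command))
```

Returning the plan as data rather than executing it keeps the sketch safe to run anywhere and easy to review before anything touches a cluster.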

A variation of the above that would also be helpful (probably as its own cookbook) would be:

Do the above to a machine
Reboot it
Wait for it to come back online
Run gnt-cluster verify-disks to force DRBD pair syncing with the rest of the cluster
Proceed to the next node

In other words, a rolling reboot.
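
The rolling variation is a loop over nodes that finishes one node before starting the next. A minimal sketch follows, assuming the four steps are passed in as callables. All names here (`rolling_reboot`, `drain`, `reboot`, `wait_until_up`, `verify_disks`) are hypothetical and do not correspond to any existing Spicerack API.

```python
from typing import Callable, Iterable, List


def rolling_reboot(
    nodes: Iterable[str],
    drain: Callable[[str], None],
    reboot: Callable[[str], None],
    wait_until_up: Callable[[str], None],
    verify_disks: Callable[[], None],
) -> List[str]:
    """Process nodes one at a time and return the nodes that completed."""
    completed: List[str] = []
    for node in nodes:
        drain(node)          # empty the node as described above
        reboot(node)         # reboot it
        wait_until_up(node)  # block until it is back online
        verify_disks()       # e.g. run gnt-cluster verify-disks
        completed.append(node)
    return completed
```

Any exception raised by a step propagates and stops the loop, so a failed node is never followed by a drain of the next one.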

Details

Subject	Repo	Branch	Lines +/-
sre.ganeti.drain-vm: Sync DRBD after reboot	operations/cookbooks	master	+1 -0
sre.ganeti.drain-node: Pass -f to evacuate command	operations/cookbooks	master	+1 -1
Don't reboot Ganeti master nodes	operations/cookbooks	master	+7 -2
Fix migration when "plain" instances are involved	operations/cookbooks	master	+31 -12
sre.ganeti.drain-node: Add the option to reboot the drained node	operations/cookbooks	master	+18 -6
Add a cookbook to drain a Ganeti node	operations/cookbooks	master	+161 -0


Related Objects

Status	Assigned	Task
Open	None	T203943 Spicerack cookbooks TODO list
Resolved	• crusnov	T203963 Convert makevm to spicerack cookbook
Open	None	T283319 Cookbooks for Ganeti maintenance tasks
Resolved	MoritzMuehlenhoff	T203964 Create a spicerack cookbook to empty a ganeti node from VMs

Event Timeline

akosiaris created this task. Sep 10 2018, 2:49 PM

Dzahn triaged this task as Medium priority. Oct 12 2018, 5:46 PM

Dzahn subscribed.

akosiaris updated the task description. Nov 8 2018, 11:36 AM

MoritzMuehlenhoff subscribed. Nov 8 2018, 11:39 AM

• Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board. Jan 26 2019, 10:21 PM

• crusnov moved this task from Backlog to Up next on the SRE-tools board. Feb 14 2019, 5:37 PM

• crusnov moved this task from Up next to Backlog on the SRE-tools board.

• crusnov subscribed. Apr 9 2019, 6:19 PM

jijiki removed a project: User-jijiki. Sep 8 2020, 10:27 AM

MoritzMuehlenhoff added a parent task: T283319: Cookbooks for Ganeti maintenance tasks. May 21 2021, 7:54 AM

Aklapper added a project: Infrastructure-Foundations. Jun 21 2021, 9:00 PM

joanna_borun added a project: Spicerack. Jun 15 2022, 10:43 AM

MoritzMuehlenhoff added a project: Ganeti. Jun 24 2022, 10:03 AM

MoritzMuehlenhoff claimed this task. May 26 2023, 12:13 PM

Change 924498 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/cookbooks@master] Add a cookbook to drain a Ganeti node

https://gerrit.wikimedia.org/r/924498

gerritbot added a project: Patch-For-Review. May 30 2023, 1:45 PM

Change 924498 merged by Muehlenhoff:

[operations/cookbooks@master] Add a cookbook to drain a Ganeti node

https://gerrit.wikimedia.org/r/924498

Maintenance_bot removed a project: Patch-For-Review. Jun 21 2023, 12:10 PM

Change 932167 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/cookbooks@master] sre.ganeti.drain-node: Add the option to reboot the drained node

https://gerrit.wikimedia.org/r/932167

gerritbot added a project: Patch-For-Review. Jun 22 2023, 6:54 AM

Change 932167 merged by Muehlenhoff:

[operations/cookbooks@master] sre.ganeti.drain-node: Add the option to reboot the drained node

https://gerrit.wikimedia.org/r/932167

Maintenance_bot removed a project: Patch-For-Review. Jun 22 2023, 10:12 AM

Change 932237 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/cookbooks@master] Fix migration when "plain" instances are involved

https://gerrit.wikimedia.org/r/932237

gerritbot added a project: Patch-For-Review. Jun 22 2023, 12:53 PM

Change 932237 merged by Muehlenhoff:

[operations/cookbooks@master] Fix migration when "plain" instances are involved

https://gerrit.wikimedia.org/r/932237

Maintenance_bot removed a project: Patch-For-Review. Jun 27 2023, 2:10 PM

Change 933482 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/cookbooks@master] Don't reboot Ganeti master nodes

https://gerrit.wikimedia.org/r/933482

gerritbot added a project: Patch-For-Review. Jun 27 2023, 2:45 PM

Change 933482 merged by Muehlenhoff:

[operations/cookbooks@master] Don't reboot Ganeti master nodes

https://gerrit.wikimedia.org/r/933482

Maintenance_bot removed a project: Patch-For-Review. Jun 28 2023, 7:10 AM

Change 933852 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/cookbooks@master] sre.ganeti.drain-node: Pass -f to evacuate command

https://gerrit.wikimedia.org/r/933852

gerritbot added a project: Patch-For-Review. Jun 28 2023, 7:18 AM

Change 933852 merged by Muehlenhoff:

[operations/cookbooks@master] sre.ganeti.drain-node: Pass -f to evacuate command

https://gerrit.wikimedia.org/r/933852

Maintenance_bot removed a project: Patch-For-Review. Jun 28 2023, 7:30 AM

Change 934248 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/cookbooks@master] sre.ganeti.drain-vm: Sync DRBD after reboot

https://gerrit.wikimedia.org/r/934248

gerritbot added a project: Patch-For-Review. Jun 29 2023, 7:46 AM

Change 934248 merged by Muehlenhoff:

[operations/cookbooks@master] sre.ganeti.drain-vm: Sync DRBD after reboot

https://gerrit.wikimedia.org/r/934248

Maintenance_bot removed a project: Patch-For-Review. Jun 29 2023, 8:11 AM

This has been implemented with the new sre.ganeti.drain-node cookbook, which I've used for the latest round of reboots.

By default only primary instances are moved away. This is suitable for reboots and similar short-term maintenance. If a host is going away for a longer time (or if all data will be lost in a reimage), the --full option also moves the secondary instances to other nodes.

By default all Ganeti instances use replicated DRBD storage, but for latency-sensitive services (currently only etcd) the overhead of DRBD may cause visible latency issues. These instances use local disk storage instead (called "plain").

If only primary instances are drained, such instances are ignored (since they are inherently non-redundant). If a node is fully drained, such instances need to be temporarily switched to DRBD using the sre.ganeti.changedisk cookbook first.
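
The handling of "plain" instances described above can be sketched as a small selection step. This is an illustration only and not the code of the merged cookbook: the function `select_instances` and its input mapping (instance name to disk template) are hypothetical, and the error merely points the operator at the conversion the comment mentions.

```python
from typing import Dict, List, Tuple


def select_instances(templates: Dict[str, str], full: bool) -> Tuple[List[str], List[str]]:
    """Split a node's instances into (to_move, skipped) by disk template.

    `templates` maps an instance name to its disk template, "drbd" or "plain".
    """
    plain = sorted(name for name, template in templates.items() if template == "plain")
    others = sorted(name for name, template in templates.items() if template != "plain")
    if full and plain:
        # A full drain cannot leave anything behind, so "plain" instances
        # have to be switched to DRBD before the drain starts.
        raise ValueError("switch to DRBD first: " + ", ".join(plain))
    # In a primary-only drain the "plain" instances are skipped.
    return others, plain
```

For example, `select_instances({"etcd1": "plain", "web1": "drbd"}, full=False)` moves `web1` and skips `etcd1`, while the same call with `full=True` raises until `etcd1` has been converted.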

With the --reboot option the cookbook also calls the sre.hosts.reboot-single cookbook to directly initiate a reboot. I've also added a sanity check which prevents reboots of the current master node.
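
A rough sketch of how the two options and the master check could fit together follows. It uses plain `argparse` in a standalone script rather than the cookbook's real argument handling; the positional `node` argument and the helper `ensure_reboot_allowed` are assumptions made for illustration, and only the `--full` and `--reboot` option names come from the comment above.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the options named in the comment (the positional is assumed)."""
    parser = argparse.ArgumentParser(prog="sre.ganeti.drain-node")
    parser.add_argument("node", help="node to drain (hypothetical positional argument)")
    parser.add_argument("--full", action="store_true",
                        help="also move secondary instances to other nodes")
    parser.add_argument("--reboot", action="store_true",
                        help="reboot the node once it has been drained")
    return parser.parse_args(argv)


def ensure_reboot_allowed(node: str, master: str, reboot: bool) -> None:
    """Refuse to reboot the node that is currently the cluster master."""
    if reboot and node == master:
        raise RuntimeError(f"{node} is the current master node, refusing to reboot it")


if __name__ == "__main__":
    args = parse_args()
    # In a real run the master name would come from `gnt-cluster getmaster`;
    # a placeholder is used here so the sketch stays self-contained.
    ensure_reboot_allowed(args.node, master="master.example.org", reboot=args.reboot)
```

Running the check before any instance is moved means a refused reboot leaves the cluster untouched.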
