Migrate existing cookbooks related to rolling restarts/reboots to SREBatchBase
Open, In Progress, Low, Public

Description

SREBatchBase is an abstraction that reduces boilerplate when writing cookbooks that reboot the hosts of a cluster or restart its daemons. The following cookbooks already use it:

  • sre.gitlab.reboot-runner
  • sre.ldap.roll-restart-reboot-replica
  • sre.k8s.reboot-nodes
  • sre.misc-clusters.roll-restart-reboot-docker-registry
  • sre.misc-clusters.thumbor.py
  • sre.wdqs.restart-nginx
  • sre.o11y.roll-restart-reboot-kibana

But the following cookbooks should eventually be converted to use the new framework:

  • sre.aqs.roll-restart (done)
  • sre.cassandra.roll-restart
  • sre.druid.reboot-workers
  • sre.druid.roll-restart-workers
  • sre.hadoop.reboot-workers
  • sre.hadoop.roll-restart-masters
  • sre.hadoop.roll-restart-workers
  • sre.kafka.reboot-workers
  • sre.kafka.roll-restart-brokers
  • sre.kafka.roll-restart-mirror-maker
  • sre.maps.reboot (done)
  • sre.mediawiki.restart-appservers
  • sre.ores.roll-restart-workers
  • sre.presto.reboot-workers
  • sre.presto.roll-restart-workers
  • sre.wdqs.reboot
  • sre.wdqs.restart
  • sre.zookeeper.roll-restart-zookeeper

Looking at the names already in use, I think we should also agree on a common naming scheme. It seems to me that

  • any reference to "rolling" can go away (it's all quite implicit)
  • some cookbooks handle both reboots and daemon restarts at the same time, but some have more complex needs (e.g. Hadoop needs only specific daemons restarted)

So maybe a scheme where sre.X.reboot-restart-foo designates a cookbook which handles both reboots and daemon restarts, while sre.X.bar-reboot, sre.X.bar-restart and sre.X.bar-restart-SERVICE are used for services with more specialised needs?
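
To make the proposal easier to discuss, here is a minimal sketch of how the four name shapes could be told apart mechanically. The regular expressions, the category labels and the classify() helper are all hypothetical and are not part of the cookbooks repository; they only encode one reading of the scheme above, including the point that "roll"/"rolling" should no longer appear in names.

```python
import re

# Hypothetical patterns for the proposed scheme. Order matters: the combined
# "reboot-restart-" form is checked first, because it would otherwise also
# match the "<target>-restart-<service>" shape.
_PATTERNS = [
    ("reboot-and-restart",
     re.compile(r"^sre\.[\w-]+\.reboot-restart-[\w-]+$")),
    ("restart-service",
     re.compile(r"^sre\.[\w-]+\.[\w-]+?-restart-[\w-]+$")),
    ("reboot", re.compile(r"^sre\.[\w-]+\.[\w-]+-reboot$")),
    ("restart", re.compile(r"^sre\.[\w-]+\.[\w-]+-restart$")),
]


def classify(name):
    """Return the proposed category for a cookbook name, or None.

    None means the name does not fit the proposed scheme, for example
    because it still refers to "roll"/"rolling".
    """
    last_part = name.split(".")[-1]
    if {"roll", "rolling"} & set(last_part.split("-")):
        return None
    for kind, pattern in _PATTERNS:
        if pattern.match(name):
            return kind
    return None
```

Under these assumptions, a name such as sre.kafka.roll-restart-brokers is reported as not fitting the scheme, while an illustrative sre.hadoop.workers-restart-yarn would fall into the "restart of a specific service" category.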

Event Timeline

Change 957696 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/cookbooks@master] Extend the maps restart cookbook to also handle reboots

https://gerrit.wikimedia.org/r/957696

Change 957696 merged by Muehlenhoff:

[operations/cookbooks@master] Extend the maps restart cookbook to also handle reboots

https://gerrit.wikimedia.org/r/957696

Change 958478 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/cookbooks@master] sre.maps.reboot: Retire legacy cookbook

https://gerrit.wikimedia.org/r/958478

Change 958478 merged by Muehlenhoff:

[operations/cookbooks@master] sre.maps.reboot: Retire legacy cookbook

https://gerrit.wikimedia.org/r/958478

I'm taking this one, both to coordinate the work and to implement part of it myself.

MoritzMuehlenhoff changed the task status from Open to In Progress. Jan 29 2024, 4:01 PM
MoritzMuehlenhoff triaged this task as Low priority.