
Add database host removal from Orchestrator to sre.hosts.decommission cookbook
Open, Needs Triage, Public

Description

Every time a database host is decommissioned, it should also be removed from Orchestrator, as documented in https://wikitech.wikimedia.org/w/index.php?title=MariaDB%2FDecommissioning_a_DB_Host#Remove_host_from_orchestrator. This action could be included in the sre.hosts.decommission cookbook. One thing to consider is timing: the removal has to happen at the right point in the process so that the host is not re-discovered by Orchestrator.

Event Timeline

How about we create a mechanism similar to the logout.d scripts, but for decom? Let's say we create a new /etc/wikimedia/decom.d directory where each service can drop a decom script (in this case it would be installed on each DB host managed in Orchestrator) with the steps which ought to be taken when a host running this service gets decommed. These files would be executed locally by the decom cookbook (and, in this case, would trigger the de-registration on orch1001). This way we can flexibly extend custom decom workflows like this one without tying each of them to a change in the decom cookbook (and also keep the cookbook leaner).
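
To make the idea a bit more concrete, here is a minimal sketch of what the local runner could look like, assuming the scripts are plain executables that run in lexical order, much like other *.d directories. The function names are hypothetical, the /etc/wikimedia/decom.d path is only the one proposed in this comment, and none of this is existing cookbook code.

```python
import os
import subprocess
from pathlib import Path

# Directory proposed in this comment; it does not exist today (assumption).
DECOM_DIR = Path("/etc/wikimedia/decom.d")


def find_decom_scripts(directory: Path) -> list[Path]:
    """Return the executable regular files in directory, sorted by name."""
    if not directory.is_dir():
        return []
    return sorted(
        entry
        for entry in directory.iterdir()
        if entry.is_file() and os.access(entry, os.X_OK)
    )


def run_decom_scripts(directory: Path = DECOM_DIR) -> dict[str, int]:
    """Run every script in order and map each script name to its exit code."""
    results: dict[str, int] = {}
    for script in find_decom_scripts(directory):
        # check=False: keep going after a failure and let the caller decide
        # whether a non-zero exit code should abort the decommission.
        completed = subprocess.run([str(script)], check=False)
        results[script.name] = completed.returncode
    return results
```

Returning the exit codes instead of raising leaves it up to the cookbook whether a failed script should only be logged or should stop the decom.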

@MoritzMuehlenhoff's proposal is certainly a neat option, but I have a couple of worries, namely:

  • it might be hard to find the right moment to run those scripts for every service without adding a lot of "hooks" to the decommissioning process to run the appropriate ones (some might need to run before the host is removed from Puppet, some after, etc.)
  • some "decom" actions should be performed from a central host instead of the target hosts, because we might not want to open some API to all hosts but only to some central hosts (like the way we remove hosts from debmonitor)

This task is a good example of exactly that.
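
For illustration only, the alternative could be pictured as per-phase hooks that the cookbook runs itself from the central host. This is a rough sketch under assumptions: the phase names, the registry, and the hook function are all hypothetical, the hook body is a placeholder rather than a real Orchestrator call, and the phase it is registered in is arbitrary since the right point for this task is still to be decided.

```python
from collections import defaultdict
from enum import Enum
from typing import Callable


class Phase(Enum):
    """Hypothetical points in the decommission where hooks may run."""

    BEFORE_PUPPET_REMOVAL = "before_puppet_removal"
    AFTER_PUPPET_REMOVAL = "after_puppet_removal"


Hook = Callable[[str], None]
_HOOKS: dict[Phase, list[Hook]] = defaultdict(list)


def register(phase: Phase) -> Callable[[Hook], Hook]:
    """Decorator that attaches a hook to one phase of the decommission."""

    def decorator(func: Hook) -> Hook:
        _HOOKS[phase].append(func)
        return func

    return decorator


def run_phase(phase: Phase, hostname: str) -> list[str]:
    """Run the hooks of one phase in registration order; return their names."""
    ran = []
    for hook in _HOOKS[phase]:
        hook(hostname)
        ran.append(hook.__name__)
    return ran


@register(Phase.AFTER_PUPPET_REMOVAL)  # phase chosen arbitrarily for the sketch
def remove_from_orchestrator(hostname: str) -> None:
    # Placeholder: the real step would follow the wikitech procedure and run
    # from a host that is allowed to talk to Orchestrator.
    print(f"would remove {hostname} from Orchestrator")
```

Every phase added this way is one more hook point in the cookbook, which is the first worry in the list above.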