
orchestrator: integrate promotion rules into puppet
Open, Medium, Public

Description

https://github.com/openark/orchestrator/blob/master/docs/deployment.md#adding-promotion-rules

Adding promotion rules
Some servers are better candidates for promotion in the event of a failover. Some servers aren't good picks. Examples:

A server has a weaker hardware configuration. You prefer not to promote it.
A server is in a remote data center and you don't want to promote it.
A server is used as your backup source and has LVM snapshots open at all times. You don't want to promote it.
A server has a good setup and is ideal as a candidate. You prefer to promote it.
A server is OK, and you don't have any particular opinion.
You will announce your preference for a given server to orchestrator in the following way:

orchestrator -c register-candidate -i ${::fqdn} --promotion-rule ${promotion_rule}
Supported promotion rules are:

  • prefer
  • neutral
  • prefer_not
  • must_not

Promotion rules expire after an hour. That's the dynamic nature of orchestrator. You will want to set up a cron job that announces the promotion rule for a server:

*/2 * * * * root "/usr/bin/perl -le 'sleep rand 10' && /usr/bin/orchestrator-client -c register-candidate -i this.hostname.com --promotion-rule prefer"
This setup comes from production environments. The cron entries get updated by puppet to reflect the appropriate promotion_rule. A server may have prefer at this time, and prefer_not 5 minutes from now. Integrate your own service discovery method and your own scripting to provide your up-to-date promotion rule.
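
A minimal puppet sketch of what this could look like on our side, assuming a hypothetical class and parameter (profile::mariadb::orchestrator_candidate and $promotion_rule are placeholders, not existing code):

  # Hypothetical sketch: class name, parameter and default are placeholders.
  class profile::mariadb::orchestrator_candidate (
    Enum['prefer', 'neutral', 'prefer_not', 'must_not'] $promotion_rule = 'neutral',
  ) {
    # Re-announce every 2 minutes, since rules expire after an hour;
    # the random sleep spreads the registrations over time.
    cron { 'orchestrator-register-candidate':
      ensure  => present,
      user    => 'root',
      minute  => '*/2',
      command => "/usr/bin/perl -le 'sleep rand 10' && /usr/bin/orchestrator-client -c register-candidate -i ${::fqdn} --promotion-rule ${promotion_rule}",
    }
  }

Changing whatever drives $promotion_rule (hiera or role logic) would then simply rewrite the crontab entry on the next puppet run.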

Event Timeline

Marostegui moved this task from Backlog to Acknowledged on the SRE board.
Marostegui subscribed.

It is especially important to specify hosts that should never be masters.

To refine this a bit, we can probably set the rules as follows:

prefer: Pick the already designated candidate masters
prefer_not: Sanitarium masters (as they run ROW-based replication)
must_not: Hosts in a different DC, sanitarium hosts, clouddb hosts, all multi-instance hosts.
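
A rough sketch of how that mapping could be expressed in puppet; every flag below ($candidate_master, $sanitarium_master, $sanitarium, $remote_dc, $clouddb, $multiinstance) is a placeholder for whatever facts or hiera keys the mariadb roles already expose:

  # Hypothetical sketch: the flags are placeholders, not existing variables.
  if $candidate_master {
    $promotion_rule = 'prefer'
  } elsif $sanitarium_master {
    $promotion_rule = 'prefer_not'
  } elsif $remote_dc or $sanitarium or $clouddb or $multiinstance {
    $promotion_rule = 'must_not'
  } else {
    $promotion_rule = 'neutral'
  }

The resulting $promotion_rule would then feed the register-candidate cron entry quoted in the description.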

This task isn't urgent, though, as we are far from having orchestrator handle master failovers.