Auto detect DC on orchestrator UI
Closed, Resolved · Public

Description

Orchestrator has two ways of detecting data centers (https://github.com/openark/orchestrator/blob/master/docs/configuration-discovery-classifying.md): either via a MySQL query using DetectDataCenterQuery, or via a regex configured in the .conf file with DataCenterPattern.

Detecting the section (T266485#6580884) is "easy" using the heartbeat table. Detecting the DC isn't, as the heartbeat table doesn't record the DC where the host lives but rather the DC the pt-heartbeat comes from, and telling those apart is not always easy as there are two heartbeats.
So it might be easier to use the DataCenterPattern regex for that:

Data center
orchestrator is data-center aware. Not only will it color them nicely on the web interface; but it will take DC into consideration when running failovers.

You will configure data center awareness in one of two methods (https://github.com/openark/orchestrator/blob/master/docs/configuration-discovery-classifying.md):

DataCenterPattern: a regular expression to be used on the fqdn. e.g.: "db-.*?-.*?[.](.*?)[.].myservice[.]com"
DetectDataCenterQuery: a query that returns the data center name

In this case we'd need to match:
es1, db1, pc1 for eqiad
es2, db2, pc2 for codfw
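
The prefix-to-DC mapping listed above can be sketched as a small shell function. This is only an illustration of the matching rule, not orchestrator configuration; the function name dc_from_prefix is hypothetical.

```shell
# Hypothetical helper (not part of orchestrator or the puppet repo):
# map a host name to a DC from the digit that follows the es/db/pc prefix.
dc_from_prefix() {
  case "$1" in
    es1*|db1*|pc1*) echo "eqiad" ;;
    es2*|db2*|pc2*) echo "codfw" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

dc_from_prefix "pc1010.eqiad.wmnet"   # prints eqiad
dc_from_prefix "pc2010.codfw.wmnet"   # prints codfw
```

Any host whose name does not start with one of those prefixes falls through to "unknown", so a rule like this would need extending for other host classes.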

Event Timeline

Marostegui moved this task from Triage to Ready on the DBA board.

For the record, so far it looks like we are going for this approach:

  • Setting the unused galera variable wsrep_cluster_name on my.cnf via puppet with "eqiad" or "codfw".
  • As this variable is dynamic, we can set it up on the fly (and test it).
  • Use the DetectDataCenterQuery orchestrator flag with something like: DetectDataCenterQuery: select @@wsrep_cluster_name

I have tried this approach manually by setting the following on orchestrator.conf:

"DetectDataCenterQuery": "SELECT @@wsrep_cluster_name"

I have manually changed the flag on pc1 hosts:

# ./section pc1 | while read host port; do echo $host; mysql.py -h$host:$port -e "select @@wsrep_cluster_name";done
pc2010.codfw.wmnet
@@wsrep_cluster_name
codfw
pc2007.codfw.wmnet
@@wsrep_cluster_name
codfw
pc1010.eqiad.wmnet
@@wsrep_cluster_name
eqiad
pc1007.eqiad.wmnet
@@wsrep_cluster_name
eqiad

After restarting orchestrator this worked like a charm:

Captura de pantalla 2020-10-30 a las 9.35.34.png (354×1 px, 41 KB)

So the pending part would be to puppetize, somehow, setting that variable on my.cnf depending on where the host lives.

This bit is easy - puppet has a $site variable we can use.

Change 637702 had a related patch set uploaded (by Kormat; owner: Kormat):
[operations/puppet@production] mariadb: (Ab)use wsrep_cluster_name for DC name

https://gerrit.wikimedia.org/r/637702

Change 637715 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] orchestrator.conf: Add DetectDataCenterQuery to detect DC

https://gerrit.wikimedia.org/r/637715

Change 637702 merged by Kormat:
[operations/puppet@production] mariadb: (Ab)use wsrep_cluster_name for DC name

https://gerrit.wikimedia.org/r/637702

Change 637715 merged by Marostegui:
[operations/puppet@production] orchestrator.conf: Add DetectDataCenterQuery to detect DC

https://gerrit.wikimedia.org/r/637715

Running this to enact the change in codfw:

for i in $(mysql.py -h db1115 -A zarcillo -BN -e "select name from instances where server like '%codfw%'"); do
  echo "====> $i"
  mysql.py -h $i -e "set session sql_log_bin=0; set global wsrep_cluster_name=codfw" || break
done

The equivalent for eqiad is now done, too.
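
One way to double-check a rollout like this is to compare the value each host reports with the DC embedded in its FQDN. The sketch below assumes FQDNs of the form <host>.<dc>.wmnet, as in the pc1 output earlier; the helper names dc_from_fqdn and check_dc are hypothetical.

```shell
# Assumption: FQDNs look like <host>.<dc>.wmnet, so the DC is the second label.
dc_from_fqdn() {
  echo "$1" | cut -d. -f2
}

# Compare the DC a host reports (second argument) with the DC in its FQDN.
check_dc() {
  fqdn="$1"
  reported="$2"
  expected=$(dc_from_fqdn "$fqdn")
  if [ "$reported" = "$expected" ]; then
    echo "OK $fqdn $reported"
  else
    echo "MISMATCH $fqdn expected=$expected reported=$reported"
    return 1
  fi
}

check_dc "pc2010.codfw.wmnet" "codfw"
check_dc "pc1010.eqiad.wmnet" "eqiad"
```

In practice the second argument would come from the host itself, for example from the output of mysql.py -h "$host" -BN -e "select @@wsrep_cluster_name".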

Kormat claimed this task.

DC detection is now working \o/

Excellent! pc1 and pc2 are now showing correct DCs.
db1077 picked up the change automatically, and pc2 did as well, as I never touched that set of hosts.
Thanks!