
Populating orchestrator alias metadata on a per-server basis
Closed, Resolved, Public

Description

In order to have cluster aliases rather than clusters named after hostnames, we'd need to populate a table on each master (although, for consistency, it should exist on all hosts), so orchestrator can identify the cluster when querying the master.

Example:

db1083 -> s1
pc1007 -> pc1

From the doc (https://github.com/openark/orchestrator/blob/master/docs/deployment.md#populating-meta-data):

Populating meta data
orchestrator extracts some metadata from servers:

What's the alias for the cluster this instance belongs to?
What's the data center a server belongs to?
Is semi-sync enforced on this server?
These details are extracted by queries such as:

DetectClusterAliasQuery
DetectClusterDomainQuery
DetectDataCenterQuery
DetectSemiSyncEnforcedQuery
or by regular expressions acting on the hostnames:

DataCenterPattern
PhysicalEnvironmentPattern
Queries can be satisfied by injecting data into metadata tables on your master. For example, you may:

CREATE TABLE IF NOT EXISTS cluster (
  anchor TINYINT NOT NULL,
  cluster_name VARCHAR(128) CHARSET ascii NOT NULL DEFAULT '',
  PRIMARY KEY (anchor)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
and populate this table with, say, (1, 'my_cluster_name'), coupled with:

{
  "DetectClusterAliasQuery": "select cluster_name from meta.cluster where anchor=1"
}
Please note orchestrator does not create such tables nor does it populate them. You will need to create the table, populate it, and let orchestrator know how to query the data.
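To make the contract concrete, here is a minimal sketch of the upstream example: a single row pinned by anchor=1 holds the alias, and DetectClusterAliasQuery just reads it back. It assumes SQLite as a stand-in for MariaDB (so the ENGINE/CHARSET clauses are dropped) and attaches an in-memory database named meta so the query from the config runs verbatim; detect_cluster_alias is a hypothetical helper, not part of orchestrator.

```python
import sqlite3

# SQLite stands in for the MariaDB master. Attaching an in-memory database
# named "meta" lets the upstream query run verbatim against meta.cluster.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS meta")

# Same columns as the upstream example; SQLite has no ENGINE or CHARSET clauses.
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS meta.cluster (
      anchor TINYINT NOT NULL,
      cluster_name VARCHAR(128) NOT NULL DEFAULT '',
      PRIMARY KEY (anchor)
    )
    """
)

# A single row pinned by anchor=1 holds the alias. REPLACE INTO is accepted by
# both MariaDB and SQLite, so re-running the population step is idempotent.
conn.execute(
    "REPLACE INTO meta.cluster (anchor, cluster_name) VALUES (1, ?)",
    ("my_cluster_name",),
)
conn.commit()

# The statement configured as DetectClusterAliasQuery.
DETECT_CLUSTER_ALIAS_QUERY = "select cluster_name from meta.cluster where anchor=1"


def detect_cluster_alias(connection):
    """Return the alias stored on this server, or None if there is no anchor row."""
    row = connection.execute(DETECT_CLUSTER_ALIAS_QUERY).fetchone()
    return row[0] if row else None


print(detect_cluster_alias(conn))  # my_cluster_name
```

Because the primary key is the constant anchor, the table can never hold more than one alias per server, which is what makes a plain SELECT sufficient on the orchestrator side.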

Obviously, we'd need some sort of cron job or Puppet run to populate this data, or to verify that it is up to date, on all hosts.
Maybe we can periodically populate those tables on the master from zarcillo data and let the changes replicate (frequency to be decided).
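A rough sketch of what such a periodic job could generate: one idempotent statement per section master, relying on replication to carry the row to the replicas. The section-to-master dict is hypothetical (it just reuses the db1083 -> s1 and pc1007 -> pc1 example); the real zarcillo schema and the final table location are not decided here, so meta.cluster from the upstream example is assumed.

```python
import re

# Hypothetical snapshot of zarcillo data: section -> current master host.
# The real zarcillo schema is not described in this task.
SECTION_MASTERS = {
    "s1": "db1083",
    "pc1": "pc1007",
}

# Section names are interpolated into SQL, so only allow simple identifiers.
_SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]{1,128}$")


def alias_statements(section_masters):
    """Return (master, sql) pairs; replication carries the row to the replicas."""
    statements = []
    for section, master in sorted(section_masters.items()):
        if not _SAFE_NAME.match(section):
            raise ValueError(f"unexpected section name: {section!r}")
        # REPLACE rewrites the anchor=1 row, so re-running the job is harmless.
        sql = f"REPLACE INTO meta.cluster (anchor, cluster_name) VALUES (1, '{section}')"
        statements.append((master, sql))
    return statements


for host, sql in alias_statements(SECTION_MASTERS):
    print(f"{host}: {sql}")
```

Writing only on the master keeps a single source of truth per section; a replica that diverges would then be a replication problem rather than a population problem.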

Event Timeline

Restricted Application added a subscriber: Aklapper. Mon, Oct 26, 3:58 PM
Marostegui triaged this task as Medium priority. Mon, Oct 26, 3:58 PM
Marostegui moved this task from Triage to Refine on the DBA board.

Maybe this table can be placed in the existing ops database.
We'd need to deploy the following grants everywhere:

GRANT SELECT ON ops.cluster TO 'orchestrator'@'orc_host';
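Since the grant has to exist on every host, for every orchestrator address, a small renderer avoids hand-typing it. This is only a sketch: the address list is an assumption (10.64.32.13 is the address used for the pc1007 grant later in this task), and the target defaults to the proposed ops.cluster table.

```python
# Hypothetical list of orchestrator addresses; extend with the real ones.
ORCHESTRATOR_HOSTS = ["10.64.32.13"]


def grant_statements(hosts, target="ops.cluster", user="orchestrator"):
    """Render one read-only GRANT per orchestrator address for the given target."""
    return [f"GRANT SELECT ON {target} TO '{user}'@'{host}';" for host in hosts]


for statement in grant_statements(ORCHESTRATOR_HOSTS):
    print(statement)
```

Passing target="`heartbeat`.*" renders the heartbeat variant of the grant instead.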

Change 636617 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] orchestrator.conf: Add query to detect alias

https://gerrit.wikimedia.org/r/636617

We might be able to re-use the heartbeat table for this.

select shard from heartbeat.heartbeat order by ts desc limit 1
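The ORDER BY matters because the heartbeat table can hold one row per server that has written a heartbeat, and rows that are no longer updated keep their old ts. A minimal sketch with SQLite standing in for MariaDB: the table is simplified to server_id, ts and shard, the rows are illustrative, and "old_shard" is a made-up value for a stale row. Timestamps are stored as ISO-8601 text, as pt-heartbeat does, so sorting them as strings is chronological.

```python
import sqlite3

# Simplified stand-in for heartbeat.heartbeat with only the columns the query
# uses plus server_id; the real table on the hosts has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS heartbeat")
conn.execute(
    "CREATE TABLE heartbeat.heartbeat ("
    " server_id INTEGER PRIMARY KEY, ts TEXT NOT NULL, shard TEXT)"
)

# Illustrative rows, one per writing server. The first is a stale row from a
# server that stopped updating it; "old_shard" is hypothetical.
conn.executemany(
    "INSERT INTO heartbeat.heartbeat (server_id, ts, shard) VALUES (?, ?, ?)",
    [
        (101, "2020-10-20T08:00:00.000000", "old_shard"),
        (102, "2020-10-27T09:54:51.000000", "pc1"),
    ],
)
conn.commit()

# ISO-8601 text timestamps sort chronologically, so the freshest row wins.
ALIAS_FROM_HEARTBEAT = "select shard from heartbeat.heartbeat order by ts desc limit 1"
print(conn.execute(ALIAS_FROM_HEARTBEAT).fetchone()[0])  # pc1
```

The appeal of this approach is that nothing new has to be populated: the shard value already travels with the heartbeat writes.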
Marostegui added a comment. Edited Tue, Oct 27, 10:01 AM

Forgetting the existing hosts:

root@dborch1001:~# orchestrator-client -c forget-cluster -alias pc1007
root@dborch1001:~#

And cleaning up the alias-related tables:

root@db2093.codfw.wmnet[orchestrator]> select * from cluster_alias; select * from cluster_alias_override;
Empty set (0.031 sec)

Empty set (0.031 sec)

root@db2093.codfw.wmnet[orchestrator]>

The following grant was added to pc1007:

GRANT SELECT ON `heartbeat`.* TO 'orchestrator'@'10.64.32.13';

Let's try to discover it:

root@dborch1001:~# orchestrator -c discover -i 10.64.0.180 --debug cli
2020-10-27 09:54:52 INFO starting orchestrator, version: 3.2.3, git commit: 7e183c77882bab9c2bf39804328a3409f5ae8ab3
2020-10-27 09:54:52 INFO Read config: /etc/orchestrator.conf.json
2020-10-27 09:54:52 DEBUG Parsed orchestrator credentials from /etc/mysql/orchestrator_srv.cnf
2020-10-27 09:54:52 DEBUG Parsed topology credentials from /etc/mysql/orchestrator_topo.cnf
2020-10-27 09:54:52 DEBUG Hostname unresolved yet: 10.64.0.180
2020-10-27 09:54:52 DEBUG Cache hostname resolve 10.64.0.180 as 10.64.0.180
2020-10-27 09:54:52 DEBUG Connected to orchestrator backend: orchestrator_srv:?@tcp(db2093.codfw.wmnet:3306)/orchestrator?timeout=1s
2020-10-27 09:54:52 DEBUG Orchestrator pool SetMaxOpenConns: 128
2020-10-27 09:54:52 DEBUG Initializing orchestrator
2020-10-27 09:54:52 INFO Connecting to backend db2093.codfw.wmnet:3306: maxConnections: 128, maxIdleConns: 32
2020-10-27 09:54:52 DEBUG Hostname unresolved yet: pc2007.codfw.wmnet
2020-10-27 09:54:52 DEBUG Cache hostname resolve pc2007.codfw.wmnet as pc2007.codfw.wmnet
2020-10-27 09:54:52 DEBUG Hostname unresolved yet: pc2007.codfw.wmnet
2020-10-27 09:54:52 DEBUG Cache hostname resolve pc2007.codfw.wmnet as pc2007.codfw.wmnet
2020-10-27 09:54:52 DEBUG Hostname unresolved yet: 10.64.48.174
2020-10-27 09:54:52 DEBUG Cache hostname resolve 10.64.48.174 as 10.64.48.174
2020-10-27 09:54:52 DEBUG Hostname unresolved yet: 10.192.0.104
2020-10-27 09:54:52 DEBUG Cache hostname resolve 10.192.0.104 as 10.192.0.104
pc1007:3306

The topology is discovered and the alias works (the topology gets printed a bit oddly because we have two co-masters: replication is enabled eqiad <-> codfw):

root@dborch1001:~# orchestrator -c topology -i pc1007
2020-10-27 09:57:01 DEBUG Hostname unresolved yet: pc1007
2020-10-27 09:57:01 DEBUG Cache hostname resolve pc1007 as pc1007
2020-10-27 09:57:01 DEBUG Connected to orchestrator backend: orchestrator_srv:?@tcp(db2093.codfw.wmnet:3306)/orchestrator?timeout=1s
2020-10-27 09:57:01 DEBUG Orchestrator pool SetMaxOpenConns: 128
2020-10-27 09:57:01 DEBUG Initializing orchestrator
2020-10-27 09:57:01 INFO Connecting to backend db2093.codfw.wmnet:3306: maxConnections: 128, maxIdleConns: 32
2020-10-27 09:57:01 DEBUG instanceKey: pc1007:3306
2020-10-27 09:57:01 DEBUG instanceKey: pc1010:3306
2020-10-27 09:57:01 DEBUG instanceKey: pc2007:3306
2020-10-27 09:57:01 DEBUG instanceKey: pc2010:3306
+ pc1007:3306   [0s,ok,10.4.14-MariaDB-log,rw,STATEMENT,>>]
  + pc1010:3306 [0s,ok,10.4.14-MariaDB-log,rw,STATEMENT,>>,GTID]
+ pc2007:3306   [0s,ok,10.4.12-MariaDB-log,rw,STATEMENT,>>]
  + pc2010:3306 [11s,ok,10.4.12-MariaDB-log,rw,STATEMENT,>>,GTID]
root@db2093.codfw.wmnet[orchestrator]> select * from cluster_alias;
+--------------+-------+---------------------+
| cluster_name | alias | last_registered     |
+--------------+-------+---------------------+
| pc1007:3306  | pc1   | 2020-10-27 09:55:11 |
+--------------+-------+---------------------+
1 row in set (0.031 sec)

Before closing this we need to:

Change 636617 merged by Marostegui:
[operations/puppet@production] orchestrator.conf: Add query to detect alias

https://gerrit.wikimedia.org/r/636617

Mentioned in SAL (#wikimedia-operations) [2020-10-29T12:44:31Z] <marostegui> Deploy grants for cluster alias on pc1 T266485

Mentioned in SAL (#wikimedia-operations) [2020-10-29T12:55:24Z] <marostegui> Deploy orchestrator grants on pc2 T266485

Mentioned in SAL (#wikimedia-operations) [2020-10-29T12:56:43Z] <marostegui> Make orchestrator discover pc2 T266485

Marostegui closed this task as Resolved. Thu, Oct 29, 1:01 PM
Marostegui claimed this task.

pc2 has been discovered and the alias has been set correctly:

root@dborch1001:~# orchestrator -c discover -i 10.64.16.20 --debug cli
2020-10-29 12:56:24 INFO starting orchestrator, version: 3.2.3, git commit: 7e183c77882bab9c2bf39804328a3409f5ae8ab3
2020-10-29 12:56:24 INFO Read config: /etc/orchestrator.conf.json
2020-10-29 12:56:24 DEBUG Parsed orchestrator credentials from /etc/mysql/orchestrator_srv.cnf
2020-10-29 12:56:24 DEBUG Parsed topology credentials from /etc/mysql/orchestrator_topo.cnf
2020-10-29 12:56:24 DEBUG Hostname unresolved yet: 10.64.16.20
2020-10-29 12:56:24 DEBUG Cache hostname resolve 10.64.16.20 as 10.64.16.20
2020-10-29 12:56:24 DEBUG Connected to orchestrator backend: orchestrator_srv:?@tcp(db2093.codfw.wmnet:3306)/orchestrator?timeout=1s
2020-10-29 12:56:24 DEBUG Orchestrator pool SetMaxOpenConns: 128
2020-10-29 12:56:24 DEBUG Initializing orchestrator
2020-10-29 12:56:24 INFO Connecting to backend db2093.codfw.wmnet:3306: maxConnections: 128, maxIdleConns: 32
2020-10-29 12:56:24 DEBUG Hostname unresolved yet: 10.192.16.35
2020-10-29 12:56:24 DEBUG Cache hostname resolve 10.192.16.35 as 10.192.16.35
pc1008:3306
root@dborch1001:~# orchestrator -c clusters-alias
2020-10-29 13:00:00 DEBUG Connected to orchestrator backend: orchestrator_srv:?@tcp(db2093.codfw.wmnet:3306)/orchestrator?timeout=1s
2020-10-29 13:00:00 DEBUG Orchestrator pool SetMaxOpenConns: 128
2020-10-29 13:00:00 DEBUG Initializing orchestrator
2020-10-29 13:00:00 INFO Connecting to backend db2093.codfw.wmnet:3306: maxConnections: 128, maxIdleConns: 32
pc1007:3306	pc1
pc1008:3306	pc2
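For the cron/Puppet verification mentioned in the description, the clusters-alias output above is easy to check mechanically, since each row is a tab-separated cluster/alias pair. A hedged sketch: the expected mapping simply mirrors this output and would in practice come from the source of truth (zarcillo is an assumption), and both helpers are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Expected cluster -> alias mapping; here it mirrors the output above.
EXPECTED: Dict[str, str] = {"pc1007:3306": "pc1", "pc1008:3306": "pc2"}


def parse_clusters_alias(output: str) -> Dict[str, str]:
    """Parse `orchestrator -c clusters-alias` stdout: one cluster<TAB>alias per line."""
    aliases: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split("\t")
        # Ignore lines that are not two tab-separated columns (e.g. log lines).
        if len(parts) == 2:
            cluster, alias = parts
            aliases[cluster] = alias
    return aliases


def alias_mismatches(
    actual: Dict[str, str], expected: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Map each expected cluster whose alias differs to (actual, expected)."""
    return {
        cluster: (actual.get(cluster), alias)
        for cluster, alias in expected.items()
        if actual.get(cluster) != alias
    }


sample = "pc1007:3306\tpc1\npc1008:3306\tpc2\n"
print(alias_mismatches(parse_clusters_alias(sample), EXPECTED))  # {}
```

A missing cluster shows up as (None, expected_alias), so the same check also catches sections that orchestrator has not discovered yet.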

For now, data center (DC) metadata discovery is tracked separately at T266635.

Closing this one!

Marostegui renamed this task from Populating orchestrator metadata on a per-server basis to Populating orchestrator alias metadata on a per-server basis. Thu, Oct 29, 1:02 PM