Add m* and es4/es5 sections to Orchestrator
Closed, ResolvedPublic

Description

Let's add the misc (m*) sections, plus es4 and es5, to orchestrator.

This requires first:

  • Adding the grants to all the hosts.
  • Cleaning up the heartbeat table (T268336).
  • Enabling report_host on all the hosts (T266483).

Progress:

  • m1
  • m2
  • m3
  • m5
  • es4
  • es5
NOTE: es1, es2 and es3 do not have replication enabled, so we can either leave them out of orchestrator or simply add them as standalone hosts.

Event Timeline

Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui renamed this task from Add m* sections to Orchestrator to Add m* and es4/es5 sections to Orchestrator. Jan 28 2021, 10:58 AM
Marostegui updated the task description.
Marostegui added a subscriber: Kormat.

We need to decide what to do with the hosts on es1, es2 and es3. They do not have replication enabled, so we need to decide whether we want them in orchestrator as standalone hosts or not.
At the moment they are on tendril but not in the tree view (because there is no tree to represent).
My personal opinion is to include them in orchestrator even if they are not replicating from each other - @Kormat thoughts?

👍 from me.

Sounds good, I will clean up the heartbeat table there. It needs to be re-created with the newer schema, as the "shard" column doesn't exist in the current one. Once that is done, hopefully orchestrator will group them under the same alias, just without drawing a replication relationship between them.

It looks like, even though the "alias" query works for all three hosts, only the first one that gets discovered is placed in the es1 cluster; each of the others ends up as its own cluster. That is not ideal, as we'd have 18 hosts (3 hosts per DC on each of es1, es2 and es3) each as its own cluster, which can be a bit messy in the UI.
Going to forget those instances until we decide on a workaround for this.
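
To illustrate the grouping problem described above, here is a minimal sketch in Python. It is a toy model, not orchestrator's actual code: it assumes a cluster is simply the set of hosts that share the same topmost master, and all hostnames and DC labels (`dcA`, `dcB`) are hypothetical placeholders.

```python
from collections import defaultdict


def find_root(host, master_of):
    """Follow master links upwards until reaching a host with no master."""
    seen = set()
    while master_of.get(host) is not None and host not in seen:
        seen.add(host)  # guard against accidental replication loops
        host = master_of[host]
    return host


def group_into_clusters(master_of):
    """Group hosts by their topmost master (toy stand-in for a cluster)."""
    clusters = defaultdict(list)
    for host in master_of:
        clusters[find_root(host, master_of)].append(host)
    return dict(clusters)


# A replicating section (hypothetical names): one master, two replicas.
replicated = {
    "es4-master": None,
    "es4-replica1": "es4-master",
    "es4-replica2": "es4-master",
}

# Standalone sections: 3 sections x 2 DCs x 3 hosts, none with a master.
standalone = {
    f"{section}-{dc}-host{i}": None
    for section in ("es1", "es2", "es3")
    for dc in ("dcA", "dcB")
    for i in (1, 2, 3)
}

replicated_clusters = group_into_clusters(replicated)
standalone_clusters = group_into_clusters(standalone)

print(len(replicated_clusters))   # the replicating section forms one cluster
print(len(standalone_clusters))   # every standalone host is its own cluster
```

Under this model the replicating section collapses into a single cluster, while the standalone hosts produce one cluster each, which matches the 18 separate entries mentioned above.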

There is also the fact that they will always show as lagged, as pt-heartbeat doesn't run there. If they are downtimed forever then the lag won't show up, but that is still not ideal either.
Orchestrator isn't really meant for standalone hosts.
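
A rough sketch of why such hosts always look lagged, assuming lag is measured as the current time minus the newest heartbeat timestamp (the usual approach for heartbeat-based lag checks; the exact query orchestrator runs here is not shown in this task). The function name and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_lag_seconds(last_heartbeat, now):
    """Lag is the age of the newest heartbeat row, never negative."""
    return max(0.0, (now - last_heartbeat).total_seconds())


# Arbitrary illustrative timestamps.
now = datetime(2021, 1, 28, 12, 0, 0, tzinfo=timezone.utc)

# Host where the heartbeat row is refreshed every second.
fresh_lag = heartbeat_lag_seconds(now - timedelta(seconds=1), now)

# Host where nothing updates the row: the last entry is 30 days old.
stale_row = now - timedelta(days=30)
stale_lag = heartbeat_lag_seconds(stale_row, now)

# One hour later the stale host looks one hour "more lagged".
later_lag = heartbeat_lag_seconds(stale_row, now + timedelta(hours=1))

print(fresh_lag, stale_lag, later_lag)
```

Because nothing refreshes the row on a standalone host, the computed lag only grows over time, so the host stays flagged unless it is downtimed.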

Change 660772 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool es4 from writes

https://gerrit.wikimedia.org/r/660772

Change 660772 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool es4 from writes

https://gerrit.wikimedia.org/r/660772

Marostegui updated the task description.

m2 added to orchestrator.
This is all done.