Page MenuHomePhabricator

#dbctl: add 'comment'/'description' metadata to instances
Closed, ResolvedPublic

Description

The traditional PHP configs include a lot of metadata about each DB server host:

		'db1067' => 0,      # C6 2.8TB 160GB, # master
		'db1080' => 200,    # A2 3.6TB 512GB, api
		'db1083' => 500,    # B1 3.6TB 512GB # candidate master
		'db1089' => 500,    # C3 3.6TB 512GB
		'db1099:3311' => 50, # B2 3.6TB 512GB # rc, log: s1 and s8

While this needs some design work -- and ideally some design work that involves not repeating ourselves between dbctl and the Zarcillo database! -- we probably at least want a 'description' field as part of the instance schema, and possibly a 'candidate master' boolean as well? Tagging DBA for discussion of what the UI should be.

Event Timeline

CDanis updated the task description. (Show Details)

At the moment, on the zarcillo database we include the table masters which contains the master for each section of each dc:

root@db1115.eqiad.wmnet[zarcillo]> select * from masters limit 2;
+---------+-------+----------+
| section | dc    | instance |
+---------+-------+----------+
| s6      | eqiad | db1061   |
| s7      | eqiad | db1062   |
+---------+-------+----------+
2 rows in set (0.00 sec)

Something do consider very important to have is, as you mentioned, whether a host is a candidate. We currently don't have that in zarcillo and I am not sure zarcillo is the right place for it, it might be easier to be able to have that on the UI itself.

On the servers table we do include the rack, which is something very useful for us and should not be duplicated:

root@db1115.eqiad.wmnet[zarcillo]> select * from servers limit 2;
+--------------------+----------+-------+------+-----------+------+---------------------+
| fqdn               | hostname | dc    | rack | ipv4      | ipv6 | last_boot           |
+--------------------+----------+-------+------+-----------+------+---------------------+
| db1061.eqiad.wmnet | db1061   | eqiad | C3   | 171974883 | NULL | 2018-09-17 17:32:55 |
| db1062.eqiad.wmnet | db1062   | eqiad | D4   | 171978767 | NULL | 2018-09-17 08:32:00 |
+--------------------+----------+-------+------+-----------+------+---------------------+
2 rows in set (0.00 sec)

The HW description is probably not needed anymore, as we in the process of not having any server which is not 512GB, we are decommissioning the rest.

Lastly, we also tend to include the reason of why a host is depooled, ie: "HW issues #TXXXX".

OK, for now I'm planning on changing the instance schema in the following way:

  • Add a candidate_master field, type boolean, default false. No extra code or feature support, aside from it being settable in instance FOO edit and appearing in instance (FOO|all) get output (where you could use jq to find candidate masters for a given section).
  • Add a note field, type string, default empty. Will be editable in instance FOO edit and will appear in instance (FOO|all) get output, but I will also add --set-note NOTE and --clear-note options to instance pool/depool/set-weight, so that it's easy to do stuff like:
    • instance db9999 depool --set-note 'degraded RAID T123456'
    • instance db9999 pool --percentage 10 --set-note 'warmup'
    • instance db9999 pool -p 100 --clear-note

@Marostegui does that sound good to you?

OK, for now I'm planning on changing the instance schema in the following way:

  • Add a candidate_master field, type boolean, default false. No extra code or feature support, aside from it being settable in instance FOO edit and appearing in instance (FOO|all) get output (where you could use jq to find candidate masters for a given section).

Not sure if I get that right, will that show up on dbctl config get? Would it be hard to do a: dbctl config get --section s1 --candidate or similar to retrieve the list of candidate masters for a given section? Not saying it has to be implemented _now_ just asking if it would be something doable or fully breaks the data schema :-)

Will they appear on the eqiad/codfw.json files?

  • Add a note field, type string, default empty. Will be editable in instance FOO edit and will appear in instance (FOO|all) get output, but I will also add --set-note NOTE and --clear-note options to instance pool/depool/set-weight, so that it's easy to do stuff like:
    • instance db9999 depool --set-note 'degraded RAID T123456'
    • instance db9999 pool --percentage 10 --set-note 'warmup'
    • instance db9999 pool -p 100 --clear-note

Will those comments show up on the eqiad/codfw.json files?

Thanks for working on this!

The way to get notes and candidate info would be the dbctl instance all get command.

To find candidate masters then one could do e.g. dbctl instance all get | jq 'select(.. | .sections? | has("s1")) | select(.. | .candidate_master?)' to find all the candidate masters for s1.
Hosts with notes attached could be found with dbctl instance all get | jq 'select(.. | .note? | length > 0)'

We could later add some basic filtering/search support in dbctl itself; maybe something like dbctl instance search --section s1 --candidate. Wouldn't be too hard.

dbctl config get and the eqiad/codfw.json files are both the config as read by Mediawiki, which doesn't need this information, and so I hadn't planned on adding anything there. But, if it would be helpful, there's no reason why we couldn't add some extra fields to the generated config, which would then just be ignored by Mediawiki. If we did this, they'd show up in dbctl config get and in codfw/eqiad.json. We would have to make separate top-level keys in the output, so something like this:

{
  "sectionLoads": { ... },
  "readOnlyBySection": { ... },
  "groupLoadsBySection": { ... },
  "_candidateMastersBySection": { "s1": [ "db999", "db888", ...], ... },
  "_notesByInstance": { "db777": "hw issues Txxxxx", ... }
}

Let me know if you think including this data in the dbctl config get/eqiad.json output this way would be helpful.

The way to get notes and candidate info would be the dbctl instance all get command.

To find candidate masters then one could do e.g. dbctl instance all get | jq 'select(.. | .sections? | has("s1")) | select(.. | .candidate_master?)' to find all the candidate masters for s1.
Hosts with notes attached could be found with dbctl instance all get | jq 'select(.. | .note? | length > 0)'

We should definitely include all these commands in the config. They are very useful (and hard to remember!). I was thinking about doing some wrappers and get them into the software/debtools repo anyways.

We could later add some basic filtering/search support in dbctl itself; maybe something like dbctl instance search --section s1 --candidate. Wouldn't be too hard.

This can be very useful, specially during emergencies.

dbctl config get and the eqiad/codfw.json files are both the config as read by Mediawiki, which doesn't need this information, and so I hadn't planned on adding anything there. But, if it would be helpful, there's no reason why we couldn't add some extra fields to the generated config, which would then just be ignored by Mediawiki. If we did this, they'd show up in dbctl config get and in codfw/eqiad.json. We would have to make separate top-level keys in the output, so something like this:

{
  "sectionLoads": { ... },
  "readOnlyBySection": { ... },
  "groupLoadsBySection": { ... },
  "_candidateMastersBySection": { "s1": [ "db999", "db888", ...], ... },
  "_notesByInstance": { "db777": "hw issues Txxxxx", ... }
}

Let me know if you think including this data in the dbctl config get/eqiad.json output this way would be helpful.

I think it would be useful, but not necessarily something that needs to have lots of priority. If we do some wrappers with `dbctl instance all get | jq 'select(.. | .note? | length > 0)', we could get that info from the CLI for now.

Thanks a lot for evaluating and changing and adapting the UI as we go. I _really_ appreciate it.

Change 529396 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/software/conftool@master] dbctl: add note & candidate_master fields

https://gerrit.wikimedia.org/r/529396

I'm ok with the proposed UI but I'm wondering if we could reduce the duplication for the DBAs to set both a commit message and a note message when very often they will be the same.
The problem is that the note is set on the instance and the commit message only later when actually committing.
Also it would be hard to avoid to override pre-existing notes and to clean up when a host is back in normal shape.
So probably for now is just better to keep it simple at the cost of some duplication in some cases.

I'm ok with the proposed UI but I'm wondering if we could reduce the duplication for the DBAs to set both a commit message and a note message when very often they will be the same.
The problem is that the note is set on the instance and the commit message only later when actually committing.
Also it would be hard to avoid to override pre-existing notes and to clean up when a host is back in normal shape.
So probably for now is just better to keep it simple at the cost of some duplication in some cases.

This was on my mind as well, but I didn't come up with anything that seemed like a good solution.

The messages will usually be short so I think a little duplication is okay. Plus there's the possibility for users to just set shell variables like NOTE="db123 disk fail Txxxxx" for simple reuse.

Change 529396 merged by jenkins-bot:
[operations/software/conftool@master] dbctl: add note & candidate_master fields

https://gerrit.wikimedia.org/r/529396

Change 529396 merged by jenkins-bot:
[operations/software/conftool@master] dbctl: add note & candidate_master fields

https://gerrit.wikimedia.org/r/529396

I saw this was merged - thanks!
From that commit message there is still some work to be done, do you have a rough ETA on when we'll be able to start using those fields?
Thank you!

I saw this was merged - thanks!
From that commit message there is still some work to be done, do you have a rough ETA on when we'll be able to start using those fields?
Thank you!

The work left to be done:

  • Add a nice UI for editing the notes -- with the code as-is you can only edit them using dbctl instance FOO edit. This will take some refactoring which I haven't gotten around to yet.
  • Do a new conftool/dbctl release and a schema upgrade (which fortunately is not hard in this case).

I had been thinking I'd release the changes for this and also T229686: #dbctl: manage 'externalLoads' data at the same time, but I won't be done with that before going on vacation. So how about I release what exists of the notes functionality on Monday?

I had been thinking I'd release the changes for this and also T229686: #dbctl: manage 'externalLoads' data at the same time, but I won't be done with that before going on vacation. So how about I release what exists of the notes functionality on Monday?

That sounds good I think, so we can at least edit the candidate masters in eqiad/codfw even with the instance edit command for now.
Thank you!

Change 534819 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/software/conftool@master] dbctl: add set-candidate-master subcommand on instance

https://gerrit.wikimedia.org/r/534819

Change 534899 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/software/conftool@master] dbctl: add set-note instance subcommand

https://gerrit.wikimedia.org/r/534899

I wound up implementing a simpler UI for this, as the code was a lot less annoying than adding a --set-note flag to each subcommand.

Change 534819 merged by jenkins-bot:
[operations/software/conftool@master] dbctl: add set-candidate-master subcommand on instance

https://gerrit.wikimedia.org/r/534819

Change 534899 merged by jenkins-bot:
[operations/software/conftool@master] dbctl: add set-note instance subcommand

https://gerrit.wikimedia.org/r/534899

Mentioned in SAL (#wikimedia-operations) [2019-09-26T15:00:02Z] <cdanis> dbctl schema migration done T229677

CDanis claimed this task.

released