Page MenuHomePhabricator

Move wikireplicas dbproxy haproxy config to etcd
Closed, DeclinedPublic

Description

Currently haproxy config is written using a puppet resource which reads from hiera. This means that when clouddb hosts are to be depooled (for example updating views like https://phabricator.wikimedia.org/T302233), we have to create a puppet patch like this one and apply it. This is manual and makes automating depooling in a cookbook difficult (the first pass at a cookbook tried to use haproxy commands directly which did not work with our haproxy permissions and would make haproxy out of sync with the filesystem).

A better approach would be to put the configuration in etcd and use confd to write the latest etcd values to disk (/etc/haproxy/conf.d/multi-db-replicas.cfg) and reload haproxy when etcd changes. This way etcd can be manually updated which is easy to do programmatically.

Rough outline of steps:

  • copy the hiera data about which servers are depooled in etcd
  • make confd write etcd to disk (we can use a different name, like /etc/haproxy/conf.d/wikireplicas.cfg, so that the current system is undisturbed)
    • this should be configured to watch the key(s) we chose to rewrite the config files
  • when we're happy with the new config file, switch haproxy to read from the new config, and remove the old
  • make haproxy reload when the config file changes
  • test depooling a clouddb server by using etcdctl; once this works update the cookbook to depool using this command

Event Timeline

@Joe I'm interested in your input, since you mentioned etcd is the way to go here - does the above plan make sense?

If so, for etcd data modeling, would it make sense to have a single to have a single keyfor this dictionary-like structure for this profile::mariadb::proxy::multiinstance_replicas::section_overrides or break it into multiple keys and depend on a prefix?

I'll admit I've never used etcd prior to this (though data engineering uses zookeeper which is similar); please excuse my novice-level understanding.

Change 773386 had a related patch set uploaded (by Razzi; author: Razzi):

[operations/puppet@production] [WIP] profile::mariadb::proxy::multiinstance_replicas: haproxy config from etcd

https://gerrit.wikimedia.org/r/773386

EChetty changed the task status from Open to In Progress.Mar 24 2022, 2:54 PM
EChetty assigned this task to razzi.
EChetty moved this task from Next Up to In Progress on the Data-Engineering-Kanban board.
EChetty moved this task from Incoming (new tickets) to Datasets on the Data-Engineering board.

I have opened a patch for this that writes to a config file other than the actual one, so it can be inspected for correctness: https://gerrit.wikimedia.org/r/c/operations/puppet/+/773386

A couple questions remain:

  • How to get the data into confd? This will require adding keys like

conftool/v1/dbproxy-mariadb-hosts-pooled/s7/dbstore1018.eqiad.wmnet

with values like

{"ip": "10.64.0.29", "state": "active"}

A full set of keys and values can be derived from the current puppet-based depooling status.

for every clouddb server / section. The current values are in puppet.

  • Is the key structure that I used workable? Key is <prefix>/dbproxy-mariadb-hosts-pooled/<section>/<fqdn>

Hey sorry I didn't take the time to reply earlier, I was a bit busy.

We already have a structure in etcd to describe pools of backends and what is pooled/not pooled there.

It's the node object hierarchy.

So basically you would have to add something like the following to conftool-data:

# file name: node/dbproxy.yaml
dbproxy-mariadb:
  dbstore1018.eqiad.wmnet: [s7, ...] # any other section you might be interested in.

I think this would be the simplest way to go, but this would only provide you the pooled status for confd use.

I get that's somewhat clunky, as you'd need to get the IP address from another source - like say hiera. It's doable and it's the path of least resistance, but we can also think of adding a new type of objects.

That is done by adding a new class of objects to the conftool schema at ./modules/profile/files/conftool/schema.yaml.

So basically, I'd say this new type of object would need the following:

  • a scope (or another name) tag to indicate which functional group of dbproxies we're aiming at
  • the fields should be as you described, an ip address and a boolean pooled/not pooled

Then, we can have confd on the dbproxies just change the file on disk and notify haproxy to reload it.

Some pointers given we're not working with a big TZ overlap:

By declaring the host -> ip mappings using https://github.com/kelseyhightower/confd/blob/master/docs/templates.md#map earlier in the etcd template, we should be able to keep the data stored in etcd in line with the current node statuses; as simple as "pooled/not pooled". I'd prefer this, since repeating the ip address in etcd is a potential point of confusion.

When I compile the puppet catalog and inspect the template it generates, it gives some strange output; all the databases are clouddb1099 which doesn't exist (https://puppet-compiler.wmflabs.org/pcc-worker1003/34632/dbproxy1018.eqiad.wmnet/change.dbproxy1018.eqiad.wmnet.pson, search for /etc/haproxy/conf.d/multi-db-replicas.cfg), but the current file with which puts the puppet values straight into the haproxy config uses the same values, so I guess these are dummy values for the puppet catalog compiler.

Now I need to update the replicas.tpl.erb to use the standard config schema, but I'm a bit lost on how the keys and values would look in this case. There is a category, host name, and service name, but how to they combine into a key? I'm guessing it would look like /dbproxy-mariadb/clouddb1013.eqiad.wmnet/s1 or /dbproxy-mariadb/s1/clouddb1013.eqiad.wmnet.

Looking at the docs here (https://wikitech.wikimedia.org/wiki/Conftool#The_tools), it seems the values look like:

{"clouddb1013.eqiad.wmnet": {"weight": 1, "pooled": "yes"}, "tags": "dc=eqiad"}

Perhaps first I should get the conftool data onto etcd with conftool-merge by just merging the changes in this file: https://gerrit.wikimedia.org/r/c/operations/puppet/+/773386/10/conftool-data/node/eqiad.yaml and then set and query the etcd keys and values once that's done?

Change 779926 had a related patch set uploaded (by Razzi; author: Razzi):

[operations/puppet@production] dbproxy: add clouddb sections to conftool

https://gerrit.wikimedia.org/r/779926

Change 779941 had a related patch set uploaded (by Razzi; author: Razzi):

[operations/puppet@production] wikireplicas: depool clouddb1017-1020 and repool 15 and 16

https://gerrit.wikimedia.org/r/779941

Change 779941 abandoned by Razzi:

[operations/puppet@production] wikireplicas: depool clouddb1017-1020 and repool 15 and 16

Reason:

Accedentally added a patch on top of an unrelated one

https://gerrit.wikimedia.org/r/779941

Change 779945 had a related patch set uploaded (by Razzi; author: Razzi):

[operations/puppet@production] wikireplicas: depool clouddb1017-1020 and repool 15 and 16

https://gerrit.wikimedia.org/r/779945

Change 779945 merged by Razzi:

[operations/puppet@production] wikireplicas: depool clouddb1017-1020 and repool 15 and 16

https://gerrit.wikimedia.org/r/779945

JArguello-WMF added a subscriber: BTullis.

Change 779926 abandoned by Btullis:

[operations/puppet@production] dbproxy: add clouddb sections to conftool

Reason:

Not proceeding with this approach for now.

https://gerrit.wikimedia.org/r/779926

Thanks @Joe for your input on this ticket. I think that we've decided it's too much work for us to take on at the moment, so I'll decline this ticket.
Maybe we will come back to the question of dynamic configuation of the clouddb proxy layer at a later date. I really like the idea of being able to adjust the proxies at runtime, but it's just not high enough up the priority list at the moment.