
Decide how to handle encapi database
Closed, Resolved · Public


For context see parent ticket where we are talking about moving the labs central puppetmaster into the realm.
The labs central puppetmaster runs a service called encapi. encapi is the backend for the hiera data and puppet classes you see in Horizon.
Horizon connects to encapi and the data gets stored in a database (currently on the prod m5 DB cluster); puppetmasters are configured to run /usr/local/bin/puppet-enc (provided by openstack::puppet::master::enc), which contacts encapi to pull the relevant data for a particular node.
encapi source: modules/openstack/files/puppet/master/encapi/
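For anyone unfamiliar with the ENC mechanism: a Puppet external node classifier is just an executable that takes a node's FQDN and prints a YAML document listing classes and parameters. Here is a minimal sketch of what a script like puppet-enc does; the endpoint URL and response shape below are assumptions for illustration, not the real encapi API:

```python
#!/usr/bin/env python3
# Hypothetical sketch of an ENC script in the style of puppet-enc.
# Puppet invokes it with the node's FQDN as argv[1]; it queries the
# encapi HTTP service and prints YAML (classes + parameters) on stdout.
import json
import sys
import urllib.request

# Placeholder address; the real encapi location/path will differ.
ENCAPI_URL = "http://encapi.example.wmnet:8100"


def fetch_node_config(fqdn):
    """Fetch roles and hiera data for a node from the (assumed) API."""
    with urllib.request.urlopen("%s/v1/node/%s" % (ENCAPI_URL, fqdn)) as resp:
        return json.load(resp)


def to_enc_yaml(node_config):
    """Render the config as the YAML document Puppet expects from an ENC.

    Emitted by hand with json.dumps for scalar values, so the sketch
    stays stdlib-only (no PyYAML dependency).
    """
    lines = ["classes:"]
    for cls in node_config.get("roles", []):
        lines.append("  - %s" % cls)
    lines.append("parameters:")
    for key, value in sorted(node_config.get("hiera", {}).items()):
        lines.append("  %s: %s" % (key, json.dumps(value)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    print(to_enc_yaml(fetch_node_config(sys.argv[1])))
```

The point is that the puppetmaster side is a thin HTTP client; all the interesting state lives in whatever database backs encapi, which is why where that database lives is the whole question here.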
But, we cannot just take m5 and stick it inside labs. So what do we do instead?

Event Timeline

I'm going to assume we really need a replica of this data on a separate instance that lives on a separate host.
That leaves some questions though - should the master live in the same instance as the central puppetmaster? Should it be a separate instance in the cloudinfra project? What about clouddb-services?
Edit: Sounds like clouddb-services may only be for tenant-facing services like Tools, so it might not be appropriate for this.

I think I would vote for separate instance(s) in the cloudinfra project. The clouddb-services project is currently focused on shared datasets that are exposed to other Cloud VPS projects (ToolsDB, maps db, wikilabels db). The puppet/hiera config that is managed through the encapi has a different usage pattern. I also think it will be a bit easier to troubleshoot problems if the data paths don't cross multiple Cloud VPS projects.

clouddb-services should be separate +1

This should be an instance inside cloud-infra or the instance should access m5 directly.

or the instance should access m5 directly.

I don't think that's possible. Those DB hosts are unlikely to be exposed to Cloud VPS.

clouddb-services should be separate +1

This should be an instance inside cloud-infra or the instance should access m5 directly.

That's fine with me, as long as we ensure that they're redundant and spread out over multiple virt hosts.

So are we all agreed then to have two DB instances within cloudinfra on separate virt hosts? Anyone know how big the current m5 labspuppet DB is? I expect it's not huge.

I don't know enough about mysql data structures to be sure I'm looking at the right thing, but I think 'not huge' is accurate:

root@db1073:/srv/sqldata# du -h labspuppet/
736K labspuppet/

Hm... I bet the actual data is elsewhere though.

It is tiny:

(labspuppet@m5-master.eqiad.wmnet) [information_schema]> select table_schema, SUM( data_length + index_length ) as total_bytes, SUM( table_rows ) as row_count, COUNT(1) as tables FROM information_schema.TABLES WHERE table_schema='labspuppet' GROUP BY table_schema ORDER BY total_bytes DESC;
+--------------+-------------+-----------+--------+
| table_schema | total_bytes | row_count | tables |
+--------------+-------------+-----------+--------+
| labspuppet   |      409600 |      1546 |      3 |
+--------------+-------------+-----------+--------+
1 row in set (0.00 sec)
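The two measurements line up, for what it's worth. A quick sanity check (values copied from the outputs above; the interpretation of the gap as on-disk overhead is my assumption):

```python
# information_schema counts data + index bytes inside the tablespaces,
# while `du` also sees .frm files and filesystem block rounding, so du
# reporting somewhat more than the table data is expected.
table_bytes = 409600               # SUM(data_length + index_length)
du_kib = 736                       # from `du -h labspuppet/`

table_kib = table_bytes / 1024
print(table_kib)                   # 400.0 KiB of actual table data
print(du_kib - table_bytes // 1024)  # 336 KiB of presumed overhead
```

Either way, well under a megabyte, so instance sizing is a non-issue.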

Plan will be to make 2x m1.small on separate hosts within cloudinfra, one being the master and the other a simple replica. Thanks all.

I created the two instances as cloudinfra-db0[12], set up a puppetmaster as cloudinfra-internal-puppetmaster01 which will keep their secrets within the project, and uploaded

Do we want this stuff under mariadb profile or under wmcs more like modules/profile/manifests/wmcs/services/toolsdb_primary.pp (honest question)? Like modules/profile/wmcs/cloudinfra/ or something?

Also, do we really need ferm in addition to the security-group rules? The answer can be yes, but it does add complexity that seems unnecessary unless ferm is doing something secgroups cannot.

I know I'm very late to the game on this. I'm just curious. If we think what I'm suggesting is good, I can shuffle things.