DB planning: include a writeable (?) misc DB cluster in codfw for WMCS
Closed, DeclinedPublic

Description

In T218029: CloudVPS: evaluate convenience of having codfw openstack DBs in proper DB hosts we agreed that having a misc DB cluster in codfw would be very good for our use cases.

This task is to formally request this, and track this work.

@jcrespo mentioned this could take some months to accomplish (it involves planning, budgeting, HW procurement, etc), so meanwhile, we will be doing T218569: Openstack codfw DBs: move to m5-master.eqiad.wmnet

Event Timeline

aborrero triaged this task as Medium priority. Mar 18 2019, 1:26 PM
aborrero created this task.

This would be great because, afaict, this would also unblock having a Phabricator (T137928) and Gerrit (T176532) working in codfw. (They are both blocked by lack of misc db proxy in codfw)

That would be good for the greater goal of having redundancy for misc things: T156937.

I think this request is different from that - see T156937#5032690

I am not working on this.

OK, but please do soon :-)

If possible, we would like to know an ETA for this project to be accomplished.

At the moment, I think no DBA has an idea of what you need or want to do, so please explain that first.

We would like to have a misc r/w DB cluster m6-master, or whatever is the name, in codfw, for us to use it like we do right now with m5-master. That is, openstack databases (keystone, nova, neutron, and friends), and other stuff (like wikitech, toolsadmin, etc).

This was previously discussed in T218029: CloudVPS: evaluate convenience of having codfw openstack DBs in proper DB hosts. As we are aware this may take some time to accomplish, we will be doing T218569: Openstack codfw DBs: move to m5-master.eqiad.wmnet meanwhile.

Please let me know if you need further explanation.

We should probably rename this task, as we already have a misc cluster in codfw (which replicates from eqiad - https://dbtree.wikimedia.org/) and this is a specific _new_ set of misc for Cloud (from what I understand).

That's probably correct. The existing misc cluster in codfw is (almost always) read-only, and we need a read-write db server someplace in codfw.

mark renamed this task from DB planning: include a misc cluster in codfw to DB planning: include a writeable (?) misc DB cluster in codfw for WMCS. Apr 16 2019, 10:43 AM
mark added a subscriber: mark.
Marostegui mentioned this in Unknown Object (Task). Jan 7 2020, 6:50 AM

Resuming the conversation here from {T242036}

In T242036#5783634, @bd808 wrote:

The discussion has been silent because I was under the impression that the DBA team did not want to talk about it any longer. Our codfw1-dev deployments of both OpenStack + Wikitech in the codfw datacenter are currently using self-hosted mysql instances. We had to do this because there were not any read-write shared database deployments in codfw. If there was such an environment we would gladly use it; we love taking care of fewer things!

From T218569#5057327 onwards, my understanding is that a workaround was already put in place (T218569#5084741). Whether that is a temporary solution or you are still seeking a long-term solution, I don't know.

As discussed previously, we do not have writable hosts in codfw, and having a set of writable replicas there (even if they have an equivalent in eqiad) is totally different from our current setup (which is, as you know, eqiad -> codfw, for now). It would be a snowflake for us, and very prone to human errors (from our side).

We are currently handling and maintaining m1-m5 and they all have a "similar" setup and configuration (as in: eqiad RW replicating RO to codfw and that's it) and I wouldn't be comfortable adding another m6 maintained by us (DBAs) with a totally different setup.

If WMCS wants to have their own set of databases, owned and maintained by them, writable in codfw (and which might or might not have a read-only pair in eqiad for redundancy), I am fine with that as long as we are not owners/maintainers. Of course, I am happy to help with the HW specs and with the initial replication configuration and all that, but I don't think we (DBA) are in a position now to maintain such a service.

@Marostegui after a lot of internal discussion, the cloud-services-team thinks the easiest and best solution for everyone will be for us to expand our cloudcontrol* cluster to 3 nodes in eqiad and 3 nodes in codfw and host the databases needed for OpenStack management ourselves. @JHedden has operational experience setting up and managing this type of backplane for OpenStack. It will keep our very different workload isolated from the misc db clusters. It will also reduce hassle for everyone as we add additional OpenStack components which need more active database connections. We will be asking to repurpose the budget from {T242036} to cover the codfw expansion and {T242135} for the eqiad expansion.

If you do not have strong objections to this plan I think we should close this task as "declined".

Thanks for the heads up @bd808 - that sounds good to me :-)
Will m5 content be eventually migrated to those new hosts?

Declining this task then (the above question can be answered with this task declined anyways)
Thank you

I think on our side we are referring exclusively to openstack databases: nova*, neutron, glance, keystone, etc.

Wikitech, which I assume is running in m5, is outside of the scope of this cloudcontrol-integrated database, which is exclusively for openstack services. I'm not exactly sure what else is running in m5.

Thanks for the answer. This is what we currently have in m5:

+---------------------+
| Database            |
+---------------------+
| designate           |
| glance              |
| heartbeat           |
| information_schema  |
| keystone            |
| labsdbaccounts      |
| labswiki            |
| mysql               |
| neutron             |
| nova                |
| nova_api            |
| nova_api_eqiad1     |
| nova_cell0_eqiad1   |
| nova_eqiad1         |
| ops                 |
| performance_schema  |
| striker             |
| sys                 |
| test_labsdbaccounts |
| testreduce_0715     |
| testreduce_vd       |
+---------------------+

Regarding wikitech, yeah, I was assuming it would remain on m5 for now till T167973: Move database for wikitech (labswiki) to a main cluster section is sorted

Ok, I think we can safely assume we are talking only about openstack databases: designate, glance, keystone, neutron, nova*.
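
For later planning, here is a minimal sketch of that split applied to the m5 listing above. It assumes "nova*" is meant as a shell-style glob; the names `M5_DATABASES`, `OPENSTACK_PATTERNS` and `split_openstack` are made up for illustration and do not refer to any existing tooling.

```python
from fnmatch import fnmatchcase

# Database names copied from the m5 listing in this thread.
M5_DATABASES = [
    "designate", "glance", "heartbeat", "information_schema", "keystone",
    "labsdbaccounts", "labswiki", "mysql", "neutron", "nova", "nova_api",
    "nova_api_eqiad1", "nova_cell0_eqiad1", "nova_eqiad1", "ops",
    "performance_schema", "striker", "sys", "test_labsdbaccounts",
    "testreduce_0715", "testreduce_vd",
]

# OpenStack databases as named in the thread.
# Assumption: "nova*" is a shell-style glob covering every nova_* schema.
OPENSTACK_PATTERNS = ["designate", "glance", "keystone", "neutron", "nova*"]


def split_openstack(databases, patterns=OPENSTACK_PATTERNS):
    """Return (openstack, other) lists, preserving the input order."""
    openstack, other = [], []
    for name in databases:
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            openstack.append(name)
        else:
            other.append(name)
    return openstack, other


openstack_dbs, other_dbs = split_openstack(M5_DATABASES)
print("openstack:", ", ".join(openstack_dbs))
print("other:", ", ".join(other_dbs))
```

Everything in the "other" list (including the MySQL system schemas, labswiki, striker and labsdbaccounts) would stay where it is under this reading.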

Just to highlight it for discussion (in case it is useful for anyone's later planning): this would mean that striker and labsdbaccounts have no plans at all to move anywhere in the future. Only the openstack databases would move, and wikitech has T167973: Move database for wikitech (labswiki) to a main cluster section.

Striker is for Toolforge account creation/management and labsdbaccounts is operated by scripts on the NFS clusters to track creation of Cloud user mysql credentials.

I have no idea what these are:

| testreduce_0715     |
| testreduce_vd       |

Thanks @Bstorm for the clarification!
In the medium/long term I would like to re-architect and consolidate stuff in our misc infra, to make better use of resources than we currently do. We can talk once that time has arrived, but it won't be soon anyways :)