Openstack codfw DBs: move to m5-master.eqiad.wmnet
Closed, Declined (Public)

Description

In T218029: CloudVPS: evaluate convenience of having codfw openstack DBs in proper DB hosts we agreed that it would be convenient to host the codfw openstack databases on proper DB servers.
We now face the decommissioning of the server currently running these databases (T218021: decommission: labtestcontrol2001.wikimedia.org), so we need to rescue/reallocate the databases before the decommission (so that we don't have to bootstrap the deployment again).

Ideally, we would have a proper DB server in codfw, but that is not possible in the short term, so it is better to use an intermediate solution for now.

This task is for tracking this work.

For the record, the databases that need reallocation from labtestcontrol2001.wikimedia.org are:

+------------------------+
| Database               |
+------------------------+
| designate              |
| designate_pool_manager |
| glance                 |
| keystone               |
| labspuppet             |
| nova                   |
| nova_api               |
| pdns                   |
+------------------------+

Also, it is worth remembering that there are some databases with the same names already in m5-master, so something must be done to prevent name clashes.
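One way to reason about the name clash is to compute a rename plan before anything is moved. The following is a minimal sketch, not an agreed procedure: the `codfw_` prefix, the function name `plan_renames`, and the list of databases already present on the destination are all assumptions made for illustration. Only the source database names come from the table above.

```python
# Source database names, taken from the list in this task description.
SOURCE_DATABASES = [
    "designate",
    "designate_pool_manager",
    "glance",
    "keystone",
    "labspuppet",
    "nova",
    "nova_api",
    "pdns",
]


def plan_renames(source_dbs, existing_dbs, prefix="codfw_"):
    """Map each source database to a destination name that does not clash.

    A name is kept as-is when it is free on the destination; otherwise the
    (hypothetical) prefix is prepended. If even the prefixed name is taken,
    the function refuses to guess and raises ValueError.
    """
    taken = set(existing_dbs)
    plan = {}
    for name in source_dbs:
        target = prefix + name if name in taken else name
        if target in taken:
            raise ValueError(f"{target!r} already exists on the destination")
        plan[name] = target
        taken.add(target)
    return plan


if __name__ == "__main__":
    # Hypothetical destination contents; the real list would have to be
    # read from the destination server (e.g. with SHOW DATABASES).
    existing = ["keystone", "nova", "nova_api"]
    for src, dst in plan_renames(SOURCE_DATABASES, existing).items():
        print(f"{src} -> {dst}")
```

Any renamed database would also need matching changes in the configuration of the services that connect to it.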

Also, GRANTS and other ACLs should be updated in the destination DB server to allow connections from:

  • labtestcontrol2003.wikimedia.org (which will be renamed and reimaged at some point)
  • cloudcontrol2001-dev.codfw.wmnet (T214448 - spare for now)
  • probably others, but less important, we will discover them as we do more decommissions
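The grant work described above can be sketched as a small generator of `GRANT` statements, one per host and database. This is only an illustration under stated assumptions: the account name `openstack_codfw`, the `ALL PRIVILEGES` level, and the function name `grant_statements` are hypothetical, and the account is assumed to exist already. The two hostnames are the ones listed above. In practice the server matches grants against the host or IP it actually sees, so addresses may be needed instead of names.

```python
# Hosts that need access, as listed in this task description.
CLIENT_HOSTS = [
    "labtestcontrol2003.wikimedia.org",
    "cloudcontrol2001-dev.codfw.wmnet",
]


def grant_statements(databases, hosts, user="openstack_codfw"):
    """Build one GRANT statement per (host, database) pair.

    The user name and the privilege level are assumptions for this sketch;
    the statements are returned as strings and never executed here.
    """
    statements = []
    for host in hosts:
        for db in databases:
            statements.append(
                f"GRANT ALL PRIVILEGES ON `{db}`.* TO '{user}'@'{host}';"
            )
    return statements


if __name__ == "__main__":
    for stmt in grant_statements(["nova", "nova_api"], CLIENT_HOSTS):
        print(stmt)
```

Generating the statements as text rather than running them keeps the output reviewable before it is applied or turned into configuration management.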

Event Timeline

aborrero created this task.

I am not working on this (and cannot be part of your Kanban or planning because I am not a member of your team).

I am removing the DBA tag for now, as there is nothing actionable for us at the moment (as far as I understand). If someone feels we should still be tagged, please re-add it.

Ok, just for clarification. Is it OK if I do the mysqldump and reallocate the databases to m5-master myself?

If I do this myself, some assistance would be welcome, to make sure I don't "press the wrong button" in m5-master.

aborrero: I think we have a misunderstanding, last time we talked, you wanted to set labtestwiki as rw on codfw. It may not be ok for you to do this, as it may generate lag for m5 services and requires previous backups. It also may require grant changes that have to be puppetized, as well as the setup of backups.

I assume you will need to rename the databases as per your own comment:

Also, worth remembering that there some databases with the same names already in m5-master, so something must be done to prevent name clashing.

I would suggest you share an action plan with us so we can review it together

aborrero: I think we have a misunderstanding, last time we talked, you wanted to set labtestwiki as rw on codfw. It may not be ok for you to do this, as it may generate lag for m5 services and requires previous backups. It also may require grant changes that have to be puppetized, as well as the setup of backups.

We do want a r/w DB for misc stuff in codfw (m6-master??). That is tracked in T218570: DB planning: include a writeable (?) misc DB cluster in codfw for WMCS

We agreed T218570 can't be done in the short term, so we agreed that a workaround was to reallocate the databases to m5-master meanwhile?

I would suggest you share an action plan with us so we can review it together

Ok, will do soon in this same ticket.

so we agreed that a workaround was to reallocate the databases to m5-master meanwhile

That is the part I wasn't aware of. It is probably my fault (I am not blaming you), but I don't know about those plans, and that is confusing, at least for me.

so we agreed that a workaround was to reallocate the databases to m5-master meanwhile

That is the part I wasn't aware of. It is probably my fault (I am not blaming you), but I don't know about those plans, and that is confusing, at least for me.

This idea actually came from you. It was you who suggested storing our codfw DBs in m5-master until we could move forward with T218570: DB planning: include a writeable (?) misc DB cluster in codfw for WMCS

But m5 master isn't writable on codfw. It is just a replica of eqiad (which is the writable one). We are not making m5 codfw writable; that will be a different set of databases, which won't be called m5 and whose work is tracked in T218570: DB planning: include a writeable (?) misc DB cluster in codfw for WMCS

Clearly we are out of sync here, so I think we need to sync before doing anything. Once we agree on what has to be done, we can take it over or let you do it (no problem with that), but there is indeed some confusion about the deliverables.

But m5 master isn't writable on codfw. It is just a replica of eqiad (which is the writable one). We are not making m5 codfw writable; that will be a different set of databases, which won't be called m5 and whose work is tracked in T218570: DB planning: include a writeable (?) misc DB cluster in codfw for WMCS

It would be a cross-DC database connection, i.e., cloudcontrol2003-dev.codfw.wmnet talking to m5-master.eqiad.wmnet on tcp/3306.

That sounded weird to me at first, but since @jcrespo suggested it, I thought it would be possible.

It would be a cross-DC database connection

No, that is not ok. Again, I take all blame if necessary, but that is certainly not what I meant. Sorry.

No problem, let's schedule a meeting (IRC or Hangout) or something so we can sync?

For the record, a summary is:

  • we have several r/w DBs in codfw, in servers we need to decomm now (because some of them are Trusty). DBs are: codfw testing openstack, codfw testing wikitech, codfw testing toolsadmin, etc
  • we need to rescue those DBs to some external server before the decomm process
  • we evaluated the convenience of using a similar approach to what we use in eqiad: having all the DBs in an external DB server (m5-master in the case of eqiad). This evaluation was positive.
  • therefore, we want to use this m5-master-like server in codfw to store all the r/w DBs we have in codfw.
  • it was suggested that this would require a proper cluster, which requires HW procurement etc. So we opened T218570: DB planning: include a writeable (?) misc DB cluster in codfw for WMCS
  • since T218570 will take time, and we need a shorter-term approach for this (because some of the hosts we need to decomm are Trusty), this very ticket was created

For the record, a summary is:

  • we have several r/w DBs in codfw, in servers we need to decomm now (because some of them are Trusty). DBs are: codfw testing openstack, codfw testing wikitech, codfw testing toolsadmin, etc
  • we need to rescue those DBs to some external server before the decomm process

But m5 has databases with the same name, so that will not work (as you pointed out earlier). And you cannot load them just on the codfw replicas, as those are connected to eqiad master, and hence there will be no consistency anymore.
This might help to visualize things: https://dbtree.wikimedia.org/ go to the bottom of the page and look for m5.

  • therefore, we want to use this m5-master-like server in codfw to store all the r/w DBs we have in codfw.

Let's call it m6 for now, to avoid confusion:
Does m6 need to have an equivalent in eqiad?
Right now m5-master is active in eqiad, and it has replication to codfw (which is RO) so in case eqiad completely dies, we have the equivalent in codfw.
What would happen with those new DBs you want to write to in codfw? What if they fully crash?

If it needs to be writable directly in codfw, yes, you need new HW.

  • since T218570 will take time, and we need a shorter-term approach for this (because some of the hosts we need to decomm are Trusty), this very ticket was created

The problem is that the current m5-master in eqiad is the writable one, and the hosts in codfw are just replicas and they run read only, they just replicate the data from eqiad.

Let's call it m6 for now, to avoid confusion:

OK! m6 then.

Does m6 need to have an equivalent in eqiad?

No, I don't think so, at least not to start with. Redundancy can be added in a later iteration.

Right now m5-master is active in eqiad, and it has replication to codfw (which is RO) so in case eqiad completely dies, we have the equivalent in codfw.
What would happen with those new DBs you want to write to in codfw? What if they fully crash?

We have the exact same situation right now. We can tolerate that, given this is a testing/staging environment.

If it needs to be writable directly in codfw, yes, you need new HW.

Ok, agreed.

  • since T218570 will take time, and we need a shorter-term approach for this (because some of the hosts we need to decomm are Trusty), this very ticket was created

The problem is that the current m5-master in eqiad is the writable one, and the hosts in codfw are just replicas and they run read only, they just replicate the data from eqiad.

Ok, do you have any proposal or alternative?

Let's call it m6 for now, to avoid confusion:

OK! m6 then.

Does m6 need to have an equivalent in eqiad?

No, I don't think so, at least not to start with. Redundancy can be added in a later iteration.

That would require HW too.

Right now m5-master is active in eqiad, and it has replication to codfw (which is RO) so in case eqiad completely dies, we have the equivalent in codfw.
What would happen with those new DBs you want to write to in codfw? What if they fully crash?

We have the exact same situation right now. We can tolerate that, given this is a testing/staging environment.

Sure, that is your team's call :-)

If it needs to be writable directly in codfw, yes, you need new HW.

Ok, agreed.

  • since T218570 will take time, and we need a shorter-term approach for this (because some of the hosts we need to decomm are Trusty), this very ticket was created

The problem is that the current m5-master in eqiad is the writable one, and the hosts in codfw are just replicas and they run read only, they just replicate the data from eqiad.

Ok, do you have any proposal or alternative?

No, not really. I was trying to understand the situation.
We currently don't have spare hardware there, so we cannot give you temporary space to hold those databases in writable mode until the new cluster is created.

Closing this task as declined, since the main idea was not very good:

  • having openstack servers in codfw
  • consuming databases that live in m5-master eqiad
  • the whole cross-datacenter connection idea was invalid.

This was intended as a workaround while T218570: DB planning: include a writeable (?) misc DB cluster in codfw for WMCS was being worked on.

For the record, we will be reallocating the openstack databases into cloudcontrol2001-dev.wikimedia.org, which just got bootstrapped in T219626: codfw1dev: bootstrap cloudcontrol servers in mitaka/stretch