Should Striker's database be hosted on M5 or cloudcontrol/galera?
Closed, Resolved, Public

Description

We're currently using M5 for OpenStack, Striker, and wikitech.

The motivations for moving OpenStack services off of M5 are largely related to OpenStack having unusual database behavior that's ill-suited to the standard practices for wiki-hosting databases. That probably doesn't apply as much to Striker, which I expect acts more like a 'normal' web app. Can it stay on M5? Or is it on the verge of moving to k8s and using toolsdb?

I don't have any plans to move wikitech onto Galera, instead banking on T167973.

Event Timeline

My vote is to keep Striker (and basically anything else) separate from the OpenStack databases. The OpenStack databases are subject to automated migrations and have fun quirks around encoding as it is. They tend to be destructive in terms of the number of database connections, and once they are moved to the cloudcontrol cluster they will be tied to cloudcontrol (OpenStack) cluster operations. I don't know if we feel the same way about Striker.

That said, it's not a big database, and I cannot remember us doing much in the way of ops there. It just kind of runs. If we are clearing out M5 in the future, I personally wouldn't mind it moving to any other general-use or multi-use database server. Is there such a thing, or is the plan to end M5-like servers entirely?

I too vote to keep Striker separate from the OpenStack databases. Ideally we will move Striker's code to a Kubernetes cluster at some point as well, but not into Toolforge, because it uses LDAP authentication, which we cannot do there. More likely it will go into the core network's Kubernetes once that cluster is ready for non-MediaWiki services.

Sounds good. I don't know if there's a long-term plan for miscellaneous DB hosting but for now I'll leave it be and we'll see if the DBAs chase us away later.

Ideally, I would like to get wikitech out of m5, so that m5 would only hold the OpenStack/Striker databases and we could hand it over entirely to WMCS (or migrate it somewhere else within your infra). Unfortunately, I don't see wikitech being moved in the short term.

Allow me to go off-topic here to ask a related question. I reviewed the backup policy on m5 and found the following discrepancies:

The following databases are set to be backed up but don't exist:

  • labspuppet
  • ceilometer
  • designate_pool_manager

The following databases exist but are not set to be backed up:

  • test_labsdbaccounts

Could you clarify this, and tell us whether we should change the backup policy to get better coverage?
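The check described above comes down to two set differences between the databases named in the backup policy and the databases that actually exist on the server. A minimal Python sketch, assuming both lists are already available as sets of names; the function name `backup_discrepancies` is hypothetical, and the input values are an illustrative subset of names taken from this thread, not the real m5 configuration:

```python
from __future__ import annotations


def backup_discrepancies(configured: set[str], existing: set[str]) -> tuple[set[str], set[str]]:
    """Return (configured but missing, existing but not backed up)."""
    missing = configured - existing    # backup entries with no database behind them
    uncovered = existing - configured  # databases with no backup entry
    return missing, uncovered


# Illustrative subset of the policy before cleanup (not the real config).
configured = {"striker", "nova", "keystone", "labspuppet", "ceilometer", "designate_pool_manager"}
# Illustrative subset of the databases present on the server.
existing = {"striker", "nova", "keystone", "test_labsdbaccounts"}

missing, uncovered = backup_discrepancies(configured, existing)
print(sorted(missing))    # ['ceilometer', 'designate_pool_manager', 'labspuppet']
print(sorted(uncovered))  # ['test_labsdbaccounts']
```

Running the same comparison against the full policy and a fresh listing of the server's databases would reproduce the two lists above.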

Thanks for checking, @jcrespo; answers in line:

The following databases are set to be backed up but don't exist:

  • labspuppet

This is now hosted on a cloud instance; the backup job can be removed.

  • ceilometer

This never did anything important; the backup job can be removed.

  • designate_pool_manager

This was important during an earlier release of Designate but was phased out a couple of versions ago. So, again, the backup job isn't needed.

The following databases exist but are not set to be backed up:

  • test_labsdbaccounts

I don't know what this is! It's probably transitory but I will ask during today's meeting.

Bstorm moved this task from Needs discussion to Doing on the cloud-services-team (Kanban) board.

Assigning to myself to go look into test_labsdbaccounts

> Ideally, I would like to get wikitech out of m5, so that m5 would only hold the OpenStack/Striker databases and we could hand it over entirely to WMCS (or migrate it somewhere else within your infra). Unfortunately, I don't see wikitech being moved in the short term.

Moving wikitech is tracked:

I really think the blockers for this are gone on the Wikitech side. We just need to do the work to move wikitech into the main hosting cluster and finally stop treating it like a snowflake.

Moving Striker is not something I remember being asked to think about before. Understanding the intent of the request would be helpful, as the Toolhub project is finally planned to start moving forward in FY20/21. That project will be building a new service with several architectural similarities to Striker, including the need for a MySQL/MariaDB primary data store. If the intent is just to get rid of one of the misc shards, let's move Striker to one of m1-3. If the intent is to deprecate support for non-MediaWiki-related schemas, then I guess I will need to figure out storage support for several production applications.

> Ideally, I would like to get wikitech out of m5, so that m5 would only hold the OpenStack/Striker databases and we could hand it over entirely to WMCS (or migrate it somewhere else within your infra). Unfortunately, I don't see wikitech being moved in the short term.

> Moving wikitech is tracked:

> I really think the blockers for this are gone on the Wikitech side. We just need to do the work to move wikitech into the main hosting cluster and finally stop treating it like a snowflake.

That's good news! Maybe we can try to fit this in for the Q1 switchover, as it requires a bit of work on replication filters and multi-source replication for a few weeks. I will discuss it with the team.

> Moving Striker is not something I remember being asked to think about before. Understanding the intent of the request would be helpful, as the Toolhub project is finally planned to start moving forward in FY20/21. That project will be building a new service with several architectural similarities to Striker, including the need for a MySQL/MariaDB primary data store. If the intent is just to get rid of one of the misc shards, let's move Striker to one of m1-3. If the intent is to deprecate support for non-MediaWiki-related schemas, then I guess I will need to figure out storage support for several production applications.

No, I wasn't asking to move Striker. Right now m5 only holds the Striker and OpenStack databases, plus wikitech and some testreduce databases (which I guess could eventually be moved to another misc section).
If we finally move wikitech out, the m5 misc section would belong almost entirely to WMCS, so I thought that maybe it could just be owned by you guys. This is probably not the right venue for this conversation, and probably not the right time either, as wikitech is still there :-)

I wouldn't move those databases to another misc section for several reasons:

  • We enforce the use of the proxies on the other misc sections.
  • m5 currently doesn't use a proxy (even though one is available) because from time to time we see these databases hit very big peaks that cause unavailability (and the proxy failing over to the read-only replica).
  • Sometimes those spikes have made the whole master unavailable (just for a few seconds). That impact is currently limited to a set of services plus wikitech, and I wouldn't want it to happen on another misc section these databases would share.

Let's wait to see if we can move wikitech out in the next DC window and we can resume this discussion from that point!

Thank you

Just checked on test_labsdbaccounts and can confirm it's full of really old data. I think it was from the first trial runs of the live maintain-dbusers (the db mentions labsdb1001 and the schema matches maintain-dbusers). It's probably not bad to keep it around in case we ever attempt to mess with maintain-dbusers in a big way, but it's fine that it isn't backed up. It's junk.

Because so many people helped with this, let me summarize what I plan to do, if I understood correctly (correct me otherwise):

Remove the following from backups:

  • labspuppet
  • ceilometer
  • designate_pool_manager

Keep the db, but document that it is intentionally not backed up (it needs no backups):

  • test_labsdbaccounts

I have applied the above changes. FYI, this is the list of what is now being backed up on m5:

`neutron`.*
`labswiki`.*
`labsdbaccounts`.*
`striker`.*
`nodepooldb`.*
`designate`.*
`nova`.*
`keystone`.*
`glance`.*
`nova_eqiad1`.*
`nova_api_eqiad1`.*
`nova_cell0_eqiad1`.*
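To double-check the summary against this list, the entries can be parsed back into database names. A minimal Python sketch, assuming each `` `name`.* `` line selects every table of a single database; the thread does not say which backup tool consumes these patterns, so that interpretation, the regex, and the helper name `covered_databases` are all assumptions:

```python
import re

# The patterns as listed above. Assumption: `db`.* means "all tables of db".
BACKUP_PATTERNS = """
`neutron`.*
`labswiki`.*
`labsdbaccounts`.*
`striker`.*
`nodepooldb`.*
`designate`.*
`nova`.*
`keystone`.*
`glance`.*
`nova_eqiad1`.*
`nova_api_eqiad1`.*
`nova_cell0_eqiad1`.*
"""

# Hypothetical parser: a backtick-quoted identifier followed by a .* wildcard.
PATTERN = re.compile(r"^`([^`]+)`\.\*$")


def covered_databases(patterns: str) -> set[str]:
    """Extract database names from lines of the form `name`.*"""
    names = set()
    for line in patterns.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = PATTERN.match(line)
        if match is None:
            raise ValueError(f"unexpected pattern: {line!r}")
        names.add(match.group(1))
    return names


covered = covered_databases(BACKUP_PATTERNS)
# The three removed databases should no longer be covered...
assert not covered & {"labspuppet", "ceilometer", "designate_pool_manager"}
# ...and test_labsdbaccounts is intentionally left out.
assert "test_labsdbaccounts" not in covered
print(len(covered))  # 12
```

Failing loudly on an unrecognized line keeps a typo in the list from silently dropping a database out of the check.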