
Retire testreduce database on m5
Closed, Resolved, Public

Description

During the migration of testreduce1001 to a new Bookworm replacement, it became apparent that the testreduce database on m5 is unused and no longer needed, so it can be deleted. The testreduce instance(s) carry relatively short-lived data and run locally on the servers.

Event Timeline

Do DBAs really own or maintain testreduce1001? I never knew it existed. I'd be happy to give a hand though.

@Ladsgroup as mentioned on IRC, they mostly will only need database help for grant updates :-D

Pedantic me just chiming in to mention that these dbs don't have backups (as designed, and I think widely understood by everyone on this ticket), but offering any kind of special, one-time backups from the Data-Persistence-Backup team, if and when necessary, as we do on every service migration (probably won't be needed)! :-D Do not be afraid to ask!

Edit: Sorry, wrong team, the right one for the task would be database-backups :-DD (nothing to do with the former)

I can take care of the grants. Let me know the details and consider it done.

Regarding the incompatible binary files between the two mariadb versions, is mysqldump the best way to migrate over the database from testreduce1001 to testreduce1002? Anything you can do there would be useful.

As for grants, whatever we have enabled on testreduce1001 should be enabled on testreduce1002 as well. I don't know where to go looking for them.
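On the mysqldump question above: a logical dump is the usual way around incompatible binary files between MariaDB versions. The following is only a minimal sketch of what such a dump invocation could look like. The database names are the ones mentioned in this task; the output path and the helper name `build_dump_command` are hypothetical, and the exact options would need to be checked against the versions installed on both hosts.

```python
import shlex

# Database names as mentioned in this task; everything else is an assumption.
DATABASES = ["testreduce", "testreduce_vd"]


def build_dump_command(databases, output_path):
    """Return the argv list for a logical dump of the given databases.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if not databases:
        raise ValueError("at least one database name is required")
    return [
        "mysqldump",
        "--single-transaction",  # consistent snapshot of InnoDB tables without locking
        "--routines",            # include stored procedures and functions
        f"--result-file={output_path}",
        "--databases",           # emit CREATE DATABASE / USE statements
        *databases,
    ]


if __name__ == "__main__":
    # The output path is a placeholder, not a path used on the real hosts.
    print(shlex.join(build_dump_command(DATABASES, "/tmp/testreduce.sql")))
```

The resulting file could then be copied to the new host and loaded with the `mysql` client. Grants are not part of such a dump; `SHOW GRANTS FOR 'user'@'host'` on the old server is one way to see what is currently enabled.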

BTW, is there a reason testreduce uses a local mariadb installation instead of using one of the central DB misc clusters, such as m5? As Jaime correctly pointed out, the data is currently not backed up, and if the DB content is important enough to be carried over from testreduce1001 to testreduce1002, maybe that should be changed?

Testreduce is on m5: https://wikitech.wikimedia.org/wiki/MariaDB/misc#m5 but it is explicitly removed from backups, as requested. I agree with Moritz that this should mean any migration is allowed to lose all data, as would normally happen without backups, with the data reloaded if needed again by service owners, not SRE.

If it's in m5, you don't really need many changes, or to move data around, or anything like that; you'll be upgrading the mysql client, not the server. Am I missing something?

Marostegui triaged this task as Medium priority. Sep 12 2023, 4:22 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Change 957301 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Drop the rsync migration setup for testreduce

https://gerrit.wikimedia.org/r/957301

Change 957301 merged by Muehlenhoff:

[operations/puppet@production] Drop the rsync migration setup for testreduce

https://gerrit.wikimedia.org/r/957301

I don't know where the signals crossed, but once testreduce1001 was spun up, we have been using a local db for running these rt tests and not the misc cluster dbs. You can probably confirm that there has been no activity on those m5 dbs for a while. Those two dbs (testreduce and testreduce_vd) can be dropped.

I will note that these dbs grow over time with a lot of useless test result data, and keeping them local lets us just run a script every 2 weeks to drop all the old data and reoptimize the tables (which is how we have been doing it, since we only had 50 GB of disk space). There is also the fact that we run 'npm install' locally on testreduce1002, in case that has a bearing on whether you want this server to have access to the misc db cluster.
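For illustration only, the periodic cleanup described above could be sketched roughly as follows. This is not the actual script: the table name `results`, the column name `created_at`, and the helper `build_cleanup_statements` are hypothetical, and the sketch only generates the SQL statements rather than running them against a server.

```python
from datetime import date, timedelta


def build_cleanup_statements(tables, date_column, today, keep_days=14):
    """Build DELETE and OPTIMIZE statements that prune rows older than keep_days.

    Table and column names are placeholders; the real schema may differ.
    """
    cutoff = today - timedelta(days=keep_days)
    statements = []
    for table in tables:
        # Reject anything that is not a plain identifier before interpolating it.
        if not table.isidentifier() or not date_column.isidentifier():
            raise ValueError(f"unsafe identifier: {table!r} / {date_column!r}")
        statements.append(
            f"DELETE FROM `{table}` WHERE `{date_column}` < '{cutoff.isoformat()}';"
        )
        # OPTIMIZE TABLE rebuilds the table and reclaims space freed by the DELETE.
        statements.append(f"OPTIMIZE TABLE `{table}`;")
    return statements


if __name__ == "__main__":
    for statement in build_cleanup_statements(["results"], "created_at", date.today()):
        print(statement)
```

The output could be piped into the `mysql` client from a timer or cron job on the host running the local db.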

But, all that said, I don't have a preference for whether we use the local db or the misc m5 db. I defer to you all to make that decision, as long as we have grants to manage it ourselves without needing to bug you all each time. :)

I don't know what the original reasoning for choosing m5 was (and I might be missing something here). If this is more of a test system, I think we can get away with a local db; making sure it's set up via puppet and all that jazz is the most complicated part. Any production db services, though, must be in the misc cluster and not local.

Stupid question: why not use WMCS instead of production? There is even Trove to serve dbs on demand.

Because the APIs in question (used to access undeployed Parsoid code) were not public and could not be accessed by cloud VMs. I should check whether we have since made these APIs public, so that they could be accessed from WMCS. If anything, we have been talking of making any unnecessarily public APIs private. So, even if in recent months (as part of RESTBase deprecation efforts) these APIs have become public, we may make them private again.

Separately, we want to only access the API endpoints on scandium, not the general mediawiki cluster. In our test runners on testreduce100x, we do that by setting proxy headers pointing at scandium. Not sure if that is going to work from outside the production cluster. In any case, we can investigate if anything has changed in recent months that might allow us to run these test clients on WMCS. But for now, let us complete this transition to testreduce1002 before we get into that rabbithole.

MoritzMuehlenhoff renamed this task from Migrate testreduce database from testreduce1001 to testreduce1002 to Migrate DB grants for testreduce1001 to testreduce1002. Sep 15 2023, 6:50 AM

But for now, let us complete this transition to testreduce1002 before we get into that rabbithole.

FWIW; I agree. Let's get https://gerrit.wikimedia.org/r/c/operations/puppet/+/957251/ reviewed/merged so that testreduce1002 can fully take over, all future changes can happen on that basis.

Until the switchover is done (and for a week after it), we can't do any maintenance work on dbs, since they now have circular replication. But maybe, since this is the misc cluster only and that's not getting any switchover to my knowledge, it's fine? Let me ask @Marostegui

Yeah, misc clusters aren't part of the switchover.

Awesome. If the patch gets reviewed, I'll deploy it on Monday.

Discussion between @ssastry, @Ladsgroup and myself has shown that the testreduce database on m5 is unused and no longer needed, so it can be deleted. The testreduce instance(s) carry relatively short-lived data and run locally on the servers. I'll retitle the task.

MoritzMuehlenhoff renamed this task from Migrate DB grants for testreduce1001 to testreduce1002 to Retire testreduce database on m5. Oct 5 2023, 10:42 AM
MoritzMuehlenhoff updated the task description.

Is this good to go? Do we need a final backup?

Is this good to go?

Yes

Do we need a final backup?

Nope, this is a testing setup.

Change 966327 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] production-m5.sql.erb: Remove testreduce grants

https://gerrit.wikimedia.org/r/966327

Change 966327 merged by Marostegui:

[operations/puppet@production] production-m5.sql.erb: Remove testreduce grants

https://gerrit.wikimedia.org/r/966327