During the migration of testreduce1001 to a new Bookworm replacement, it became apparent that the testreduce database on m5 is unused and no longer needed, so it can be deleted. The testreduce instance(s) carry relatively short-lived data and run locally on the servers.
@Ladsgroup as mentioned on IRC, they mostly will only need database help for grant updates :-D
Pedantic me just chiming in to mention that these dbs don't have backups (as designed, and I think widely understood by everyone on this ticket), but I'm offering any kind of special, one-time backup from the Data-Persistence-Backup team, if and when necessary, as we do on every service migration (it probably won't be needed)! :-D Do not be afraid to ask!
Edit: Sorry, wrong team, the right one for the task would be database-backups :-DD (nothing to do with the former)
Regarding the incompatible binary files between the two mariadb versions, is mysqldump the best way to migrate over the database from testreduce1001 to testreduce1002? Anything you can do there would be useful.
As for grants, whatever we have enabled on testreduce1001 should be enabled on testreduce1002 as well. I don't know where to go looking for them.
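For reference, a logical dump plus a grant listing could be sketched roughly as below. This is only an illustration under stated assumptions, not a vetted procedure: the database names are the ones mentioned on this ticket for m5 and may differ on the local instance, the account `'testreduce'@'localhost'` and the file name `testreduce.sql` are placeholders, and the flags should be checked against the MariaDB versions actually installed.

```python
import shlex

# Assumed database names (the ones named on this ticket for m5); the local
# instance on testreduce1001 may use different names.
DATABASES = ["testreduce", "testreduce_vd"]


def dump_command(databases, outfile):
    """Build a mysqldump invocation that writes a logical (SQL) dump."""
    return [
        "mysqldump",
        "--single-transaction",      # consistent snapshot for InnoDB tables
        "--routines",                # include stored procedures and functions
        f"--result-file={outfile}",  # write the dump to this file
        "--databases", *databases,   # emits CREATE DATABASE / USE statements
    ]


def show_grants_command(user, host):
    """Build a mysql client call that prints the grants of one account."""
    return ["mysql", "-e", f"SHOW GRANTS FOR '{user}'@'{host}'"]


if __name__ == "__main__":
    # Placeholder file name and account; print the commands instead of
    # running them so they can be reviewed first.
    print(shlex.join(dump_command(DATABASES, "testreduce.sql")))
    print(shlex.join(show_grants_command("testreduce", "localhost")))
```

Because the dump is plain SQL, it does not depend on the on-disk binary format, so it could be loaded on the new host with the `mysql` client reading the file on standard input.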
BTW, is there a reason testreduce uses a local mariadb installation instead of using one of the central DB misc clusters, such as m5? As Jaime correctly pointed out, the data is currently not backed up, and if the DB content is important enough to be carried over from testreduce1001 to testreduce1002, maybe that should be changed?
Testreduce is on m5: https://wikitech.wikimedia.org/wiki/MariaDB/misc#m5 but it is explicitly excluded from backups, as requested. I agree with Moritz that this should mean any migration is allowed to lose all data, as would normally happen without backups, with the data reloaded if needed again by service owners, not SRE.
I don't know where the signals crossed, but once testreduce1001 was spun up, we have been using a local db for running these rt tests and not the misc cluster dbs. You can probably confirm that there has been no activity on those m5 dbs for a while. Those two dbs (testreduce and testreduce_vd) can be dropped.
I will note that these dbs grow over time with a lot of useless test result data, and having the db local lets us just run a script to drop all the old data every 2 weeks and reoptimize it (which is how we have been doing it, since we only had 50 GB of disk space). There is also the fact that we run 'npm install' locally on testreduce1002, in case that has a bearing on whether you want this server to have access to the misc db cluster.
But, all that said, I don't have a preference for whether we use the local db or the misc m5 db. So, I'll let you all make that call, as long as we have grants to manage it ourselves without needing to bug you all each time. :)
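As an illustration of the periodic cleanup described above, the statements such a script issues might look like the sketch below. The actual script is not shown on this ticket, so everything here is an assumption: the table name `results` and the column `created_at` are hypothetical placeholders, and the 14-day cutoff simply mirrors the "every 2 weeks" cadence.

```python
def purge_statements(tables, timestamp_column, max_age_days=14):
    """Return SQL that deletes rows older than max_age_days from each table
    and then rebuilds the table to reclaim disk space."""
    statements = []
    for table in tables:
        statements.append(
            f"DELETE FROM `{table}` "
            f"WHERE `{timestamp_column}` < NOW() - INTERVAL {int(max_age_days)} DAY;"
        )
        # OPTIMIZE TABLE rebuilds the table so the freed space is returned.
        statements.append(f"OPTIMIZE TABLE `{table}`;")
    return statements


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    for statement in purge_statements(["results"], "created_at"):
        print(statement)
```

Generating the statements separately from executing them keeps the sketch reviewable; the output could be piped to the `mysql` client from a cron job or systemd timer.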
I don't know what the original reasoning for choosing m5 was (and I might be missing something here). If this is more of a test system, I think we can get away with a local db; making sure it's set up via puppet and all that jazz is the most complicated part. But any production db services must be in the misc cluster and not local.
Stupid question: Why not use WMCS instead of production? There is even Trove to serve dbs on demand.
Because the APIs in question (used to access undeployed Parsoid code) were not public and could not be accessed by cloud VMs. I should check whether we have since made these APIs public, so that they could be accessed from WMCS. If anything, we have been talking of making any unnecessary public APIs private. So, even if these APIs have become public in recent months (as part of RESTBase deprecation efforts), we may make them private again.
Separately, we want to only access the API endpoints on scandium, not the general mediawiki cluster. In our test runners on testreduce100x, we do that by setting proxy headers pointing at scandium. I'm not sure whether that is going to work from outside the production cluster. In any case, we can investigate whether anything has changed in recent months that might allow us to run these test clients on WMCS. But for now, let us complete this transition to testreduce1002 before we get into that rabbit hole.
FWIW, I agree. Let's get https://gerrit.wikimedia.org/r/c/operations/puppet/+/957251/ reviewed/merged so that testreduce1002 can fully take over; all future changes can then happen on that basis.