
[toolsdb] Migrate mixnmatch db to Trove
Open, Medium, Public

Description

tool-db-usage shows the database s51434__mixnmatch_p is currently one of the biggest in ToolsDB at 191GB.

It's also doing some very slow queries (T349695) and caused some replication issues in the past (T341891).

We should migrate it to a dedicated Trove instance.

Update July 2025: this is still in the plans, but not currently a priority.

Event Timeline

I am open to this, as I already have a Trove instance for baglama2, and it works well (after some initial problems).

The difference here is that for baglama2, I (that is, a script I run, essentially) control when things are written to the database, so I could just not run that script while the SQL dump was transferred.

Mix'n'match is driven by user interaction, so it would have to be offline for the entire time of the transfer. I don't see an easy way to transfer a dump and then update it either, unless you can set a Trove instance as a DB slave (or whatever word we use these days), and then just pause Mix'n'match for a minute or so to make the Trove instance master.

I would appreciate ideas for this.

Migrating the data will definitely take some time. I explored a few options in T328691: [toolsdb] Migrate linkwatcher db to Trove, and eventually used mydumper to generate SQL backups and import them into Trove.

In the case of Linkwatcher, I gave up on the idea of replicating from the old database, but the linkwatcher database had additional complications (and was almost 10 times the size).

My strategy here would be:

  1. create a SQL backup with mysqldump or mydumper
  2. import the backup into Trove
  3. configure Trove to replicate from ToolsDB, replicating only that single db
  4. when replication catches up and the two DBs are in sync, change the tool config to point to Trove
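A rough sketch of steps 1 to 3, assuming shell access and MariaDB on both ends. The hostnames, replication user, credentials, and binlog coordinates below are placeholders, not the real ToolsDB/Trove configuration, and a Trove instance may only allow replication to be configured through its own API rather than with direct SQL:

```shell
# 1. Dump only the mixnmatch database with mydumper (parallel, compressed).
#    mydumper also writes a "metadata" file containing the binlog file/position
#    at the time of the dump, which step 3 needs.
mydumper --host tools.db.svc.wikimedia.cloud \
         --database s51434__mixnmatch_p \
         --compress --threads 4 \
         --outputdir /srv/dump/mixnmatch

# 2. Import the dump into the Trove instance with myloader.
myloader --host TROVE_INSTANCE_HOST \
         --directory /srv/dump/mixnmatch \
         --threads 4 --overwrite-tables

# 3. On the Trove instance, replicate only that single database from ToolsDB,
#    starting from the binlog coordinates recorded in the mydumper metadata.
#    (Filter syntax varies between MariaDB and MySQL versions.)
mysql --host TROVE_INSTANCE_HOST -e "
  STOP SLAVE;
  SET GLOBAL replicate_do_db = 's51434__mixnmatch_p';
  CHANGE MASTER TO
    MASTER_HOST = 'tools.db.svc.wikimedia.cloud',
    MASTER_USER = 'repl_user',
    MASTER_PASSWORD = 'REDACTED',
    MASTER_LOG_FILE = 'BINLOG_FILE_FROM_METADATA',
    MASTER_LOG_POS  = 0;
  START SLAVE;"
```

Once `SHOW SLAVE STATUS` reports the replica caught up, step 4 is just a config change in the tool plus a brief pause of writes while replication drains.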

It should be theoretically possible, but I have never done it before. :) I'm happy to take care of steps 1, 2 and 3, but I'm not sure if I'm gonna work on this immediately. If we find other ways to stop the ToolsDB OOM errors (T349695), this task becomes less urgent.

fnegri triaged this task as Medium priority. Dec 13 2023, 12:25 PM

Hi @fnegri if this will take a while, could you throw me a few more DB connections for mix-n-match on ToolsDB in the meantime? I have a lot of background tasks running to keep everything useful and in sync with Wikidata.

@Magnus the migration to Trove is still on my to-do list, but I'm not planning to start it before next month at the earliest.

Re: connections, the current value of @@GLOBAL.max_user_connections is 20, which is already a lot for most use cases. Unfortunately, I think that value is shared by all users, so I cannot raise it for just one tool.

One thing you could explore is using the new read-only endpoint for reads: you can make 20 additional connections to tools-readonly.db.svc.wikimedia.cloud, which points to the replica server. Bear in mind that there can be some replication lag.
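As a quick sanity check before relying on the replica, the lag can be inspected directly. This is an illustrative fragment: the endpoint name comes from the comment above, but the user name is a placeholder, and viewing replica status may require privileges a tool user does not have:

```shell
# Check how far the read-only replica is behind the primary.
# Seconds_Behind_Master = 0 means the replica is fully caught up.
mysql --host tools-readonly.db.svc.wikimedia.cloud \
      --user TOOL_USER -p \
      -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master
```

Reads that tolerate slightly stale data (background sync jobs, statistics) are good candidates to move to this endpoint, freeing the 20 primary connections for writes.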