In T210478#4794524, @Banyek wrote: On the dbstore1002 host the following databases exist alongside the wikis (with sizes):

Schema | Needed | Size | Comment | Backups needed
---|---|---|---|---
centralauth | Yes | 88G | This is part of s7, so it will be done once s7 is populated. | N/A
datasets | No | 72K | Not needed T212487#4860691 | No
fab_migration | No | 13M | Not needed | Yes: cumin1001:/home/elukey/dbstore1002_backup
flowdb | Yes | 1.5G | Part of x1, can be decided on later | N/A
oai | No | 1.1G | Not needed - just mysqldump it just in case T212487#4839932 | Yes: cumin1001:/home/elukey/dbstore1002_backup
ops | No | 2.1G | Not needed | No
project_illustration | No | 2.9G | Most likely not needed, pending confirmation from @MarkTraceur T212487#4863909 | Yes: cumin1001:/home/elukey/dbstore1002_backup
staging | Yes | 280G | Database used by the Analytics team for temporary tables | N/A
steward | No | 120K | Not needed (just one table, and it is empty) | No
test | No | 280K | Not needed - just mysqldump it just in case T212487#4839932 | Yes: cumin1001:/home/elukey/dbstore1002_backup
warehouse | No | 72M | Not needed - just mysqldump it just in case T212487#4839932 | Yes: cumin1001:/home/elukey/dbstore1002_backup

We should investigate which schemas are used and need to be migrated, and whether there are schemas that could be "left behind" (after archival, of course).
Description
Status | Assigned | Task
---|---|---
Resolved | elukey | T172410 Replace the current multisource analytics-store setup
Resolved | Jclark-ctr | T216491 Decommission dbstore1002
Resolved | Marostegui | T210478 Migrate dbstore1002 to a multi instance setup on dbstore100[3-5]
Resolved | elukey | T212487 Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup
Event Timeline
Comment Actions
Crossposting my comment from: T210478#4794533
ops doesn't need to be migrated.
I believe datasets isn't used anymore, but it needs double-checking.
centralauth is part of s7, so I assume it needs to be migrated.
oai: I think it is not used, but it needs double-checking.
steward: I think it is not used, but it needs double-checking.
test looks like it was... a test. It has not been written to since roughly Jan 2017.
Comment Actions
datasets indeed seems unused; I propose to just mysqldump it and save it in some safe place (Hadoop's HDFS seems a good fit).
fab_migration seems related to https://phabricator.wikimedia.org/rPHTO034e8bd616747a1fd52daf3a0038291c84429d4f and is almost surely not used. Pinging @chasemp to verify :)
flowdb replication from x1 was requested by @EBernhardson in T75047; pinging him to see if we can get some more info :)
oai seems related to an old MediaWiki extension; I'd simply back it up on HDFS.
project_illustration was requested in T124705 and is probably not used, but triple-checking with @leila whether Research needs it (otherwise we can simply back it up on HDFS).
steward and warehouse are probably not used, but I'd back them up on HDFS anyway (see the dump-and-archive sketch after this list).
test should be neither backed up nor copied over, in my opinion.
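A minimal sketch of the dump-and-archive step proposed above, assuming shell access to the dbstore host and the HDFS archive path that appears later in this task; the database list, paths, and mysqldump flags are illustrative assumptions, not confirmed commands:

# Dump each database we only want to keep as an archive, compressed.
for db in datasets oai steward warehouse; do
  mysqldump --single-transaction "$db" | gzip > "/home/elukey/dbstore1002_backup/${db}.sql.gz"
done
# Then, from a Hadoop client node:
hdfs dfs -put /home/elukey/dbstore1002_backup/*.sql.gz /wmf/data/archive/backup/misc/dbstore1002_backup/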
Comment Actions
Currently we replicate x1 to dbstore1002, and flowdb is part of it. If for some reason we no longer want to replicate flowdb (but want to keep replicating x1), we'd need a replication filter to ignore that particular DB (sketched below).
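For reference, a minimal sketch of such a filter on a MariaDB replica; the config file path is an assumption, and the wild-table form is used because replicate-ignore-db only matches the default database under statement-based replication:

# Hypothetical: skip flowdb on the replica while the rest of x1 keeps replicating.
cat >> /etc/mysql/conf.d/replication-filters.cnf <<'EOF'
[mysqld]
replicate-wild-ignore-table = flowdb.%
EOF
# Restart mysqld afterwards for the filter to take effect.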
Comment Actions
The staging database is still needed and it is sufficient to copy it to a single instance.
Comment Actions
For what it's worth, I just checked and most of the tables are MyISAM/TokuDB. I _would love_ to have those migrated to InnoDB (for those tables that are still needed); a rough conversion sketch follows.
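A sketch of how the remaining MyISAM/TokuDB tables could be found and the conversions generated, assuming a MariaDB client on the host; the staging schema name is from this task, and the output should be reviewed before applying it, since ALTER ... ENGINE rebuilds each table:

# Emit one ALTER per non-InnoDB table in staging; pipe back into mysql to apply.
mysql -BN -e "SELECT CONCAT('ALTER TABLE ', table_schema, '.', table_name, ' ENGINE=InnoDB;')
              FROM information_schema.tables
              WHERE table_schema = 'staging' AND engine IN ('MyISAM', 'TokuDB');"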
Comment Actions
OK, as far as I can see this task is waiting on a few confirmations, but if the plan in T212487#4839932 looks good then we can proceed!
Comment Actions
Yup, one of the tables there (content_namespaces) was mine—I just copied it to staging, which means I don't need the one in datasets any more.
Comment Actions
Sorry for the delay. Just got back from vacation and catching up. :)
I checked with bmansurov, and our understanding is that Research doesn't need this. It seems it was requested by Mark Traceur per https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/266312/ and you may want to check with them.
Comment Actions
@elukey I have updated the original task to add the latest status of the databases we hold there.
To sum up, we still need confirmation for:
fab_migration - @chasemp
flowdb - @EBernhardson
project_illustration - final word from @MarkTraceur
Comment Actions
@elukey For those databases that we have decided, so far, to back up and archive (oai, test, warehouse), I have done a mysqldump so you can grab it and store it wherever you like.
It is at:
cumin1001:/home/marostegui/dbstore1002_backup
I have also updated the original task to make it clearer
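For completeness, dumps like these are typically produced with something along the following lines; the exact flags used were not recorded in this task, so treat this as an illustrative sketch:

cd /home/marostegui/dbstore1002_backup
for db in oai test warehouse; do
  mysqldump "$db" > "${db}.sql"   # one self-contained dump per database
done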
Comment Actions
I asked @chasemp about fab_migration and I think we need to have a final word from @greg - this is what Chase said:
˜/chasemp 19:50> marostegui: so I think it's up to releng and here's why. When phab was deployed in prod, folks wanted to move history from a phab deploy in cloud, and that's what that DB is. But I will say it can almost certainly be dropped unless releng actively has a plan for it; all the code I see referencing that DB is dead.
˜/chasemp 19:51> and the first 1300 or so tasks I think are in phab were already created from that
@greg do you need that database or would you be ok with just a mysqldump stored somewhere in case you need to restore it?
Comment Actions
@elukey when you have a chance, can you move those to a different and permanent place so I can clean them up from cumin1001?
Comment Actions
@Marostegui I now have everything gzipped in my home directory; I'll move it to stat1007 and HDFS asap (rough sketch below):
elukey@cumin1001:~/dbstore1002_backup$ ls oai.sql.gz test.sql.gz warehouse.sql.gz
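A sketch of the remaining transfer, assuming rsync connectivity between the hosts; the HDFS path matches the final listing later in this task, but the commands themselves are assumptions:

# From cumin1001 to stat1007:
rsync -av ~/dbstore1002_backup/ stat1007.eqiad.wmnet:dbstore1002_backup/
# On stat1007, into HDFS:
sudo -u hdfs hdfs dfs -mkdir -p /wmf/data/archive/backup/misc/dbstore1002_backup
sudo -u hdfs hdfs dfs -put dbstore1002_backup/* /wmf/data/archive/backup/misc/dbstore1002_backup/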
Comment Actions
If it's just the db dump of the labs phabricator that was imported into our production phab (which is what I understand it to be) then yeah, dump it for now.
@mmodell do you know if there's any reason to use that db in the future? ^
Comment Actions
Thanks @greg - I have done the mysqldump for it.
@elukey I have left it at: cumin1001:/home/elukey/dbstore1002_backup
I am also going to consider flowdb part of x1, so we can migrate it and later decide whether we want to keep it or not (it is a matter of setting up the replication filter and dropping it).
So the only item pending to close this task is the decision about project_illustration from @MarkTraceur.
Comment Actions
Let's back it up and not copy it over for the moment, if @MarkTraceur needs it we'll add it later on :)
Comment Actions
@elukey once you've transferred all the files to their final location, this task is ready to be resolved.
I have left a backup at cumin1001:/home/elukey/dbstore1002_backup
It looks like it is really not used:
-rw-rw---- 1 mysql mysql 204M Jan 16 09:43 dewiki_delta.ibd
-rw-rw---- 1 mysql mysql 5.1G Jan 16 09:49 enwiki_delta.ibd
-rw-rw---- 1 mysql mysql  96K Jan 16 09:49 eswiki_delta.ibd
-rw-rw---- 1 mysql mysql 640M Jan 16 09:51 frwiki_delta.ibd
-rw-rw---- 1 mysql mysql  96K Jan 16 09:51 jawiki_delta.ibd
Comment Actions
Done!
elukey@stat1007:/srv/home/elukey$ sudo -u hdfs hdfs dfs -ls /wmf/data/archive/backup/misc/dbstore1002_backup
Found 6 items
-rw-r----- 3 hdfs hadoop        127 2019-01-21 14:24 /wmf/data/archive/backup/misc/dbstore1002_backup/README
-rw-r----- 3 hdfs hadoop    2958889 2019-01-21 14:24 /wmf/data/archive/backup/misc/dbstore1002_backup/fab_migration.sql
-rw-r----- 3 hdfs hadoop   56409884 2019-01-21 14:24 /wmf/data/archive/backup/misc/dbstore1002_backup/oai.sql.gz
-rw-r----- 3 hdfs hadoop  648712821 2019-01-21 14:24 /wmf/data/archive/backup/misc/dbstore1002_backup/project_illustration.sql.gz
-rw-r----- 3 hdfs hadoop    5365081 2019-01-21 14:24 /wmf/data/archive/backup/misc/dbstore1002_backup/test.sql.gz
-rw-r----- 3 hdfs hadoop 2888886122 2019-01-21 14:25 /wmf/data/archive/backup/misc/dbstore1002_backup/warehouse.sql.gz
Note: analytics will close the task after review in kanban/standup :)
Comment Actions
FWIW (I know I'm a little late on this) I think that illustration project was something we either never got off the ground, or haven't looked at in some time.
Comment Actions
@elukey Not sure if this is part of this task or a separate one, but I see many projects being abandoned and forgotten after some time. I wonder if something should be set up so that, if there are no reads or writes after X amount of time, we are notified, or a dump + drop is done automatically.
I can help with the technical parts of that (we have done similar reports in the past, and usage statistics are enabled as part of the normal monitoring/access control), but you would be the best person to "agree" with users on a solution that balances "not dropping important stuff" with "lots of TB wasted on a dead project that could be used by someone else". With more instances, this may become a more frequent problem. It doesn't necessarily have to be something automated; it could be "soft" quotas or just a usage policy (e.g. don't use more than X TB unless you ping us in advance). While 99% of the users are very responsible, sometimes accidents happen.
Comment Actions
Thanks for the suggestion! We basically backed up all the databases that were not staging (and not obviously garbage) to HDFS, and as far as I know there shouldn't be anything else on the new dbstore nodes (namely, people will only be able to create things in staging). I would be interested, though, in having something automated that would ping us when tables on that db become stale; in T212493 we pinged a lot of people and a ton of data was freed. A possible starting point is sketched below.
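A hypothetical staleness check, assuming MariaDB's userstat plugin provides the usage statistics mentioned above (its counters reset on server restart or FLUSH TABLE_STATISTICS); thresholds and notification wiring are left out:

# Tables in staging with no recorded reads or writes since the counters were reset.
mysql -BN -e "SELECT t.table_schema, t.table_name
              FROM information_schema.tables t
              LEFT JOIN information_schema.table_statistics s
                ON s.table_schema = t.table_schema AND s.table_name = t.table_name
              WHERE t.table_schema = 'staging'
                AND COALESCE(s.rows_read, 0) = 0 AND COALESCE(s.rows_changed, 0) = 0;"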
Comment Actions
Mentioned in SAL (#wikimedia-operations) [2019-02-05T14:05:07Z] <marostegui> Delete non used grants from dbstore1002: log, warehouse, project_illustration, cognate_wiktionary, datasets - T212487 T210478