
Review dbstore1002's non-wiki databases and decide which ones need to be migrated to the new multi-instance setup
Closed, Resolved · Public · 8 Estimated Story Points

Description

On the dbstore1002 host, the following databases exist alongside the wiki databases (with sizes):

| Schema | Needed | Size | Comment | Backups needed |
|--------|--------|------|---------|----------------|
| centralauth | Yes | 88G | This is part of s7, so it will be done once s7 is populated | N/A |
| datasets | No | 72K | Not needed T212487#4860691 | No |
| fab_migration | No | 13M | Not needed | Yes: cumin1001:/home/elukey/dbstore1002_backup |
| flowdb | Yes | 1.5G | Part of x1, can be decided on later | N/A |
| oai | No | 1.1G | Not needed - just mysqldump it just in case T212487#4839932 | Yes: cumin1001:/home/elukey/dbstore1002_backup |
| ops | No | 2.1G | Not needed | No |
| project_illustration | No | 2.9G | Most likely not needed, pending confirmation from @MarkTraceur T212487#4863909 | Yes: cumin1001:/home/elukey/dbstore1002_backup |
| staging | Yes | 280G | Database used by the Analytics team for temporary tables | N/A |
| steward | No | 120K | Not needed (just one table and it is empty) | No |
| test | No | 280K | Not needed - just mysqldump it just in case T212487#4839932 | Yes: cumin1001:/home/elukey/dbstore1002_backup |
| warehouse | No | 72M | Not needed - just mysqldump it just in case T212487#4839932 | Yes: cumin1001:/home/elukey/dbstore1002_backup |

We should investigate which schemas are used and need to be migrated, and whether there are schemas that could be "left behind" (after archiving them, of course).
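For reference, per-schema sizes like the ones in the table can be pulled from information_schema; a minimal sketch (the exact method used for the table above is not recorded in this task):

```
# rough on-disk size per schema; data_length/index_length are approximations
mysql -e "SELECT table_schema,
                 ROUND(SUM(data_length + index_length) / POW(1024, 3), 1) AS size_gib
          FROM information_schema.tables
          GROUP BY table_schema
          ORDER BY size_gib DESC;"
```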

Event Timeline

elukey triaged this task as High priority. Dec 21 2018, 7:13 AM
elukey created this task.

Crossposting my comment from: T210478#4794533

ops doesn't need to be migrated.
I believe datasets isn't used anymore, but this needs double-checking.
centralauth is part of s7, so I assume it needs to be migrated.
oai: I think it is not used, but this needs double-checking.
steward: I think it is not used, but this needs double-checking.
test looks like it was... a test. It has not been written to since roughly Jan 2017.

elukey renamed this task from "Review dbstore1002's tables and decide which ones need to be migrated to the new multi-instance setup" to "Review dbstore1002's non-wiki databases and decide which ones need to be migrated to the new multi-instance setup". Dec 21 2018, 7:51 AM

datasets indeed seems unused; I propose to just mysqldump it and save it in some safe place (Hadoop's HDFS seems a good fit)

fab_migration seems related to https://phabricator.wikimedia.org/rPHTO034e8bd616747a1fd52daf3a0038291c84429d4f, almost surely not used. Pinging @chasemp to verify :)

flowdb replication from X1 was requested by @EBernhardson in T75047, pinging him to see if we can get some more info :)

oai seems related to an old mediawiki extension, I'd simply back it up on HDFS.

project_illustration was requested in T124705; probably not used, but triple-checking with @leila whether Research needs it (otherwise we can simply back it up on HDFS).

steward and warehouse are probably not used but I'd back them up on HDFS anyway.

test should be neither backed up nor copied over, in my opinion.

> flowdb replication from X1 was requested by @EBernhardson in T75047, pinging him to see if we can get some more info :)

Currently we do replicate x1 to dbstore1002, which flowdb is part of. If for some reason we do not want to replicate flowdb anymore (but we want to keep replicating x1), we'd need a replication filter to ignore that particular DB.
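A minimal sketch of such a filter, assuming a MariaDB replica (the wild-ignore form is generally safer with cross-database statements; on versions where the variable is not dynamic it has to go into my.cnf followed by a restart):

```
# keep replicating x1 but skip everything in flowdb.
# persistent form, in the replica's my.cnf:
#   replicate_wild_ignore_table = flowdb.%
# dynamic form (replication must be stopped while setting it):
mysql -e "STOP SLAVE; SET GLOBAL replicate_wild_ignore_table = 'flowdb.%'; START SLAVE;"
```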

The staging database is still needed and it is sufficient to copy it to a single instance.

> The staging database is still needed and it is sufficient to copy it to a single instance.

For what it's worth, I just checked and most of the tables are MyISAM/TokuDB. I _would love_ to have those migrated to InnoDB (for the tables that are still needed)

>> The staging database is still needed and it is sufficient to copy it to a single instance.
>
> For what it's worth, I just checked and most of the tables are MyISAM/TokuDB. I _would love_ to have those migrated to InnoDB (for the tables that are still needed)

+2 from my side, I don't see any reason to keep them with MyISAM/TokuDB!
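A sketch of how that conversion could be driven, assuming direct mysql access to the instance (schema name and engine list taken from the comments above; big tables will take a while, since ALTER TABLE ... ENGINE=InnoDB rebuilds the table):

```
# generate and run one ALTER per staging table still on MyISAM/TokuDB
mysql -BN -e "SELECT CONCAT('ALTER TABLE \`', table_schema, '\`.\`', table_name, '\` ENGINE=InnoDB;')
              FROM information_schema.tables
              WHERE table_schema = 'staging' AND engine IN ('MyISAM', 'TokuDB');" |
while read -r stmt; do
    echo "$stmt"        # log progress
    mysql -e "$stmt"
done
```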

> project_illustration was requested in T124705; probably not used, but triple-checking with @leila whether Research needs it (otherwise we can simply back it up on HDFS).

@leila ping :)

Ok, as far as I can see this task is waiting on a few confirmations, but if the plan in T212487#4839932 looks good then we can proceed!

> datasets indeed seems unused; I propose to just mysqldump it and save it in some safe place (Hadoop's HDFS seems a good fit)

Yup, one of the tables there (content_namespaces) was mine—I just copied it to staging, which means I don't need the one in datasets any more.

>> project_illustration was requested in T124705; probably not used, but triple-checking with @leila whether Research needs it (otherwise we can simply back it up on HDFS).
>
> @leila ping :)

Sorry for the delay. Just got back from vacation and catching up. :)

I checked with bmansurov and our understanding is that Research doesn't need this. It seems it was requested by Mark Traceur per https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/266312/ and you may want to check with them.

Marostegui added a subscriber: MarkTraceur.

@elukey I have updated the original task to add the latest status of the databases we hold there.
To sum up, we still need confirmation for:

fab_migration - @chasemp
flowdb - @EBernhardson
project_illustration - final word from @MarkTraceur

@elukey For those databases that we have decided, so far, to back up and archive (oai, test, warehouse), I have done a mysqldump so you can grab and store it wherever you like.
It is at:

cumin1001:/home/marostegui/dbstore1002_backup

I have also updated the original task to make it clearer
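The exact invocation is not recorded here, but a plausible equivalent of those dumps is:

```
# one dump per schema; --single-transaction gives a consistent snapshot for
# InnoDB tables (MyISAM/TokuDB tables are simply read-locked while dumped)
mkdir -p ~/dbstore1002_backup
for db in oai test warehouse; do
    mysqldump --single-transaction "$db" > ~/dbstore1002_backup/"${db}.sql"
done
```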

I asked @chasemp about fab_migration and I think we need to have a final word from @greg - this is what Chase said:

~/chasemp 19:50> marostegui: so I think it's up to releng and here's why. When phab was deployed in prod folks wanted to move history from a phab deploy in cloud and so that's what that DB is. But I will say almost certainly it can be dropped unless releng actively has a plan for it, all the code I see referencing that DB is dead.
~/chasemp 19:51> and the first 1300 or so tasks I think are in phab are already created from that

@greg do you need that database or would you be ok with just a mysqldump stored somewhere in case you need to restore it?

Assigning this to @elukey so he can follow up on what is pending at T212487#4867076

> @elukey For those databases that we have decided, so far, to back up and archive (oai, test, warehouse), I have done a mysqldump so you can grab and store it wherever you like.
> It is at:
>
> cumin1001:/home/marostegui/dbstore1002_backup
>
> I have also updated the original task to make it clearer

@elukey when you have a chance, can you move those to a different and permanent place so I can clean them up from cumin1001?

@Marostegui I now have everything gzipped in my home directory; I'll move it to stat1007 and HDFS asap:

elukey@cumin1001:~/dbstore1002_backup$ ls
oai.sql.gz  test.sql.gz  warehouse.sql.gz
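A sketch of the transfer path described above (the intermediate paths and the direct rsync hop are assumptions):

```
gzip ~/dbstore1002_backup/*.sql                # produces the .sql.gz files listed above
rsync -av ~/dbstore1002_backup/*.sql.gz stat1007:dbstore1002_backup/
# then, on stat1007, archive into HDFS:
sudo -u hdfs hdfs dfs -mkdir -p /wmf/data/archive/backup/misc/dbstore1002_backup
sudo -u hdfs hdfs dfs -put dbstore1002_backup/*.sql.gz /wmf/data/archive/backup/misc/dbstore1002_backup/
```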

cool, let me know when you think it is safe for me to delete my files

As discussed on IRC... I have deleted my files.

> I asked @chasemp about fab_migration and I think we need to have a final word from @greg - this is what Chase said:
>
> ~/chasemp 19:50> marostegui: so I think it's up to releng and here's why. When phab was deployed in prod folks wanted to move history from a phab deploy in cloud and so that's what that DB is. But I will say almost certainly it can be dropped unless releng actively has a plan for it, all the code I see referencing that DB is dead.
> ~/chasemp 19:51> and the first 1300 or so tasks I think are in phab are already created from that
>
> @greg do you need that database or would you be ok with just a mysqldump stored somewhere in case you need to restore it?

If it's just the db dump of the labs phabricator that was imported into our production phab (which is what I understand it to be) then yeah, dump it for now.

@mmodell do you know if there's any reason to use that db in the future? ^

Thanks @greg - I have done the mysqldump for it.
@elukey I have left it at: cumin1001:/home/elukey/dbstore1002_backup

I am going to also consider flowdb as part of x1, so we can migrate it and later we can decide if we want to keep it or not (it is a matter of adding the replication filter and dropping the database).
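If flowdb does get dropped later while x1 replication keeps running, the sequence would be roughly (a sketch, assuming MariaDB; the filter should also be persisted in my.cnf so it survives restarts):

```
mysql -e "STOP SLAVE;"
mysql -e "SET GLOBAL replicate_wild_ignore_table = 'flowdb.%';"   # skip flowdb from now on
mysql -e "DROP DATABASE flowdb;"                                  # reclaim the ~1.5G
mysql -e "START SLAVE;"
```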

So the only thing pending to close this task is the decision about project_illustration from @MarkTraceur

> So the only thing pending to close this task is the decision about project_illustration from @MarkTraceur

Let's back it up and not copy it over for the moment; if @MarkTraceur needs it, we'll add it later on :)

@elukey once you've transferred all the files to their final location, this task is ready to be resolved.
I have left a backup at cumin1001:/home/elukey/dbstore1002_backup
It looks like it is really not used:

-rw-rw---- 1 mysql mysql 204M Jan 16 09:43 dewiki_delta.ibd
-rw-rw---- 1 mysql mysql 5.1G Jan 16 09:49 enwiki_delta.ibd
-rw-rw---- 1 mysql mysql  96K Jan 16 09:49 eswiki_delta.ibd
-rw-rw---- 1 mysql mysql 640M Jan 16 09:51 frwiki_delta.ibd
-rw-rw---- 1 mysql mysql  96K Jan 16 09:51 jawiki_delta.ibd
Marostegui moved this task from In progress to Done on the DBA board.

Done!

elukey@stat1007:/srv/home/elukey$ sudo -u hdfs hdfs dfs -ls /wmf/data/archive/backup/misc/dbstore1002_backup
Found 6 items
-rw-r-----   3 hdfs hadoop        127 2019-01-21 14:24 /wmf/data/archive/backup/misc/dbstore1002_backup/README
-rw-r-----   3 hdfs hadoop    2958889 2019-01-21 14:24 /wmf/data/archive/backup/misc/dbstore1002_backup/fab_migration.sql
-rw-r-----   3 hdfs hadoop   56409884 2019-01-21 14:24 /wmf/data/archive/backup/misc/dbstore1002_backup/oai.sql.gz
-rw-r-----   3 hdfs hadoop  648712821 2019-01-21 14:24 /wmf/data/archive/backup/misc/dbstore1002_backup/project_illustration.sql.gz
-rw-r-----   3 hdfs hadoop    5365081 2019-01-21 14:24 /wmf/data/archive/backup/misc/dbstore1002_backup/test.sql.gz
-rw-r-----   3 hdfs hadoop 2888886122 2019-01-21 14:25 /wmf/data/archive/backup/misc/dbstore1002_backup/warehouse.sql.gz

Note: analytics will close the task after review in kanban/standup :)

elukey set the point value for this task to 8. Jan 21 2019, 2:28 PM
elukey moved this task from Next Up to Done on the Analytics-Kanban board.

FWIW (I know I'm a little late on this) I think that illustration project was something we either never got off the ground or haven't looked at in some time.

@elukey Not sure if this is part of this task or a separate one, but I see many projects being abandoned and forgotten after some time. I wonder if something should be set up so that, if there are no reads or writes after X amount of time, we are notified, or a dump + drop is done automatically.

I can help with the technical parts of that (we have done similar reports in the past, and usage statistics are enabled as part of the normal monitoring/access control), but you would be the best person to "agree" with users on a solution that balances "not dropping important stuff" against "lots of TB wasted on a dead project that could be used by someone else". With more instances, this may become a more frequent problem. It doesn't necessarily have to be automated; it could be "soft" quotas or just a usage policy (e.g. don't use more than X TB unless you ping us in advance). While 99% of the users are very responsible, sometimes accidents happen.

> FWIW (I know I'm a little late on this) I think that illustration project was something we either never got off the ground or haven't looked at in some time.

Thanks for coming back to us!
We took a backup of it anyways :-)

> @elukey Not sure if this is part of this task or a separate one, but I see many projects being abandoned and forgotten after some time. I wonder if something should be set up so that, if there are no reads or writes after X amount of time, we are notified, or a dump + drop is done automatically.

> I can help with the technical parts of that (we have done similar reports in the past, and usage statistics are enabled as part of the normal monitoring/access control), but you would be the best person to "agree" with users on a solution that balances "not dropping important stuff" against "lots of TB wasted on a dead project that could be used by someone else". With more instances, this may become a more frequent problem. It doesn't necessarily have to be automated; it could be "soft" quotas or just a usage policy (e.g. don't use more than X TB unless you ping us in advance). While 99% of the users are very responsible, sometimes accidents happen.

Thanks for the suggestion! We basically backed up all the databases that were not staging (and not obviously garbage) to HDFS, and as far as I know there shouldn't be anything else on the new dbstore nodes (namely: people will only be able to create things on staging). I would be interested, though, in having something automated that pings us when tables on that db become stale; in T212493 we pinged a lot of people and a ton of data was freed.
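One possible building block for such a check, assuming MariaDB with userstat=1 enabled (the counters reset at server restart, so this is only a heuristic):

```
# staging tables with no recorded reads or writes since the counters started
mysql -e "SELECT t.table_name
          FROM information_schema.tables t
          LEFT JOIN information_schema.table_statistics s
            ON s.table_schema = t.table_schema AND s.table_name = t.table_name
          WHERE t.table_schema = 'staging'
            AND COALESCE(s.rows_read, 0) = 0
            AND COALESCE(s.rows_changed, 0) = 0;"
```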

Mentioned in SAL (#wikimedia-operations) [2019-02-05T14:05:07Z] <marostegui> Delete non used grants from dbstore1002: log, warehouse,project_illustration, cognate\_wiktionary, datasets - T212487 T210478
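For reference, a sketch of how such unused grants can be located before revoking them (the schema list mirrors the SAL entry; mysql.db stores database names as patterns, which is why the underscore in cognate_wiktionary appears escaped above):

```
# list grants that still reference the dropped schemas
mysql -e "SELECT User, Host, Db FROM mysql.db
          WHERE Db IN ('log', 'warehouse', 'project_illustration', 'datasets')
             OR Db LIKE 'cognate%';"
# then, per row, something like:
#   REVOKE ALL PRIVILEGES ON `warehouse`.* FROM 'someuser'@'somehost';
```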