- - setup a mysql 8.0 db, configure with envel
- - on a seperate or new server on wmf-cloud
- - add disk space for mysql https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances
- - flask on gunicorn
- - publicly accessible http
- - see if dumps can be made accesible, (Cloud VPS users can request to have the share available, see Wikitech Help:Shared storage#/public/dumps)
- - python with pip in /srv land (ended up with virtualenv)
- - swap space on staging
- - swap space on production
- - backend on production (gunicorn, etc.)
Description
Related Objects
Event Timeline
@Envlh do you use a dockerize image of mysql 8, or install it from package, or use a general cli-installer?
On Denelezh, I installed MySQL using Oracle's apt repository: https://dev.mysql.com/downloads/repo/apt/
Here is an excerpt of my notes:
wget https://dev.mysql.com/get/mysql-apt-config_0.8.15-1_all.deb dpkg -i mysql-apt-config_0.8.15-1_all.deb apt update apt upgrade apt install mysql-server
It may conflict with MariaDB if it is already installed.
This is just one way to install MySQL, it is probably perfectly fine to use a Docker image.
Either way, you will have to customize MySQL configuration. The most important is innodb_buffer_pool_size variable for when you will work with a complete dump. To sum up, it's the RAM that MySQL will use to work with index and data in memory. Its usual value is 70-80% of the available RAM on a MySQL dedicated server. It's 10G on Denelezh. You can find MySQL configuration on Denelezh in the file /etc/mysql/my.cnf.
The copy of Denelezh's MySQL data files is 365G. The denelezh schema is 324G. I'm working on setting-up a local MySQL server to restore it (and carefully, as this is the only backup at the moment). One idea is to export only a subset of it, so it will be easier to transfer it to Humaniki's server and to restore it.
Here are the details for each file on denelezh schema:
12K country.frm 96K country.ibd 1.0K db.opt 12K dump.frm 96K dump.ibd 12K human.frm 640M human.ibd 12K human_country.frm 272M human_country.ibd 12K human_occupation.frm 2.7G human_occupation.ibd 12K human_sitelink.frm 872M human_sitelink.ibd 96K kpi#P#dump20010101.ibd 1.9G kpi#P#dump20170102.ibd 1.9G kpi#P#dump20170206.ibd 1.9G kpi#P#dump20170306.ibd 2.0G kpi#P#dump20170403.ibd 2.0G kpi#P#dump20170501.ibd 2.1G kpi#P#dump20170605.ibd 2.2G kpi#P#dump20170703.ibd 2.3G kpi#P#dump20170807.ibd 2.3G kpi#P#dump20170904.ibd 2.4G kpi#P#dump20171002.ibd 2.4G kpi#P#dump20171106.ibd 2.2G kpi#P#dump20171204.ibd 2.4G kpi#P#dump20180101.ibd 2.4G kpi#P#dump20180205.ibd 2.4G kpi#P#dump20180305.ibd 2.5G kpi#P#dump20180402.ibd 2.6G kpi#P#dump20180604.ibd 2.7G kpi#P#dump20180806.ibd 2.8G kpi#P#dump20180903.ibd 2.8G kpi#P#dump20181001.ibd 3.0G kpi#P#dump20181105.ibd 3.0G kpi#P#dump20181203.ibd 3.0G kpi#P#dump20190107.ibd 3.0G kpi#P#dump20190204.ibd 3.1G kpi#P#dump20190304.ibd 3.2G kpi#P#dump20190401.ibd 3.2G kpi#P#dump20190408.ibd 3.3G kpi#P#dump20190415.ibd 3.3G kpi#P#dump20190422.ibd 3.2G kpi#P#dump20190429.ibd 3.3G kpi#P#dump20190506.ibd 3.3G kpi#P#dump20190513.ibd 3.3G kpi#P#dump20190520.ibd 3.3G kpi#P#dump20190527.ibd 3.3G kpi#P#dump20190603.ibd 3.3G kpi#P#dump20190610.ibd 3.4G kpi#P#dump20190617.ibd 3.5G kpi#P#dump20190704.ibd 3.8G kpi#P#dump20190708.ibd 3.5G kpi#P#dump20190717.ibd 3.5G kpi#P#dump20190722.ibd 3.5G kpi#P#dump20190729.ibd 3.5G kpi#P#dump20190805.ibd 3.5G kpi#P#dump20190812.ibd 3.6G kpi#P#dump20190819.ibd 3.6G kpi#P#dump20190826.ibd 3.6G kpi#P#dump20190902.ibd 3.5G kpi#P#dump20190909.ibd 3.6G kpi#P#dump20190916.ibd 3.6G kpi#P#dump20190923.ibd 3.6G kpi#P#dump20190930.ibd 3.6G kpi#P#dump20191111.ibd 3.7G kpi#P#dump20191118.ibd 3.7G kpi#P#dump20191125.ibd 3.7G kpi#P#dump20191202.ibd 3.7G kpi#P#dump20191209.ibd 3.7G kpi#P#dump20191216.ibd 3.7G kpi#P#dump20200106.ibd 3.7G kpi#P#dump20200113.ibd 2.2G kpi#P#dump20200127.ibd 3.8G kpi#P#dump20200203.ibd 3.9G kpi#P#dump20200210.ibd 3.9G kpi#P#dump20200217.ibd 3.9G kpi#P#dump20200224.ibd 3.9G kpi#P#dump20200302.ibd 3.9G kpi#P#dump20200309.ibd 4.0G kpi#P#dump20200330.ibd 4.0G kpi#P#dump20200406.ibd 4.0G kpi#P#dump20200413.ibd 4.0G kpi#P#dump20200420.ibd 4.0G kpi#P#dump20200427.ibd 4.0G kpi#P#dump20200504.ibd 4.2G kpi#P#dump20200511.ibd 4.1G kpi#P#dump20200518.ibd 4.1G kpi#P#dump20200525.ibd 4.2G kpi#P#dump20200601.ibd 4.2G kpi#P#dump20200608.ibd 4.2G kpi#P#dump20200615.ibd 4.2G kpi#P#dump20200622.ibd 4.2G kpi#P#dump20200629.ibd 4.3G kpi#P#dump20200706.ibd 4.3G kpi#P#dump20200713.ibd 4.3G kpi#P#dump20200720.ibd 4.3G kpi#P#dump20200727.ibd 4.4G kpi#P#dump20200803.ibd 4.4G kpi#P#dump20200810.ibd 4.4G kpi#P#dump20200817.ibd 4.4G kpi#P#dump20200824.ibd 4.4G kpi#P#dump20200831.ibd 4.4G kpi#P#dump20200914.ibd 4.5G kpi#P#dump20200921.ibd 20K kpi.frm 12K label.frm 13G label.ibd 12K occupation.frm 128K occupation.ibd 12K occupation_parent.frm 11M occupation_parent.ibd 12K project.frm 176K project.ibd 12K year.frm 224K year.ibd
For reference, the WHGI snapshot data.
maximilianklein@whgi:~/snapshot_data$ du -sh . 75G .
maximilianklein@whgi:~/snapshot_data$ du -sh * 148M 2014-09-17 150M 2014-10-13 167M 2015-07-28 171M 2015-08-03 197M 2015-08-09 197M 2015-08-12 197M 2015-08-16 198M 2015-08-21 197M 2015-08-23 198M 2015-09-06 ... 541M 2020-07-28 544M 2020-08-04 549M 2020-08-18 549M 2020-08-25 553M 2020-09-08 553M 2020-09-14 561M 2020-09-22 565M 2020-09-29 574M 2020-10-20 575M 2020-10-27
@Envlh I have made some calculations about how much space we may need in the future here. If you could please check the assumptions, that'd be good.
https://docs.google.com/spreadsheets/d/1jSNl1Ju_94TBBXCi6TojypdjXPdRuwLfBLCITTVS3P8/edit?usp=sharing
The final numbers are:
scenario | future years | raw table history kept | TB |
Maximum | 5 years | full (5 years) | 3.2 |
Minimum | 3 years | 1 year window | 1.45 |