
dev-ops - create server and mysql server
Closed, Resolved · Public · 3 Estimated Story Points

Description

  • Set up a MySQL 8.0 DB, configured with Envlh's guidance
  • On a separate or new server on wmf-cloud
  • Add disk space for MySQL: https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances
  • Flask on Gunicorn (see the sketch after this list)
  • Publicly accessible HTTP
  • See if dumps can be made accessible (Cloud VPS users can request to have the share available; see Wikitech Help:Shared storage#/public/dumps)
  • Python with pip in /srv land (ended up with virtualenv)
  • Swap space on staging
  • Swap space on production
  • Backend on production (Gunicorn, etc.)
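
A minimal sketch of the "Flask on Gunicorn" and swap-space items above, assuming a virtualenv under /srv and an app module named humaniki.app:app (paths, port, swap size, and module name are illustrative, not the actual deployment):

# virtualenv in /srv, Flask served by Gunicorn
python3 -m venv /srv/humaniki/venv
/srv/humaniki/venv/bin/pip install flask gunicorn
# bind on all interfaces so the HTTP endpoint is publicly reachable
/srv/humaniki/venv/bin/gunicorn --bind 0.0.0.0:8000 humaniki.app:app

# add swap; the same steps apply on staging and production
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile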

Event Timeline

notconfusing renamed this task from dev-ops to dev-ops - create server and mysql server. Sep 23 2020, 10:31 PM
notconfusing created this task.
notconfusing set the point value for this task to 3.

@Envlh, do you use a dockerized image of MySQL 8, install it from a package, or use a general CLI installer?

On Denelezh, I installed MySQL using Oracle's apt repository: https://dev.mysql.com/downloads/repo/apt/

Here is an excerpt of my notes:

# add Oracle's MySQL apt repository configuration
wget https://dev.mysql.com/get/mysql-apt-config_0.8.15-1_all.deb
dpkg -i mysql-apt-config_0.8.15-1_all.deb
# refresh package lists, upgrade, and install MySQL Server
apt update
apt upgrade
apt install mysql-server

The installation may conflict with MariaDB if MariaDB is already installed.

This is just one way to install MySQL; using a Docker image is probably perfectly fine too.
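
For example, a hypothetical Docker equivalent (the container name, password, and data path are all illustrative; mysql:8.0 is the official image on Docker Hub):

docker run -d --name humaniki-mysql \
  -e MYSQL_ROOT_PASSWORD=change-me \
  -v /srv/mysql-data:/var/lib/mysql \
  -p 3306:3306 \
  mysql:8.0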

Either way, you will have to customize the MySQL configuration. The most important setting is the innodb_buffer_pool_size variable, especially once you work with a complete dump. In short, it is the amount of RAM that MySQL uses to keep index and data pages in memory. Its usual value is 70-80% of the available RAM on a dedicated MySQL server; it's 10G on Denelezh. You can find Denelezh's MySQL configuration in the file /etc/mysql/my.cnf.
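
A minimal sketch of setting this, assuming the configuration lives in /etc/mysql/my.cnf as on Denelezh (the 10G value matches Denelezh; tune it to roughly 70-80% of your server's RAM):

# append the setting and restart to apply it
cat >> /etc/mysql/my.cnf <<'EOF'
[mysqld]
innodb_buffer_pool_size = 10G
EOF
systemctl restart mysql
# verify the running value (reported in bytes)
mysql -e "SELECT @@innodb_buffer_pool_size;"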

@Envlh, see T267499. In order to get more disk space we need to propose a custom size. Do you remember the GB sizes of the different tables in Denelezh? We're going to need much more, since we'll be storing all our dump history as well. Maybe a better way to estimate is from the CSV sizes per week?

The copy of Denelezh's MySQL data files is 365G; the denelezh schema itself is 324G. I'm working on setting up a local MySQL server to restore it (carefully, as this is the only backup at the moment). One idea is to export only a subset of it, so it will be easier to transfer to Humaniki's server and to restore.
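
One possible sketch of such a subset export, assuming mysqldump access on the restored server (the table list is illustrative, taken from the schema listing below):

# --single-transaction takes a consistent snapshot without locking InnoDB tables
mysqldump --single-transaction denelezh \
  human human_country human_occupation human_sitelink \
  | gzip > denelezh-subset.sql.gz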

Here are the details for each file in the denelezh schema:

12K     country.frm
96K     country.ibd
1.0K    db.opt
12K     dump.frm
96K     dump.ibd
12K     human.frm
640M    human.ibd
12K     human_country.frm
272M    human_country.ibd
12K     human_occupation.frm
2.7G    human_occupation.ibd
12K     human_sitelink.frm
872M    human_sitelink.ibd
96K     kpi#P#dump20010101.ibd
1.9G    kpi#P#dump20170102.ibd
1.9G    kpi#P#dump20170206.ibd
1.9G    kpi#P#dump20170306.ibd
2.0G    kpi#P#dump20170403.ibd
2.0G    kpi#P#dump20170501.ibd
2.1G    kpi#P#dump20170605.ibd
2.2G    kpi#P#dump20170703.ibd
2.3G    kpi#P#dump20170807.ibd
2.3G    kpi#P#dump20170904.ibd
2.4G    kpi#P#dump20171002.ibd
2.4G    kpi#P#dump20171106.ibd
2.2G    kpi#P#dump20171204.ibd
2.4G    kpi#P#dump20180101.ibd
2.4G    kpi#P#dump20180205.ibd
2.4G    kpi#P#dump20180305.ibd
2.5G    kpi#P#dump20180402.ibd
2.6G    kpi#P#dump20180604.ibd
2.7G    kpi#P#dump20180806.ibd
2.8G    kpi#P#dump20180903.ibd
2.8G    kpi#P#dump20181001.ibd
3.0G    kpi#P#dump20181105.ibd
3.0G    kpi#P#dump20181203.ibd
3.0G    kpi#P#dump20190107.ibd
3.0G    kpi#P#dump20190204.ibd
3.1G    kpi#P#dump20190304.ibd
3.2G    kpi#P#dump20190401.ibd
3.2G    kpi#P#dump20190408.ibd
3.3G    kpi#P#dump20190415.ibd
3.3G    kpi#P#dump20190422.ibd
3.2G    kpi#P#dump20190429.ibd
3.3G    kpi#P#dump20190506.ibd
3.3G    kpi#P#dump20190513.ibd
3.3G    kpi#P#dump20190520.ibd
3.3G    kpi#P#dump20190527.ibd
3.3G    kpi#P#dump20190603.ibd
3.3G    kpi#P#dump20190610.ibd
3.4G    kpi#P#dump20190617.ibd
3.5G    kpi#P#dump20190704.ibd
3.8G    kpi#P#dump20190708.ibd
3.5G    kpi#P#dump20190717.ibd
3.5G    kpi#P#dump20190722.ibd
3.5G    kpi#P#dump20190729.ibd
3.5G    kpi#P#dump20190805.ibd
3.5G    kpi#P#dump20190812.ibd
3.6G    kpi#P#dump20190819.ibd
3.6G    kpi#P#dump20190826.ibd
3.6G    kpi#P#dump20190902.ibd
3.5G    kpi#P#dump20190909.ibd
3.6G    kpi#P#dump20190916.ibd
3.6G    kpi#P#dump20190923.ibd
3.6G    kpi#P#dump20190930.ibd
3.6G    kpi#P#dump20191111.ibd
3.7G    kpi#P#dump20191118.ibd
3.7G    kpi#P#dump20191125.ibd
3.7G    kpi#P#dump20191202.ibd
3.7G    kpi#P#dump20191209.ibd
3.7G    kpi#P#dump20191216.ibd
3.7G    kpi#P#dump20200106.ibd
3.7G    kpi#P#dump20200113.ibd
2.2G    kpi#P#dump20200127.ibd
3.8G    kpi#P#dump20200203.ibd
3.9G    kpi#P#dump20200210.ibd
3.9G    kpi#P#dump20200217.ibd
3.9G    kpi#P#dump20200224.ibd
3.9G    kpi#P#dump20200302.ibd
3.9G    kpi#P#dump20200309.ibd
4.0G    kpi#P#dump20200330.ibd
4.0G    kpi#P#dump20200406.ibd
4.0G    kpi#P#dump20200413.ibd
4.0G    kpi#P#dump20200420.ibd
4.0G    kpi#P#dump20200427.ibd
4.0G    kpi#P#dump20200504.ibd
4.2G    kpi#P#dump20200511.ibd
4.1G    kpi#P#dump20200518.ibd
4.1G    kpi#P#dump20200525.ibd
4.2G    kpi#P#dump20200601.ibd
4.2G    kpi#P#dump20200608.ibd
4.2G    kpi#P#dump20200615.ibd
4.2G    kpi#P#dump20200622.ibd
4.2G    kpi#P#dump20200629.ibd
4.3G    kpi#P#dump20200706.ibd
4.3G    kpi#P#dump20200713.ibd
4.3G    kpi#P#dump20200720.ibd
4.3G    kpi#P#dump20200727.ibd
4.4G    kpi#P#dump20200803.ibd
4.4G    kpi#P#dump20200810.ibd
4.4G    kpi#P#dump20200817.ibd
4.4G    kpi#P#dump20200824.ibd
4.4G    kpi#P#dump20200831.ibd
4.4G    kpi#P#dump20200914.ibd
4.5G    kpi#P#dump20200921.ibd
20K     kpi.frm
12K     label.frm
13G     label.ibd
12K     occupation.frm
128K    occupation.ibd
12K     occupation_parent.frm
11M     occupation_parent.ibd
12K     project.frm
176K    project.ibd
12K     year.frm
224K    year.ibd

For reference, here is the WHGI snapshot data:

maximilianklein@whgi:~/snapshot_data$ du -sh .
75G     .
maximilianklein@whgi:~/snapshot_data$ du -sh *
148M    2014-09-17
150M    2014-10-13
167M    2015-07-28
171M    2015-08-03
197M    2015-08-09
197M    2015-08-12
197M    2015-08-16
198M    2015-08-21
197M    2015-08-23
198M    2015-09-06
...
541M    2020-07-28
544M    2020-08-04
549M    2020-08-18
549M    2020-08-25
553M    2020-09-08
553M    2020-09-14
561M    2020-09-22
565M    2020-09-29
574M    2020-10-20
575M    2020-10-27

@Envlh I have made some calculations, in the spreadsheet linked below, about how much space we may need in the future. If you could check the assumptions, that would be great.

https://docs.google.com/spreadsheets/d/1jSNl1Ju_94TBBXCi6TojypdjXPdRuwLfBLCITTVS3P8/edit?usp=sharing

The final numbers are:

scenario | future years | raw table history kept | TB
Maximum  | 5 years      | full (5 years)         | 3.2
Minimum  | 3 years      | 1 year window          | 1.45
notconfusing lowered the priority of this task from High to Medium. Dec 13 2020, 6:10 PM
notconfusing raised the priority of this task from Medium to High.