2 analytics MySQL databases need to either be removed or replaced. Analytics should plan for this, and decide what should be done.
- db1046 - m2-master
- db1047 - analytics-slave
2 analytics MySQL databases need to either be removed or replaced. Analytics should plan for this, and decide what should be done.
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | None | T134476 Decommission old coredb machines (<=db1050) | |||
Resolved | elukey | T156844 Decommission old dbstore hosts (db1046, db1047) | |||
Unknown Object (Task) | |||||
Resolved | elukey | T177405 rack and setup db1107 and db1108 |
Update on this. @luca is working on T156933, and in talking, we realized that if we get rid of the second slave (db1047), we will only have one copy of EL data older than 90 days that we keep on the slave. I think I had thought that smart purging was also done on the master, so the retained data would remain there, but this is not true.
For redundancy purposes, we should keep two slaves. We should refresh both db1046 and db1047. Sorry for the waffling on this.
If redundancy is the main reason, and not load balancing, I would suggest having the redundant server on codfw. But there is now no analytics server on codfw, so up to you.
Updating this task in light of the recent discussions. The analytics and DBA teams have been fighting a lot with disk space consumption on dbstore1002 due to too many databases replicated in there (log + wikis). We would like to proceed as following:
Mentioned in SAL (#wikimedia-operations) [2017-10-26T07:43:37Z] <marostegui> Stop MySQL on db1047 to copy data over db1108 - T177405 T156844
db1047's data has been migrated to db1108.
It is working fine now.
We are going to leave db1108 working for a few days, make sure the event logging sync script works fine and then move the analytics-slave CNAME to db1108. After that we will ping all the people that have data on db1047 to migrate stuff to the new server.
Performance looks really good!
root@db1108[log]> select count(*) from MediaViewer_10867062_15423246; +------------+ | count(*) | +------------+ | 1151169110 | +------------+ 1 row in set (3 min 0.40 sec)
All right we are ready to outline the next steps with some deadlines (tentative):
Just sent a summary of what's happening to engineering@ and analytics@, new deadlines:
- November 13th: the analytics-slave CNAME moves from db1047 to db1108 - November 20th: the log database will be dropped from dbstore1002/analytics-store together with the event-logging replication script - December 4th: shutdown of db1047 (prior backup of non-log database tables)
Does that mean it's not going to be possible to JOIN EventLogging tables with MediaWiki tables in the future? I'm not working with user-facing features these days, but when I did, that would have been pretty painful.
@Tgr
What types of selects were you doing?
We think the best place to do this type of joining is hadoop, we are working into refining EL data into hadoop (https://phabricator.wikimedia.org/T162610) and the mediawiki edit reconstructuction already pulls mediawiki tables and edit data for all wikis into the cluster.
Checking how much different user cohorts (bucketed by editcount) click on a button, for example.
If it can be done in Hadoop, that sounds great (I was doing this two years ago, I don't think Hadoop had the MediaWiki tables then). How would that work privacy-wise? Right now Hadoop is the high-security area, and MySQL is the low-security area (in terms of how many people have access), with much less sensitive data. Would there be a separate webrequest hadoop access and EL-only Hadoop access?
Would there be a separate webrequest hadoop access and EL-only Hadoop access?
It doesn't work 100% as it should, but for Hadoop access control, analytics-privatedata-users will giveyou Hadoop access AND allow you to read things like webrequest, whereas analytics-users is just Hadoop access. I think EventLogging analytics and MW data don't need privatedata level restrictions. So, yes! This should be possible now.
@Tgr : i think your use case would work in hadoop as edit data related information is available since the beginning of time, now, it is true that hadoop at this time does not give you the ability to join with "any" mediawiki table.
Change 391020 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Modify CNAME for analytics-slave from db1047 to db1108
Change 391020 merged by Elukey:
[operations/dns@master] Remove any trace of db1047 from analytics CNAMEs
Change 391062 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/golden@master] db1047 => db1108
Change 391063 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/wmf@master] db1047 => db1108
Change 391062 merged by Chelsyx:
[wikimedia/discovery/golden@master] db1047 => db1108
Change 391528 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbproxy: Add proxy 4 and 9 to reimage as stretch for upgrade
Change 391528 merged by Jcrespo:
[operations/puppet@production] dbproxy: Add proxy 4 and 9 to reimage as stretch for upgrade
Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:
['dbproxy1009.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201711151013_jynus_6035.log.
Completed auto-reimage of hosts:
['dbproxy1009.eqiad.wmnet']
and were ALL successful.
Change 391536 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] haproxy: Update configuration template to haproxy 1.7 syntax
Change 391540 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] hieradata::regex: allow notifications for db1107, disable them for db1046
Change 391540 merged by Elukey:
[operations/puppet@production] hieradata::regex: allow notifications for db1107, disable them for db1046
Change 391536 merged by Jcrespo:
[operations/puppet@production] haproxy: Update configuration template to haproxy 1.7 syntax
Change 391542 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] haproxy: Followup to Update configuration template to haproxy 1.7 syntax
Change 391542 merged by Jcrespo:
[operations/puppet@production] haproxy: Followup to Update configuration template to haproxy 1.7 syntax
Change 391543 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] eventlogging: Point m4-master to db1107 with db1108 as backup
Change 391545 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] eventlogging: Repoint m4-master CNAME to dbproxy1009
Change 391543 merged by Jcrespo:
[operations/puppet@production] eventlogging: Point m4-master to db1107 with db1108 as backup
Change 391545 merged by Jcrespo:
[operations/dns@master] eventlogging: Repoint m4-master CNAME to dbproxy1009
I am currently reviewing what tables to drop on db1047 and which ones to copy over to db1108, and this is what I gathered from old entries of this task:
halfak, staeiou, giovanni, declerambaul, nimish, rfaulk, whym, erik, jmorgan
ops, otto_test_sync_to_delete, test
staging
dartar
diederik
shawn
zexley
Re-ping for:
ops are ours, we can handle that- just leave things as you found them.
test is probably a mistake and probably should be deleted unless you find something interesting there.
Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:
['dbproxy1004.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201711171416_jynus_31942.log.
Change 392044 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] haproxy: Migrate dbproxy1004 to stretch (haproxy 1.7) format
Completed auto-reimage of hosts:
['dbproxy1004.eqiad.wmnet']
and were ALL successful.
Change 392044 merged by Jcrespo:
[operations/puppet@production] haproxy: Migrate dbproxy1004 to stretch (haproxy 1.7) format
Re @elukey
Seems to me that the DROP DATABASE list is correct.
To the list of databases to review I would add: akhanna (https://wikimediafoundation.org/wiki/User:Akhanna)
Also, I think zexley corresponds to: https://fr.wikipedia.org/wiki/Utilisateur:ZExley_(WMF)
Could not find who shawn corresponds to...
I just spoke to @JAllemandou, we can help review these legacy tables early next week.
Mentioned in SAL (#wikimedia-operations) [2017-11-20T08:46:03Z] <marostegui> Run mydumper for db1047.staging - T156844
The data behind Page Creation Dashboard is configured to read data from the log database on dbstore1002. Can I at this point submit a patch to the ReportUpdater configuration that updates it to use db1108.eqiad.wmnet, as that now has the updated log database?
Yes correct, or you can use the domain analytics-slave.eqiad.wmnet (that is a CNAME to db1108 :)
Nevermind, turns out @mforns has already updated that configuration, should've checked that first. Thanks again for taking care of it!
@Nettrom Right, is not only that dashboard but all the ones that are feed data via reportupdater that needed the new configuration
This is a list of tables for all the above databases, in case somebody wants to quickly glance over them:
========= dartar Table Size in MB clean_patrol_log 15.52 pages_reviewed 2.52 temp 0.00 ========= diederik Table Size in MB archive 4609.74 logging 4532.85 sandbox 54.55 gender 7.86 ========= shawn Table Size in MB categorylinks 2624.42 user_meta 212.50 hlp_user_meta 111.10 yr_hlp_user_meta 35.10 wtemplates 21.89 wp_members 4.65 wp_editors 2.81 wp_members_categorylink 2.74 wp_members_section 2.31 wp_members_memberpage 1.50 categorylinks_wp 0.19 wikiproject_editors 0.19 wikiprojects_member_pages 0.11 wp_page_count 0.09 WikiProject_Pages 0.08 w_users 0.00 ========= zexley Table Size in MB suspectboteditsamp 3099.15 suspectbotrevs 1823.20 3ers 1196.98 user_meta_qs 443.50 user_qs 316.53 user_meta_qs_archive 187.50 ncc 152.18 no_deleted_edits 130.02 crazy_article_edits 3.80 3ersMeta12 2.40 lass_rev 1.40 lass_catlinks 0.59 boteditsmonthlybybot 0.37 fwjru 0.35 lass_page 0.23 lass_category 0.11 lass_archive 0.09 crazy_editors 0.07 bots 0.05 ncu 0.03 nc 0.01 try 0.01 boteditsmonthly 0.00 test2 0.00 test 0.00 huggle_monthly 0.00
diederik and zexley seem to be previous employee of the WMF, we didn't find info about shawn and dartar should be @DarTar.
If anybody knows anything about the above tables please speak up. I am going to re-check with the research team.
Change 393238 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Drop the Eventlogging support for dbstore1002
The shawn table belonged to Shawn Walker, a research intern in 2011. These tables can be safely deleted.
Mentioned in SAL (#wikimedia-operations) [2017-11-27T13:22:33Z] <elukey> remove eventlogging replication support (log database) from dbstore1002 - T156844
Change 393238 merged by Elukey:
[operations/puppet@production] Drop the Eventlogging support for dbstore1002
Change 393597 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Delete role::mariadb::analytics
Mentioned in SAL (#wikimedia-operations) [2017-11-28T09:04:36Z] <marostegui> Drop database log from dbstore1002 - T156844
root@dbstore1002:~# mysql --skip-ssl Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 56286751 Server version: 10.0.22-MariaDB MariaDB Server Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. root@DBSTORE[(none)]> set session sql_log_bin=0; Query OK, 0 rows affected (0.00 sec) root@DBSTORE[(none)]> drop database if exists log; Query OK, 379 rows affected (9.01 sec)
Change 393597 merged by Elukey:
[operations/puppet@production] Delete role::mariadb::analytics
Thanks!
I dumped anyway the dbs dartar, diederik, shawn and zexley on my home directory on stat1006:
elukey@stat1006:/srv/home/elukey$ du -hs * 15M db1047_dartar_backup.sql 11G db1047_diederik_backup.sql 920M db1047_shawn_backup.sql 7.3G db1047_zexley_backup.sql
This should be enough to unblock the decom process for db104[67].
Change 394318 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] m4.hosts: Remove db1046 and db1047
Change 394318 merged by jenkins-bot:
[operations/software@master] m4.hosts: Remove db1046 and db1047
Change 394326 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-eventlogging: db1108 will contain only log database
Change 394326 merged by Jcrespo:
[operations/puppet@production] mariadb-eventlogging: db1108 will contain only log database
Change 394332 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-eventlogging: Move prometheus monitoring to /run socket
Change 394332 merged by Jcrespo:
[operations/puppet@production] mariadb-eventlogging: Move prometheus monitoring to /run socket
Change 394531 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set role::system::spare to db104[67]
Mentioned in SAL (#wikimedia-operations) [2017-12-01T08:41:23Z] <marostegui> Remove db1046 and db1047 from tendril - T156844
Change 394531 merged by Elukey:
[operations/puppet@production] Set role::system::spare to db104[67]
As per our chat, closing this.
Thanks for all the hard work you've put to make this happen!