New s5 database servers have been copied from the old s5 master (db1070). This host used to be configured with no innodb_file_per_table config. They have a huge ibadata file and needs defragmenting.
Description
Details
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | Marostegui | T137191 Defragment db1070, db1082, db1087, db1092 | |||
Resolved | • Cmjohnson | T160392 Reset db1070 idrac | |||
Resolved | • Cmjohnson | T158969 Degraded RAID on db1070 | |||
Resolved | Marostegui | T157931 s5: db1070 not using file per table |
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2016-09-21T11:04:07Z] <marostegui> Rebuilding tables in db1082 (non pooled) - T137191
After the testing with db1082, these are the real sizes I got after all the alters:
root@db1082:/srv/sqldata# du -hd 1 253G ./dewiki 66M ./mysql 518G ./wikidatawiki 120K ./ops 16K ./heartbeat 220K ./performance_schema 2.1T .
I wouldn't use this host as a source because of two reasons:
- wikidatawiki and dewiki had duplicate entry errors in two tables (which could be because of the crashes it had - see point #2):
wikidatawiki:
***wb_items_per_site*** ERROR 1062 (23000) at line 1: Duplicate entry '564494155' for key 'PRIMARY' ***tag_summary*** ERROR 1062 (23000) at line 1: Duplicate entry '396024680' for key 'tag_summary_rc_id'
dewiki
***user_properties*** ERROR 1062 (23000) at line 1: Duplicate entry '2498892-echo-subscriptions-web-reverted' for key 'user_properties_user_property'
- This server had hardware issues a few days ago.
So maybe db1070 or db1071 can be better candidates. I will depool one of them and see how the ALTER tables go and if there is any errors.
If not, I will reclone db1082 as it is currently out depooled but has powerful hardware.
I was testing the following ALTER
alter table $i engine=INNODB,FORCE;
The following table complained:
Table: user_properties Create Table: CREATE TABLE `user_properties` ( `up_user` int(11) NOT NULL DEFAULT '0', `up_property` varbinary(255) DEFAULT NULL, `up_value` blob, UNIQUE KEY `user_properties_user_property` (`up_user`,`up_property`), KEY `user_properties_property` (`up_property`) ) ENGINE=InnoDB DEFAULT CHARSET=binary
ERROR 1062 (23000) at line 1: Duplicate entry '2498892-echo-subscriptions-web-reverted' for key 'user_properties_user_property'
This table does not have a PK
Let's check with barracuda compressed and innodb_long_prefix. And the table should have primary key, (up_user, up_property) does not work because one is null; we should report that as a bug.
Also, let's do a full shard check T104459 before doing more operations.
user_properties should be a small-ish table, we could literally export it on csv and perform a diff, as pt-table-checksum will not work here. Also let's go for the logical reimport path, which should be more reliable.
Sounds like a safe idea. I will check the other 2-3 tables that failed too and will report back here.
Also, testing barracuda and innodb_long_prefix sounds good to me.
Maybe we can try https://github.com/giacomolozito/ibdata-shrinker for defragmenting one of this hosts.
Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:
['db1070.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201703140710_marostegui_17843.log.
Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:
['db1070.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201703140750_marostegui_31193.log.
Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:
['db1070.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201703140808_marostegui_5730.log.
After some troubleshooting as the server wasn't getting reimaged I found that db1070 is suffering this:
Error: Unable to establish IPMI v2 / RMCP+ session
And it is indeed listed here: T150160#2951190
So I will need to perform a manual installation
db1070 has been manually reimaged and it is now getting the mysqldump from yesteday imported back:
root@db1070:/srv/sqldata/dewiki# lsb_release -a No LSB modules are available. Distributor ID: Debian Description: Debian GNU/Linux 8.7 (jessie) Release: 8.7 Codename: jessie root@db1070:/srv/sqldata/dewiki# dpkg -l | grep wmf-maria ii wmf-mariadb10 10.0.29-2 amd64 MariaDB plus patches.
root@db1070:/srv/sqldata/dewiki# ls -lh | grep ibd -rw-rw---- 1 mysql mysql 112K Mar 14 09:04 abuse_filter_action.ibd -rw-rw---- 1 mysql mysql 19M Mar 14 09:04 abuse_filter_history.ibd -rw-rw---- 1 mysql mysql 336K Mar 14 09:04 abuse_filter.ibd -rw-rw---- 1 mysql mysql 180M Mar 14 09:05 abuse_filter_log.ibd -rw-rw---- 1 mysql mysql 1.5G Mar 14 09:06 archive.ibd -rw-rw---- 1 mysql mysql 13M Mar 14 09:04 __wmf_checksums.ibd
After having a chat with Jaime, he correctly pointed out that we should try to load in parallel to avoid a week of importing for this host.
So what I have done is:
- split the file into databases:
csplit -s -fdatabase db1070.sql "/^USE /" {*}
- Once that generated a file per database, split those files into multiple files, one per table for the two big databases (dewiki and wikidata)
csplit -s -fdewiki database01 "/-- Table structure for table/" {*}
I am right now importing dewiki after importing all the other old databases (hearbeat,ops, mysql etc)
db1070 keeps importing stuff, in parallel (main and biggest tables first) so far:
root@db1070:/srv/sqldata# du -sh dewiki/ wikidatawiki/ 248G dewiki/ 78G wikidatawiki/
db1070 status update:
dewiki has been 100% over night.
wikidatawiki has only 3 tables pending which have been since yesterday already: revision, logging, wb_terms
root@db1070:/srv/sqldata/wikidatawiki# ls -lh revision.ibd logging.ibd wb_terms.ibd -rw-rw---- 1 mysql mysql 136G Mar 15 09:37 logging.ibd -rw-rw---- 1 mysql mysql 122G Mar 15 09:37 revision.ibd -rw-rw---- 1 mysql mysql 96G Mar 15 09:37 wb_terms.ibd
Next week I will backup db1082, db1087 and db1092, reimage and reclone them from db1070 as it is now file per table (T157931)
Change 343824 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Depool db1092
Change 343824 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Depool db1092
Mentioned in SAL (#wikimedia-operations) [2017-03-21T07:35:04Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1092 - T137191 (duration: 00m 42s)
Mentioned in SAL (#wikimedia-operations) [2017-03-21T07:36:20Z] <marostegui> Stop mysql db1092 for maintenance - T137191
Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:
['db1092.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201703210923_marostegui_13034.log.
Mentioned in SAL (#wikimedia-operations) [2017-03-21T09:42:23Z] <marostegui> Stop MySQL db1070 to clone db1092 from it - T137191
Change 343892 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Repool db1092 with less weight
Change 343892 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Repool db1092 with less weight
Mentioned in SAL (#wikimedia-operations) [2017-03-21T17:09:38Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1092 with low weight - T137191 (duration: 00m 42s)
Change 343923 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Increase weight for db1092
Change 343923 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Increase weight for db1092
Mentioned in SAL (#wikimedia-operations) [2017-03-21T17:24:46Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1092 weight - T137191 (duration: 00m 45s)
Change 343927 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Increase weight db1092
Change 343927 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Increase weight db1092
Mentioned in SAL (#wikimedia-operations) [2017-03-21T17:45:11Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1092 weight - T137191 (duration: 00m 42s)
Change 343938 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Increase db1092 weight
Change 343938 merged by Jcrespo:
[operations/mediawiki-config] db-eqiad.php: Increase db1092 weight
Change 344080 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Restore db1092 weight
Change 344080 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Restore db1092 weight
Mentioned in SAL (#wikimedia-operations) [2017-03-22T06:33:19Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Restore db1092 weight - T137191 (duration: 00m 49s)
Change 344083 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Depool db1087
Change 344083 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Depool db1087
Mentioned in SAL (#wikimedia-operations) [2017-03-22T06:58:29Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1087 - T137191 (duration: 00m 43s)
Mentioned in SAL (#wikimedia-operations) [2017-03-22T07:05:11Z] <marostegui> Stop MySQL db1087 - T137191
Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:
['db1087.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201703220844_marostegui_21624.log.
Mentioned in SAL (#wikimedia-operations) [2017-03-22T08:46:17Z] <marostegui> Stop MySQL db1070 to clone db1087 from it - T137191
Change 344117 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Repooli db1087 with low weight
Change 344117 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Repooli db1087 with low weight
Mentioned in SAL (#wikimedia-operations) [2017-03-22T12:15:45Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1087 with low weight - T137191 (duration: 00m 43s)
Change 344126 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Enable db1087 in API service
Change 344126 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Enable db1087 in API service
Mentioned in SAL (#wikimedia-operations) [2017-03-22T12:48:41Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Enable db1087 for API - T137191 (duration: 00m 42s)
Change 344143 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Increase weight db1087
Change 344143 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Increase weight db1087
Mentioned in SAL (#wikimedia-operations) [2017-03-22T13:54:40Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1087 weight - T137191 (duration: 00m 47s)
Change 344150 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Restore db1087 original weight
Change 344150 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Restore db1087 original weight
Mentioned in SAL (#wikimedia-operations) [2017-03-22T14:35:38Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Restore db1087 original weight - T137191 (duration: 00m 44s)
Change 344306 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1082
Change 344306 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1082
Mentioned in SAL (#wikimedia-operations) [2017-03-23T06:52:39Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1082 - T137191 (duration: 00m 44s)
Mentioned in SAL (#wikimedia-operations) [2017-03-23T07:08:23Z] <marostegui> Stop MySQL db1082 for maintenance - https://phabricator.wikimedia.org/T137191
Mentioned in SAL (#wikimedia-operations) [2017-03-23T08:29:36Z] <marostegui> Stop db1070 MySQL db1070 for maintenance - T137191
Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:
['db1082.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201703230944_marostegui_11785.log.
Change 344337 had a related patch set uploaded (by Marostegui):
[operations/puppet@production] netboot.cfg: Format db1082
Change 344337 merged by Marostegui:
[operations/puppet@production] netboot.cfg: Format db1082
Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:
['db1082.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201703231024_marostegui_18418.log.
Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:
['db1082.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201703231037_marostegui_31446.log.
Change 344385 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1082 with low weight
Change 344385 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1082 with low weight
Mentioned in SAL (#wikimedia-operations) [2017-03-23T14:03:43Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1082 with low weight - T137191 (duration: 00m 48s)
Change 344402 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase db1082 weight
Change 344402 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase db1082 weight
Mentioned in SAL (#wikimedia-operations) [2017-03-23T15:36:59Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1082 weight - T137191 (duration: 00m 43s)
Change 344413 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase weight db1082
Change 344413 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase weight db1082
Mentioned in SAL (#wikimedia-operations) [2017-03-23T16:24:37Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1082 weight - T137191 (duration: 00m 43s)
Change 344577 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Restore original weight for db1070
Change 344577 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Restore original weight for db1070
Mentioned in SAL (#wikimedia-operations) [2017-03-24T07:07:26Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Restore original weight for db1070, db1071 and db1082 - T137191 (duration: 00m 43s)