Page MenuHomePhabricator

Defragment db1070, db1082, db1087, db1092
Closed, ResolvedPublic

Description

New s5 database servers have been copied from the old s5 master (db1070). This host used to be configured with no innodb_file_per_table config. They have a huge ibadata file and needs defragmenting.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Mentioned in SAL (#wikimedia-operations) [2016-09-21T11:04:07Z] <marostegui> Rebuilding tables in db1082 (non pooled) - T137191

After the testing with db1082, these are the real sizes I got after all the alters:

root@db1082:/srv/sqldata# du -hd 1
253G    ./dewiki
66M     ./mysql
518G    ./wikidatawiki
120K    ./ops
16K     ./heartbeat
220K    ./performance_schema
2.1T    .

I wouldn't use this host as a source because of two reasons:

  1. wikidatawiki and dewiki had duplicate entry errors in two tables (which could be because of the crashes it had - see point #2):

wikidatawiki:

***wb_items_per_site***
ERROR 1062 (23000) at line 1: Duplicate entry '564494155' for key 'PRIMARY'
***tag_summary***
ERROR 1062 (23000) at line 1: Duplicate entry '396024680' for key 'tag_summary_rc_id'

dewiki

***user_properties***
ERROR 1062 (23000) at line 1: Duplicate entry '2498892-echo-subscriptions-web-reverted' for key 'user_properties_user_property'
  1. This server had hardware issues a few days ago.

So maybe db1070 or db1071 can be better candidates. I will depool one of them and see how the ALTER tables go and if there is any errors.
If not, I will reclone db1082 as it is currently out depooled but has powerful hardware.

Can you clarify the duplicate keys issue? when did you get that, doing what?

I was testing the following ALTER

alter table $i engine=INNODB,FORCE;

The following table complained:

       Table: user_properties
Create Table: CREATE TABLE `user_properties` (
  `up_user` int(11) NOT NULL DEFAULT '0',
  `up_property` varbinary(255) DEFAULT NULL,
  `up_value` blob,
  UNIQUE KEY `user_properties_user_property` (`up_user`,`up_property`),
  KEY `user_properties_property` (`up_property`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
ERROR 1062 (23000) at line 1: Duplicate entry '2498892-echo-subscriptions-web-reverted' for key 'user_properties_user_property'

This table does not have a PK

Let's check with barracuda compressed and innodb_long_prefix. And the table should have primary key, (up_user, up_property) does not work because one is null; we should report that as a bug.

Also, let's do a full shard check T104459 before doing more operations.

user_properties should be a small-ish table, we could literally export it on csv and perform a diff, as pt-table-checksum will not work here. Also let's go for the logical reimport path, which should be more reliable.

Sounds like a safe idea. I will check the other 2-3 tables that failed too and will report back here.

Also, testing barracuda and innodb_long_prefix sounds good to me.

Taking a mysqldump from db1070 and sending it to dbstore1001 now.

Marostegui raised the priority of this task from Low to Medium.Mar 13 2017, 10:21 AM
Marostegui moved this task from Backlog to In progress on the DBA board.

Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db1070.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201703140710_marostegui_17843.log.

Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db1070.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201703140750_marostegui_31193.log.

Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db1070.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201703140808_marostegui_5730.log.

After some troubleshooting as the server wasn't getting reimaged I found that db1070 is suffering this:

Error: Unable to establish IPMI v2 / RMCP+ session

And it is indeed listed here: T150160#2951190

So I will need to perform a manual installation

db1070 has been manually reimaged and it is now getting the mysqldump from yesteday imported back:

root@db1070:/srv/sqldata/dewiki# lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 8.7 (jessie)
Release:	8.7
Codename:	jessie
root@db1070:/srv/sqldata/dewiki# dpkg -l | grep wmf-maria
ii  wmf-mariadb10                  10.0.29-2                  amd64        MariaDB plus patches.
root@db1070:/srv/sqldata/dewiki# ls -lh | grep ibd
-rw-rw---- 1 mysql mysql 112K Mar 14 09:04 abuse_filter_action.ibd
-rw-rw---- 1 mysql mysql  19M Mar 14 09:04 abuse_filter_history.ibd
-rw-rw---- 1 mysql mysql 336K Mar 14 09:04 abuse_filter.ibd
-rw-rw---- 1 mysql mysql 180M Mar 14 09:05 abuse_filter_log.ibd
-rw-rw---- 1 mysql mysql 1.5G Mar 14 09:06 archive.ibd
-rw-rw---- 1 mysql mysql  13M Mar 14 09:04 __wmf_checksums.ibd

After having a chat with Jaime, he correctly pointed out that we should try to load in parallel to avoid a week of importing for this host.
So what I have done is:

  • split the file into databases:
csplit  -s -fdatabase db1070.sql "/^USE /" {*}
  • Once that generated a file per database, split those files into multiple files, one per table for the two big databases (dewiki and wikidata)
csplit -s -fdewiki database01 "/-- Table structure for table/" {*}

I am right now importing dewiki after importing all the other old databases (hearbeat,ops, mysql etc)

db1070 keeps importing stuff, in parallel (main and biggest tables first) so far:

root@db1070:/srv/sqldata# du -sh dewiki/ wikidatawiki/
248G	dewiki/
78G	wikidatawiki/

db1070 status update:

dewiki has been 100% over night.
wikidatawiki has only 3 tables pending which have been since yesterday already: revision, logging, wb_terms

root@db1070:/srv/sqldata/wikidatawiki# ls -lh revision.ibd logging.ibd wb_terms.ibd
-rw-rw---- 1 mysql mysql 136G Mar 15 09:37 logging.ibd
-rw-rw---- 1 mysql mysql 122G Mar 15 09:37 revision.ibd
-rw-rw---- 1 mysql mysql  96G Mar 15 09:37 wb_terms.ibd

Next week I will backup db1082, db1087 and db1092, reimage and reclone them from db1070 as it is now file per table (T157931)

Change 343824 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Depool db1092

https://gerrit.wikimedia.org/r/343824

Change 343824 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Depool db1092

https://gerrit.wikimedia.org/r/343824

Mentioned in SAL (#wikimedia-operations) [2017-03-21T07:35:04Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1092 - T137191 (duration: 00m 42s)

Mentioned in SAL (#wikimedia-operations) [2017-03-21T07:36:20Z] <marostegui> Stop mysql db1092 for maintenance - T137191

Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db1092.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201703210923_marostegui_13034.log.

Mentioned in SAL (#wikimedia-operations) [2017-03-21T09:42:23Z] <marostegui> Stop MySQL db1070 to clone db1092 from it - T137191

Completed auto-reimage of hosts:

['db1092.eqiad.wmnet']

and were ALL successful.

Change 343892 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Repool db1092 with less weight

https://gerrit.wikimedia.org/r/343892

Change 343892 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Repool db1092 with less weight

https://gerrit.wikimedia.org/r/343892

Mentioned in SAL (#wikimedia-operations) [2017-03-21T17:09:38Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1092 with low weight - T137191 (duration: 00m 42s)

Change 343923 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Increase weight for db1092

https://gerrit.wikimedia.org/r/343923

Change 343923 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Increase weight for db1092

https://gerrit.wikimedia.org/r/343923

Mentioned in SAL (#wikimedia-operations) [2017-03-21T17:24:46Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1092 weight - T137191 (duration: 00m 45s)

Change 343927 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Increase weight db1092

https://gerrit.wikimedia.org/r/343927

Change 343927 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Increase weight db1092

https://gerrit.wikimedia.org/r/343927

Mentioned in SAL (#wikimedia-operations) [2017-03-21T17:45:11Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1092 weight - T137191 (duration: 00m 42s)

Change 343938 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Increase db1092 weight

https://gerrit.wikimedia.org/r/343938

Change 343938 merged by Jcrespo:
[operations/mediawiki-config] db-eqiad.php: Increase db1092 weight

https://gerrit.wikimedia.org/r/343938

Change 344080 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Restore db1092 weight

https://gerrit.wikimedia.org/r/344080

Change 344080 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Restore db1092 weight

https://gerrit.wikimedia.org/r/344080

Mentioned in SAL (#wikimedia-operations) [2017-03-22T06:33:19Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Restore db1092 weight - T137191 (duration: 00m 49s)

Change 344083 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Depool db1087

https://gerrit.wikimedia.org/r/344083

Change 344083 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Depool db1087

https://gerrit.wikimedia.org/r/344083

Mentioned in SAL (#wikimedia-operations) [2017-03-22T06:58:29Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1087 - T137191 (duration: 00m 43s)

Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db1087.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201703220844_marostegui_21624.log.

Mentioned in SAL (#wikimedia-operations) [2017-03-22T08:46:17Z] <marostegui> Stop MySQL db1070 to clone db1087 from it - T137191

Completed auto-reimage of hosts:

['db1087.eqiad.wmnet']

and were ALL successful.

Change 344117 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Repooli db1087 with low weight

https://gerrit.wikimedia.org/r/344117

Change 344117 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Repooli db1087 with low weight

https://gerrit.wikimedia.org/r/344117

Mentioned in SAL (#wikimedia-operations) [2017-03-22T12:15:45Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1087 with low weight - T137191 (duration: 00m 43s)

Change 344126 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Enable db1087 in API service

https://gerrit.wikimedia.org/r/344126

Change 344126 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Enable db1087 in API service

https://gerrit.wikimedia.org/r/344126

Mentioned in SAL (#wikimedia-operations) [2017-03-22T12:48:41Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Enable db1087 for API - T137191 (duration: 00m 42s)

Change 344143 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Increase weight db1087

https://gerrit.wikimedia.org/r/344143

Change 344143 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Increase weight db1087

https://gerrit.wikimedia.org/r/344143

Mentioned in SAL (#wikimedia-operations) [2017-03-22T13:54:40Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1087 weight - T137191 (duration: 00m 47s)

Change 344150 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config] db-eqiad.php: Restore db1087 original weight

https://gerrit.wikimedia.org/r/344150

Change 344150 merged by jenkins-bot:
[operations/mediawiki-config] db-eqiad.php: Restore db1087 original weight

https://gerrit.wikimedia.org/r/344150

Mentioned in SAL (#wikimedia-operations) [2017-03-22T14:35:38Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Restore db1087 original weight - T137191 (duration: 00m 44s)

Change 344306 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1082

https://gerrit.wikimedia.org/r/344306

Change 344306 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1082

https://gerrit.wikimedia.org/r/344306

Mentioned in SAL (#wikimedia-operations) [2017-03-23T06:52:39Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1082 - T137191 (duration: 00m 44s)

Mentioned in SAL (#wikimedia-operations) [2017-03-23T08:29:36Z] <marostegui> Stop db1070 MySQL db1070 for maintenance - T137191

Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db1082.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201703230944_marostegui_11785.log.

Change 344337 had a related patch set uploaded (by Marostegui):
[operations/puppet@production] netboot.cfg: Format db1082

https://gerrit.wikimedia.org/r/344337

Change 344337 merged by Marostegui:
[operations/puppet@production] netboot.cfg: Format db1082

https://gerrit.wikimedia.org/r/344337

Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db1082.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201703231024_marostegui_18418.log.

Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db1082.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201703231037_marostegui_31446.log.

Completed auto-reimage of hosts:

['db1082.eqiad.wmnet']

and were ALL successful.

Change 344385 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1082 with low weight

https://gerrit.wikimedia.org/r/344385

Change 344385 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1082 with low weight

https://gerrit.wikimedia.org/r/344385

Mentioned in SAL (#wikimedia-operations) [2017-03-23T14:03:43Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1082 with low weight - T137191 (duration: 00m 48s)

Change 344402 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase db1082 weight

https://gerrit.wikimedia.org/r/344402

Change 344402 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase db1082 weight

https://gerrit.wikimedia.org/r/344402

Mentioned in SAL (#wikimedia-operations) [2017-03-23T15:36:59Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1082 weight - T137191 (duration: 00m 43s)

Change 344413 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase weight db1082

https://gerrit.wikimedia.org/r/344413

Change 344413 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase weight db1082

https://gerrit.wikimedia.org/r/344413

Mentioned in SAL (#wikimedia-operations) [2017-03-23T16:24:37Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1082 weight - T137191 (duration: 00m 43s)

Change 344577 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Restore original weight for db1070

https://gerrit.wikimedia.org/r/344577

Change 344577 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Restore original weight for db1070

https://gerrit.wikimedia.org/r/344577

Mentioned in SAL (#wikimedia-operations) [2017-03-24T07:07:26Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Restore original weight for db1070, db1071 and db1082 - T137191 (duration: 00m 43s)

All these hosts have been defragmented and now using file per table.