Migrate some s4 hosts to file per table
Open, Normal, Public

Description

These "new" s4 hosts (and probably the rest too) were probably recloned from a host that was not using file per table, so they have a huge ibdata1 file and only a few .ibd tables:

root@db1081:/srv/sqldata/commonswiki# ls -lh | grep ibd
-rw-rw---- 1 mysql mysql 9.0M Mar 22 06:53 babel.ibd
-rw-rw---- 1 mysql mysql 9.2G Mar 22 09:53 filearchive.ibd
-rw-rw---- 1 mysql mysql 128K Mar  9 21:11 linter.ibd
-rw-rw---- 1 mysql mysql  18G Mar 22 09:55 page.ibd
-rw-rw---- 1 mysql mysql 106G Mar 22 09:55 revision.ibd
-rw-rw---- 1 mysql mysql 112K Dec  8 23:50 securepoll_elections.ibd
-rw-rw---- 1 mysql mysql 8.0M Mar 22 05:49 user_groups.ibd
root@db1081:/srv/sqldata# ls -lh ibdata1
-rw-rw---- 1 mysql mysql 1.2T Mar 22 09:56 ibdata1
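
The comparison above (a 1.2T ibdata1 next to a handful of .ibd files) can be reproduced for a whole data directory with a short script. This is only a minimal sketch: the function name `tablespace_usage` is hypothetical, and it assumes the usual layout where the shared tablespace files (`ibdata*`) sit at the top of the data directory and per-table `.ibd` files sit in per-database subdirectories.

```python
import os


def tablespace_usage(datadir):
    """Return (shared_bytes, per_table_bytes) for a MySQL/MariaDB data directory.

    shared_bytes    -- total size of ibdata* files at the top of datadir
    per_table_bytes -- total size of all *.ibd files anywhere below datadir
    """
    shared = 0
    per_table = 0
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            path = os.path.join(root, name)
            if name.endswith(".ibd"):
                # One file per table: innodb_file_per_table was on when it was created.
                per_table += os.path.getsize(path)
            elif root == datadir and name.startswith("ibdata"):
                # Shared system tablespace (ibdata1, ibdata2, ...).
                shared += os.path.getsize(path)
    return shared, per_table
```

On a host like db1081 one would call `tablespace_usage("/srv/sqldata")`; a shared figure that dwarfs the per-table figure suggests most tables still live in ibdata1.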

Servers that need to be migrated to file per table:

  • db1059
  • db1084
  • db1091
  • db1097
Restricted Application added a subscriber: Aklapper. Mar 22 2017, 9:57 AM
Marostegui moved this task from Triage to Backlog on the DBA board. Mar 22 2017, 9:57 AM
Marostegui renamed this task from Defragment: db1091, db1084, db1081 to Defragment s4: db1091, db1084, db1081, d1059 and probably the rest. Mar 28 2017, 7:32 AM
Marostegui updated the task description.

This task is not strictly blocking T17441, but it would be nice to get it done at some point.

Change 346503 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1081

https://gerrit.wikimedia.org/r/346503

Change 346503 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1081

https://gerrit.wikimedia.org/r/346503

Mentioned in SAL (#wikimedia-operations) [2017-04-05T06:55:03Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Depool db1081 - T161088 (duration: 00m 39s)

Mentioned in SAL (#wikimedia-operations) [2017-04-05T06:55:54Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1081 - T161088 (duration: 00m 39s)

Mentioned in SAL (#wikimedia-operations) [2017-04-05T06:56:41Z] <marostegui> Stop replication on db1081 for maintenance - T161088

I have started to defragment db1081, in order to have at least one healthy host with file per table to be able to reclone the new eqiad servers that arrived, while they get racked.
Note: I will not reimage it, as that would mean getting 10.0.30; better to leave the hosts we reimaged on s5 running for a few more days/weeks before we go for it.

Mentioned in SAL (#wikimedia-operations) [2017-04-06T06:02:37Z] <marostegui> Configure and start replication on db1081 after the defragment - T161088

db1081 has been defragmented and migrated to file per table. It is now catching up.

Change 346694 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1081 with low weight

https://gerrit.wikimedia.org/r/346694

Change 346694 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1081 with low weight

https://gerrit.wikimedia.org/r/346694

Mentioned in SAL (#wikimedia-operations) [2017-04-06T07:16:22Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1081 with low weight - T161088 (duration: 00m 48s)

Change 346701 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase db1081 weight

https://gerrit.wikimedia.org/r/346701

Change 346701 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase db1081 weight

https://gerrit.wikimedia.org/r/346701

Mentioned in SAL (#wikimedia-operations) [2017-04-06T07:56:25Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increae db1081 weight - T161088 (duration: 00m 39s)

Change 346707 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase db1081 weight

https://gerrit.wikimedia.org/r/346707

Change 346707 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase db1081 weight

https://gerrit.wikimedia.org/r/346707

Mentioned in SAL (#wikimedia-operations) [2017-04-06T08:43:39Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increae db1081 weight - T161088 (duration: 00m 39s)

Change 346721 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase db1081 weight

https://gerrit.wikimedia.org/r/346721

Change 346721 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase db1081 weight

https://gerrit.wikimedia.org/r/346721

Mentioned in SAL (#wikimedia-operations) [2017-04-06T11:27:08Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increae db1081 weight - T161088 (duration: 00m 40s)

Change 346750 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Restore db1081 original weight

https://gerrit.wikimedia.org/r/346750

Change 346750 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Restore db1081 original weight

https://gerrit.wikimedia.org/r/346750

Mentioned in SAL (#wikimedia-operations) [2017-04-06T12:28:21Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Restore db1081 original weight - T161088 (duration: 00m 40s)

Marostegui moved this task from Backlog to Next on the DBA board. Jul 14 2017, 7:26 AM

Servers that need to be migrated to file per table:

db1059
db1084
db1091
db1097

root@neodymium:/home/marostegui/git/software/dbtools# cumin 'db10[68,53,56,59,64,81,84,91,97].eqiad.wmnet' 'ls -lh /srv/sqldata/ibdata1'
9 hosts will be targeted:
db[1053,1056,1059,1064,1068,1081,1084,1091,1097].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(1) db1053.eqiad.wmnet
----- OUTPUT of 'ls -lh /srv/sqldata/ibdata1' -----
-rw-rw---- 1 mysql mysql 7.8G Jul 24 08:11 /srv/sqldata/ibdata1
===== NODE GROUP =====
(1) db1081.eqiad.wmnet
----- OUTPUT of 'ls -lh /srv/sqldata/ibdata1' -----
-rw-rw---- 1 mysql mysql 1.9G Jul 24 08:11 /srv/sqldata/ibdata1
===== NODE GROUP =====
(1) db1059.eqiad.wmnet
----- OUTPUT of 'ls -lh /srv/sqldata/ibdata1' -----
-rw-rw---- 1 mysql mysql 682G Jul 24 08:11 /srv/sqldata/ibdata1
===== NODE GROUP =====
(1) db1064.eqiad.wmnet
----- OUTPUT of 'ls -lh /srv/sqldata/ibdata1' -----
-rw-rw---- 1 mysql mysql 46G Jul 24 08:11 /srv/sqldata/ibdata1
===== NODE GROUP =====
(1) db1068.eqiad.wmnet
----- OUTPUT of 'ls -lh /srv/sqldata/ibdata1' -----
-rw-rw---- 1 mysql mysql 1.6G Jul 24 08:11 /srv/sqldata/ibdata1
===== NODE GROUP =====
(1) db1097.eqiad.wmnet
----- OUTPUT of 'ls -lh /srv/sqldata/ibdata1' -----
-rw-rw---- 1 mysql mysql 1.3T Jul 24 08:11 /srv/sqldata/ibdata1
===== NODE GROUP =====
(1) db1056.eqiad.wmnet
----- OUTPUT of 'ls -lh /srv/sqldata/ibdata1' -----
-rw-rw---- 1 mysql mysql 6.0G Jul 24 08:11 /srv/sqldata/ibdata1
===== NODE GROUP =====
(2) db[1084,1091].eqiad.wmnet
----- OUTPUT of 'ls -lh /srv/sqldata/ibdata1' -----
-rw-rw---- 1 mysql mysql 1.2T Jul 24 08:11 /srv/sqldata/ibdata1
================
PASS |██████████████████████████████████████████████████████████████| 100% (9/9) [00:00<00:00, 23.02hosts/s]
FAIL |                                                                      |   0% (0/9) [00:00<?, ?hosts/s]
100.0% (9/9) success ratio (>= 100.0% threshold) for command: 'ls -lh /srv/sqldata/ibdata1'.
100.0% (9/9) success ratio (>= 100.0% threshold) of nodes successfully executed all commands
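
The list of servers to migrate can be cross-checked against the ibdata1 sizes reported above. The following is a rough sketch, not the procedure actually used in this task: the sizes are copied from the cumin output, while the 100G cutoff and the helper names `parse_size` and `needs_migration` are assumptions made purely for illustration.

```python
# Binary multipliers, as used by `ls -lh`.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert an `ls -lh` size such as '682G' or '1.3T' to bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


# ibdata1 sizes per host, taken from the cumin output in this task.
IBDATA1_SIZES = {
    "db1053": "7.8G",
    "db1056": "6.0G",
    "db1059": "682G",
    "db1064": "46G",
    "db1068": "1.6G",
    "db1081": "1.9G",
    "db1084": "1.2T",
    "db1091": "1.2T",
    "db1097": "1.3T",
}

# Assumed cutoff, chosen only for this example; the task states no threshold.
THRESHOLD = parse_size("100G")


def needs_migration(sizes, threshold):
    """Return the sorted hosts whose ibdata1 is larger than the threshold."""
    return sorted(h for h, s in sizes.items() if parse_size(s) > threshold)


print(needs_migration(IBDATA1_SIZES, THRESHOLD))
# prints ['db1059', 'db1084', 'db1091', 'db1097']
```

With this assumed cutoff the result matches the four hosts listed as pending. db1064 (46G) falls below it; the task does not say how that host was classified.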
Marostegui renamed this task from Defragment s4: db1091, db1084, db1081, d1059 and probably the rest to Migrate some s4 hosts to file per table. Jul 24 2017, 8:17 AM
Marostegui triaged this task as Normal priority.
Marostegui updated the task description.

Change 374521 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1097

https://gerrit.wikimedia.org/r/374521

Change 374521 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1097

https://gerrit.wikimedia.org/r/374521

Mentioned in SAL (#wikimedia-operations) [2017-08-29T10:57:31Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1097 - T161088 (duration: 00m 43s)

Mentioned in SAL (#wikimedia-operations) [2017-08-29T11:08:13Z] <marostegui> Stop MariaDB on db1097 to migrate it to file per table - T161088

db1097 is done

Marostegui updated the task description. Wed, Aug 30, 7:02 AM

Change 374741 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Give more weight to db1055

https://gerrit.wikimedia.org/r/374741

Change 374741 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Give more weight to db1055

https://gerrit.wikimedia.org/r/374741