Setup dbstore2002 with 2 new mysql instances from production and enable GTID
Closed, Resolved · Public

Description

dbstore2002 is currently running the following replication threads:

  • s1
  • s3
  • s4
  • s5

Its current disk usage is:

Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   6.6T  4.4T  2.2T  67% /srv

There is not much space left to import two production shards, but we can compress and import s2 (around 800GB compressed, see T150438#2975562) and x1 (around 230GB uncompressed).

The idea would be to leave the MySQL instance that runs multi-source up and running and create 2 more instances, one for s2 and one for x1.
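
For reference, a rough sketch of what the per-instance configuration could look like. MariaDB's systemd template units (mariadb@s2, mariadb@x1) read option groups with a matching suffix; the paths and the x1 port below are illustrative assumptions (only the s2 port 3312 is confirmed later in this task):

# Hypothetical my.cnf fragments, one option group per instance;
# the real layout is managed by the mariadb puppet module.
[mysqld.s2]
datadir = /srv/sqldata.s2
socket  = /run/mysqld/mysqld.s2.sock
port    = 3312

[mysqld.x1]
datadir = /srv/sqldata.x1
socket  = /run/mysqld/mysqld.x1.sock
port    = 3320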

Details

Related Gerrit Patches:
operations/puppet : production | mariadb-multiinstance: Make the main multisource 3306 instance available
operations/puppet : production | mariadb: Transform dbstore2002 into multi-instance, drop db1096
operations/puppet : production | db2033.yaml: Use the new socket location
operations/software : master | s2.hosts: Add dbstore2002 port 3312
operations/mediawiki-config : master | db-codfw.php: Depool db2056
operations/mediawiki-config : master | db-codfw.php: Add status for db2049

Event Timeline

jcrespo created this task. Jul 3 2017, 10:31 AM
Restricted Application added a subscriber: Aklapper. Jul 3 2017, 10:31 AM
Marostegui triaged this task as Medium priority. Jul 3 2017, 10:31 AM
Marostegui moved this task from Triage to Next on the DBA board. Jul 3 2017, 10:32 AM

Change 362978 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Add status for db2049

https://gerrit.wikimedia.org/r/362978

Marostegui updated the task description. Jul 3 2017, 12:31 PM

Change 362978 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Add status for db2049

https://gerrit.wikimedia.org/r/362978

Mentioned in SAL (#wikimedia-operations) [2017-07-03T12:36:38Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Add comments about db2056 status - T169510 (duration: 02m 50s)

Mentioned in SAL (#wikimedia-operations) [2017-07-03T12:39:59Z] <marostegui> Compress innodb on db2056 - T169510
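
For context, InnoDB compression is applied per table; a minimal sketch of the kind of statement involved (the database and table names and the block size are illustrative, not taken from the actual maintenance script):

# Compressed row formats need innodb_file_per_table=ON (and, on older
# MariaDB versions, innodb_file_format=Barracuda).
mysql -e "SET GLOBAL innodb_file_per_table = ON"
# Rebuild an example table with 8KB compressed pages instead of the
# default 16KB uncompressed ones.
mysql -e "ALTER TABLE somewiki.revision ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8"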

Marostegui moved this task from Next to In progress on the DBA board. Jul 4 2017, 12:45 PM

Mentioned in SAL (#wikimedia-operations) [2017-07-06T05:16:09Z] <marostegui> Stop mysql on db2056 for maintenance - T148507 T169510

Mentioned in SAL (#wikimedia-operations) [2017-07-06T07:11:11Z] <marostegui> Disable puppet on dbstore2002 - T169510

Mentioned in SAL (#wikimedia-operations) [2017-07-06T07:15:28Z] <marostegui> Stop MySQL on dbstore2002 for maintenance - T169510

Change 363594 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2056

https://gerrit.wikimedia.org/r/363594

Change 363594 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2056

https://gerrit.wikimedia.org/r/363594

Mentioned in SAL (#wikimedia-operations) [2017-07-06T13:27:03Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Depool db2056 - T169510 (duration: 00m 44s)

Mentioned in SAL (#wikimedia-operations) [2017-07-06T13:28:45Z] <marostegui> Stop MySQL on db2056 for maintenance - T169510

Mentioned in SAL (#wikimedia-operations) [2017-07-07T07:39:02Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Repool db2056 - T169510 (duration: 00m 43s)

dbstore2002 is now running two instances.
The general one with s1, s3, s4 and s5, and another instance on port 3312 with s2 (compressed).

This has all been done manually so far, as the puppet code is still being worked on :-)
I might import x1 as well as we still have plenty of space for it:

root@dbstore2002:~# df -hT /srv
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   6.6T  4.2T  2.4T  65% /srv
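
For anyone who wants to poke at the new instance: it listens on its own port and socket, so a connection would look something like this (the socket path is an assumption based on the usual multi-instance layout, not confirmed here):

# Connect to the s2 instance over TCP...
mysql -h 127.0.0.1 -P 3312
# ...or over its dedicated socket (path assumed, may differ).
mysql -S /run/mysqld/mysqld.s2.sock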

Change 363827 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s2.hosts: Add dbstore2002 port 3312

https://gerrit.wikimedia.org/r/363827

Change 363827 merged by jenkins-bot:
[operations/software@master] s2.hosts: Add dbstore2002 port 3312

https://gerrit.wikimedia.org/r/363827

Change 364665 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2033.yaml: Use the new socket location

https://gerrit.wikimedia.org/r/364665

Mentioned in SAL (#wikimedia-operations) [2017-07-12T08:11:13Z] <marostegui> Stop MySQL on db2033 (x1) - T169510

Change 364665 merged by Marostegui:
[operations/puppet@production] db2033.yaml: Use the new socket location

https://gerrit.wikimedia.org/r/364665

Change 364681 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Transform dbstore2002 into multi-instance

https://gerrit.wikimedia.org/r/364681

x1 has been imported on dbstore2002 and it is up and replicating.
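
A quick way to confirm an instance is replicating cleanly; a sketch, assuming x1 exposes the usual per-instance socket:

# Socket path assumed; adjust to the instance's actual location.
mysql -S /run/mysqld/mysqld.x1.sock \
      -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running|Seconds_Behind_Master'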

Change 364681 merged by Jcrespo:
[operations/puppet@production] mariadb: Transform dbstore2002 into multi-instance, drop db1096

https://gerrit.wikimedia.org/r/364681

So right now the separate instances are puppetized, but the main multi-source one isn't. To handle it:

# Export the options the unpuppetized main instance needs, run the
# desired systemctl action (e.g. start, stop, status), then clean up:
systemctl set-environment MYSQLD_OPTS="--datadir=/srv/sqldata --tmpdir=/srv/tmp --socket=/run/mysqld/mysqld.sock --port=3306 --skip-slave-start"
systemctl <command> mariadb
systemctl unset-environment MYSQLD_OPTS
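
Because the instance is started with --skip-slave-start, replication has to be resumed by hand afterwards. On a MariaDB multi-source replica that would be roughly:

# Start every configured replication connection at once (MariaDB's
# multi-source counterpart of START SLAVE), then check them all.
mysql -S /run/mysqld/mysqld.sock -e "START ALL SLAVES"
mysql -S /run/mysqld/mysqld.sock -e "SHOW ALL SLAVES STATUS\G"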

What do you think the plan should be: populate it with as many instances as possible and delete the temporary central multi-source one, or keep it like that for some weeks?

Change 364744 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-multiinstance: Make the main multisource 3306 instance available

https://gerrit.wikimedia.org/r/364744

> What do you think the plan should be: populate it with as many instances as possible and delete the temporary central multi-source one, or keep it like that for some weeks?

As soon as you are happy with how it goes, I would go for deleting all the data and starting to populate it with as many instances (compressed) as possible.

Mentioned in SAL (#wikimedia-operations) [2017-07-12T14:55:53Z] <marostegui> Run redact_sanitarium on db1069 and db1095 for maiwikimedia - T169510

^ that is not for this ticket, sorry!

Change 364744 merged by Jcrespo:
[operations/puppet@production] mariadb-multiinstance: Make the main multisource 3306 instance available

https://gerrit.wikimedia.org/r/364744

jcrespo closed this task as Resolved. Jul 24 2017, 6:30 AM

GTID was enabled last week. There is some merit in asking whether we should share the same server and domain id, but for now it works. Closing, as x1, s1 and s2 are running there with no issues. Continuing work on T171321.
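
For reference, switching an existing replication connection to GTID on MariaDB looks roughly like this; a sketch, assuming a named multi-source connection and a per-instance gtid_domain_id (the exact values used on dbstore2002 are not recorded in this task):

# Give each instance its own domain id so their GTID streams do not
# collide; the value here is illustrative.
mysql -e "SET GLOBAL gtid_domain_id = 2002"
# Switch one named replication connection to GTID positioning.
mysql -e "STOP SLAVE 's2'"
mysql -e "CHANGE MASTER 's2' TO MASTER_USE_GTID = slave_pos"
mysql -e "START SLAVE 's2'"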