Point labsdb1001 and labsdb1003 to db1095 and db1102
Closed, Resolved · Public

Description

The following shards are available on db1095 now:

  • s1
  • s3
  • s4
  • s5

The following shards are available on db1102:

  • s2
  • s6
  • s7

We should point labsdb1001 and labsdb1003 to db1095 and db1102 for those replication channels (and keep those channels in idempotent mode).
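The shard layout above can be written down as data to check a planned switch before touching any server. This is a minimal, hypothetical helper (the names `NEW_MASTER`, `shards_by_master` and `is_safe_switch` are illustrative, not existing tooling). It assumes, as the timeline notes for db1095 and presumably also holds for db1102, that each new master writes one binlog containing every shard it replicates, so all channels fed from the same host must be switched together.

```python
from collections import defaultdict

# Shard -> new upstream host, as listed in the task description.
NEW_MASTER = {
    "s1": "db1095", "s3": "db1095", "s4": "db1095", "s5": "db1095",
    "s2": "db1102", "s6": "db1102", "s7": "db1102",
}


def shards_by_master(mapping):
    """Invert shard -> master into master -> sorted list of shards."""
    groups = defaultdict(list)
    for shard, master in mapping.items():
        groups[master].append(shard)
    return {master: sorted(shards) for master, shards in groups.items()}


def is_safe_switch(shards, mapping=NEW_MASTER):
    """True if every new master touched by `shards` has all of its shards included.

    Assumption: each new master writes a single binlog with every shard it
    replicates, so a partial switch would feed other shards' writes to the
    switched channel.
    """
    groups = shards_by_master(mapping)
    wanted = set(shards)
    touched = {mapping[s] for s in wanted}
    return all(set(groups[m]) <= wanted for m in touched)


if __name__ == "__main__":
    print(shards_by_master(NEW_MASTER))
    print(is_safe_switch(["s1"]))                    # False: s3, s4, s5 left behind
    print(is_safe_switch(["s1", "s3", "s4", "s5"]))  # True
```

Switching s1 on its own is flagged as unsafe, while moving the whole db1095 group (or everything at once) passes the check.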

Once that is done we can start cleaning up db1069.

Details

Related Gerrit Patches:
[operations/puppet@production] install_server: Allow reimage of db1069, dbstore2001
[operations/mediawiki-config@master] mariadb: Depool db1065, db1064, db1070 for maintenance
[operations/mediawiki-config@master] mariadb: Depool db1079 for maintenance
[operations/mediawiki-config@master] mariadb: Depool db1085 temporarilly for maintenance
[operations/mediawiki-config@master] mariadb: temporarely depooling db1060 for maintenance

Event Timeline

Restricted Application added a subscriber: Aklapper. May 30 2017, 9:25 AM
Marostegui triaged this task as Medium priority. May 30 2017, 9:25 AM
Marostegui moved this task from Triage to Next on the DBA board.
Marostegui updated the task description. Jun 12 2017, 1:24 PM
Marostegui moved this task from Next to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2017-06-26T07:55:25Z] <marostegui> Stop replication on db1069:3313 (s3) and db1044 in the same position - T166546

I just realised that changing the master of labsdb1001 and labsdb1003 for the above threads needs to be done all at the same time; otherwise replication will break.
Why?

Because db1095 has a single binlog, which receives all the writes coming from all of its replication threads (s1, s3, s4 and s5):

root@db1095[(none)]> show master status\G
*************************** 1. row ***************************
            File: db1095-bin.001729
        Position: 736187777
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

This means that if we switch just one shard, that channel will apply all the writes in that single binlog, including those for the other shards (as we do not have per-thread replication filters), and it will leave the data in an inconsistent state :-)
This is not a problem per se, but I wanted to change one shard first as a test and then continue with the others.
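A toy model of the problem described above, for illustration only: it does not talk to MariaDB, and the event list and helper names (`DB1095_BINLOG`, `apply_events`, `partial_switch`, `filtered_switch`) are made up. Each binlog event is tagged with its shard; a channel without a filter applies everything in its stream. Repointing only s1 makes the s3, s4 and s5 events arrive twice (once through the switched channel, once through the untouched ones). For contrast, the per-channel filtering that the comment says is not in place would deliver each event exactly once.

```python
from collections import Counter

# Hypothetical contents of db1095's binlog: (shard, event_id) pairs from
# every shard it replicates, interleaved in a single stream.
DB1095_BINLOG = [("s1", 1), ("s3", 2), ("s4", 3), ("s5", 4), ("s1", 5)]


def old_stream(shard):
    """Events for one shard, as its old per-shard master would send them."""
    return [e for e in DB1095_BINLOG if e[0] == shard]


def apply_events(channels):
    """Count how many times each event is applied on the replica.

    `channels` maps a channel name to (event stream, shard filter or None).
    A None filter means the channel applies every event in its stream.
    """
    applied = Counter()
    for stream, shard_filter in channels.values():
        for shard, event_id in stream:
            if shard_filter is None or shard == shard_filter:
                applied[(shard, event_id)] += 1
    return applied


def partial_switch():
    """Only s1 repointed to db1095; s3, s4, s5 still on their old masters."""
    return {
        "s1": (DB1095_BINLOG, None),  # no filter: receives every shard's writes
        "s3": (old_stream("s3"), None),
        "s4": (old_stream("s4"), None),
        "s5": (old_stream("s5"), None),
    }


def filtered_switch():
    """All four channels read db1095's binlog, each filtered to its own shard."""
    return {s: (DB1095_BINLOG, s) for s in ("s1", "s3", "s4", "s5")}


def duplicates(channels):
    """Events applied more than once, with their application count."""
    return {e: n for e, n in apply_events(channels).items() if n > 1}


if __name__ == "__main__":
    print(duplicates(partial_switch()))   # {('s3', 2): 2, ('s4', 3): 2, ('s5', 4): 2}
    print(duplicates(filtered_switch()))  # {}
```

In the real setup the duplicated writes would not simply be counted: with the channels in idempotent mode the resulting errors may be skipped silently, which is how the data can end up in the inconsistent state mentioned above.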

Marostegui renamed this task from "Point labsdb1001 and labsdb1003 to db1095" to "Point labsdb1001 and labsdb1003 to db1095 and db1102". Aug 7 2017, 11:49 AM
Marostegui updated the task description.
Marostegui updated the task description. Aug 7 2017, 11:51 AM
jcrespo claimed this task. Aug 8 2017, 1:55 PM

Change 370640 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: temporarely depooling db1060 for maintenance

https://gerrit.wikimedia.org/r/370640

I was going to move s2, and then T172784 happened. Will try tomorrow.

Change 370640 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: temporarely depooling db1060 for maintenance

https://gerrit.wikimedia.org/r/370640

jcrespo updated the task description. Aug 9 2017, 5:11 AM

Change 370780 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1085 temporarilly for maintenance

https://gerrit.wikimedia.org/r/370780

Change 370780 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1085 temporarilly for maintenance

https://gerrit.wikimedia.org/r/370780

jcrespo updated the task description. Aug 9 2017, 5:53 AM

Change 370782 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1079 for maintenance

https://gerrit.wikimedia.org/r/370782

Change 370782 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1079 for maintenance

https://gerrit.wikimedia.org/r/370782

jcrespo updated the task description. Aug 9 2017, 6:20 AM

Change 370784 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1065, db1064, db1070 for maintenance

https://gerrit.wikimedia.org/r/370784

Change 370784 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1065, db1064, db1070 for maintenance

https://gerrit.wikimedia.org/r/370784

jcrespo closed this task as Resolved. Aug 9 2017, 7:38 AM
jcrespo updated the task description.
jcrespo removed a project: Patch-For-Review.
jcrespo moved this task from In progress to Done on the DBA board.

Replication is no longer going through db1069; we are keeping it alive and replicating for a while to detect problems and in case a revert is needed.

Change 370788 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] install_server: Allow reimage of db1069, dbstore2001

https://gerrit.wikimedia.org/r/370788

Change 370788 merged by Jcrespo:
[operations/puppet@production] install_server: Allow reimage of db1069, dbstore2001

https://gerrit.wikimedia.org/r/370788

Marostegui added a comment. Edited Aug 9 2017, 9:11 AM

Nice! Finally db1069 is going to be unused!