T199861 Decommission db1052

Subject	Repo	Branch	Lines +/-
Removing dns entries decom host db1052	operations/dns	master	+1 -5
Removing db1052 from site.pp final decommission	operations/puppet	production	+0 -6
db-eqiad,db-codfw.php: Remove db1052	operations/mediawiki-config	master	+0 -3
mariadb: Set db1052 to spare	operations/puppet	production	+7 -12
db-eqiad.php: Depool db1089	operations/mediawiki-config	master	+1 -1
db1089: Change binlog to ROW	operations/puppet	production	+0 -1
db1052: Disable notifications, upgrade socket	operations/puppet	production	+1 -1

		Status	Subtype	Assigned	Task
		Resolved		None	T186320 Decommission db1051-db1060 (DBA tracking)
		Resolved		RobH	T199861 Decommission db1052

Marostegui triaged this task as Medium priority. Jul 18 2018, 6:36 AM

Marostegui created this task.

Restricted Application added a subscriber: Aklapper. Jul 18 2018, 6:36 AM

I thought a bit about how to go about this, and given the importance and history of this host, here is one proposal; see what you think:

Wait 1 week to make sure we are not going to fail back immediately
Archive and compress a tarball on the database hosts and keep it, just in case, for e.g. 3 months

In T199861#4432834, @jcrespo wrote:

I thought a bit about how to go about this, and given the importance and history of this host, here is one proposal; see what you think:

Wait 1 week to make sure we are not going to fail back immediately

Yeah, my idea was to wait even longer, until 31st July, after the network maintenance.

Archive and compress a tarball on the database hosts and keep it, just in case, for e.g. 3 months

Agreed!
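The archiving step agreed above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure actually used on this task: the destination directory, the tarball naming scheme, and the 90-day approximation of "3 months" are assumptions, and `archive_datadir` and `is_expired` are hypothetical helpers.

```python
import tarfile
from datetime import date, timedelta
from pathlib import Path

# Assumption: "e.g. 3 months" of retention approximated as 90 days.
RETENTION = timedelta(days=90)


def archive_datadir(src: Path, dest_dir: Path, taken_on: date) -> tuple[Path, date]:
    """Compress the directory src into a gzip'd tarball under dest_dir.

    Returns the tarball path and the date after which it may be deleted.
    The file name pattern <dirname>-<YYYY-MM-DD>.tar.gz is hypothetical.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    tarball = dest_dir / f"{src.name}-{taken_on.isoformat()}.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        # arcname keeps member paths relative to the directory name
        # (e.g. "db1052/...") instead of embedding the absolute source path.
        tar.add(src, arcname=src.name)
    return tarball, taken_on + RETENTION


def is_expired(delete_after: date, today: date) -> bool:
    """True once the retention window has fully passed."""
    return today > delete_after
```

With a copy taken on 2018-08-01, the 90-day window ends on 2018-10-30, so the tarball would become eligible for deletion from 2018-10-31 onward.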

Marostegui mentioned this in T197069: Failover db1052 (s1) db primary master. Jul 18 2018, 6:41 AM

Change 446533 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1052: Disable notifications, upgrade socket

https://gerrit.wikimedia.org/r/446533

gerritbot added a project: Patch-For-Review. Jul 18 2018, 6:47 AM

Marostegui claimed this task. Jul 18 2018, 6:57 AM

Marostegui moved this task from Triage to Pending comment on the DBA board.

Change 446533 merged by Marostegui:
[operations/puppet@production] db1052: Disable notifications, upgrade socket

https://gerrit.wikimedia.org/r/446533

Marostegui updated the task description. Jul 31 2018, 1:48 PM

Change 449652 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1089: Change binlog to ROW

https://gerrit.wikimedia.org/r/449652

Change 449652 merged by Marostegui:
[operations/puppet@production] db1089: Change binlog to ROW

https://gerrit.wikimedia.org/r/449652

Mentioned in SAL (#wikimedia-operations) [2018-08-01T04:47:51Z] <marostegui> Stop MySQL on db1052 to copy its content to dbstore1001 - https://phabricator.wikimedia.org/T199861

Change 449653 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1089

https://gerrit.wikimedia.org/r/449653

Change 449653 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1089

https://gerrit.wikimedia.org/r/449653

Marostegui updated the task description. Aug 1 2018, 5:18 AM

db1052's content has been copied to dbstore1001:/srv/backups/tmp/db1052
For the record, these are the coordinates after the stop:

root@PRODUCTION s1 master[(none)]> show slave status\G show master status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1067.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1067-bin.001325
          Read_Master_Log_Pos: 712219114
               Relay_Log_File: db1052-relay-bin.000158
                Relay_Log_Pos: 712219402
        Relay_Master_Log_File: db1067-bin.001325
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 712219114
              Relay_Log_Space: 712219744
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 171974720
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 0-171970637-5484646134,171974720-171974720-88503795,171970637-171970637-2116621969,171978774-171978774-5,180359172-180359172-49702203
1 row in set (0.00 sec)

*************************** 1. row ***************************
            File: db1052-bin.005999
        Position: 323268603
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
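The coordinates above can also be sanity-checked programmatically. A minimal sketch, assuming the `\G` output has been saved as text: it parses the vertical result into a dict, applies one common check that the stop was clean (both replication threads stopped, and the SQL thread had executed up to the same master binlog file and position the IO thread had read), and splits the MariaDB GTID list into domain_id-server_id-seq_no triplets. The helper names are hypothetical, and merging both result sets into one dict only works here because their column names do not collide.

```python
import re

# Matches "   Key: value" lines of a MySQL/MariaDB \G (vertical) result.
# Prompt lines, row separators and "1 row in set" lines do not match.
FIELD_RE = re.compile(r"^\s*(\w+):\s*(.*)$")


def parse_vertical(output: str) -> dict[str, str]:
    """Turn \\G-style output into a {column: value} dict (last row wins)."""
    fields = {}
    for line in output.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields


def parse_gtid_list(gtid: str) -> list[tuple[int, int, int]]:
    """Split a MariaDB GTID list into (domain_id, server_id, seq_no) tuples."""
    result = []
    for item in gtid.split(","):
        domain, server, seq = (int(x) for x in item.strip().split("-"))
        result.append((domain, server, seq))
    return result


def stopped_cleanly(status: dict[str, str]) -> bool:
    """Both threads stopped, and the SQL thread had applied everything
    the IO thread had read from the master's binlog."""
    return (
        status["Slave_IO_Running"] == "No"
        and status["Slave_SQL_Running"] == "No"
        and status["Master_Log_File"] == status["Relay_Master_Log_File"]
        and status["Read_Master_Log_Pos"] == status["Exec_Master_Log_Pos"]
    )
```

Against the output above, `stopped_cleanly` holds because Read_Master_Log_Pos and Exec_Master_Log_Pos are both 712219114 in db1067-bin.001325, and `parse_gtid_list` yields five triplets, one per replication domain.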

Marostegui updated the task description. Aug 1 2018, 7:00 AM

Change 449665 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1052

https://gerrit.wikimedia.org/r/449665

Marostegui updated the task description. Aug 1 2018, 7:09 AM

Mentioned in SAL (#wikimedia-operations) [2018-08-01T07:09:48Z] <marostegui> Remove db1052 from tendril - T199861

Change 449665 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1052

https://gerrit.wikimedia.org/r/449665

Mentioned in SAL (#wikimedia-operations) [2018-08-01T07:11:32Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove db1052 from config as it will be decommissioned - T199861 (duration: 00m 56s)

Mentioned in SAL (#wikimedia-operations) [2018-08-01T07:12:35Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Remove db1052 from config as it will be decommissioned - T199861 (duration: 00m 55s)

Change 449666 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Set db1052 to spare

https://gerrit.wikimedia.org/r/449666

Change 449666 merged by Marostegui:
[operations/puppet@production] mariadb: Set db1052 to spare

https://gerrit.wikimedia.org/r/449666

Marostegui updated the task description. Aug 1 2018, 7:17 AM

db1052 is now ready for DCOps to finish its decommissioning - assigning it to @RobH
db1052 was a great s1 master but now it needs some rest!! :-)

Restricted Application added a project: SRE. Aug 1 2018, 7:23 AM

Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. Aug 1 2018, 2:32 PM

Change 452385 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Removing db1052 from site.pp final decommission

https://gerrit.wikimedia.org/r/452385

Change 452385 merged by Cmjohnson:
[operations/puppet@production] Removing db1052 from site.pp final decommission

https://gerrit.wikimedia.org/r/452385

Change 452394 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing dns entries decom host db1052

https://gerrit.wikimedia.org/r/452394

Change 452394 merged by Cmjohnson:
[operations/dns@master] Removing dns entries decom host db1052

https://gerrit.wikimedia.org/r/452394

Cmjohnson updated the task description. Aug 13 2018, 3:18 PM

Cmjohnson moved this task from Decommission to UnRacking Tasks on the ops-eqiad board.

Cmjohnson closed this task as Resolved. Aug 21 2018, 5:05 PM

Cmjohnson updated the task description.