
Decommission db1048 (was Move m3 slave to db1059)
Closed, ResolvedPublic

Description

To decommission db1048, we need to set up a newer host with that role. We can use db1059, which is "easy" to remove from s4, and pool it as the new m3 replica.

db1059 is now in production use; the only task left is to get rid of db1048.

Event Timeline

Change 377455 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1059, pool db1097 as api with low load

https://gerrit.wikimedia.org/r/377455

Change 377455 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1059, pool db1097 as api with low load

https://gerrit.wikimedia.org/r/377455

Change 377468 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Remove all references of db1059 from mediawiki

https://gerrit.wikimedia.org/r/377468

Change 377468 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Remove all references of db1059 from mediawiki

https://gerrit.wikimedia.org/r/377468

Change 377474 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move db1059 from mediawiki to misc (m3)

https://gerrit.wikimedia.org/r/377474

Change 377474 merged by Jcrespo:
[operations/puppet@production] mariadb: Move db1059 from mediawiki to misc (m3)

https://gerrit.wikimedia.org/r/377474

Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db1059.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201709121603_jynus_3476.log.

Completed auto-reimage of hosts:

['db1059.eqiad.wmnet']

and were ALL successful.

Change 377687 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Repoint m3 secondary host (replica) to db1059

https://gerrit.wikimedia.org/r/377687

Change 377687 merged by Jcrespo:
[operations/puppet@production] mariadb: Repoint m3 secondary host (replica) to db1059

https://gerrit.wikimedia.org/r/377687

Change 377693 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] phabricator/mariadb: Update database configuration for stretch/10.1

https://gerrit.wikimedia.org/r/377693

jcrespo added a subscriber: mmodell.

@mmodell We have to upgrade the hardware for the Phabricator databases. What do you think of also doing a master switchover this Thursday, together with an upgrade to stretch/MariaDB 10.1, enabling TLS, and setting up the firewall? It should only take a few seconds of restarting Phabricator to pick up the new connections; if something goes wrong, we revert to the current server.

Change 377701 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] misc dbs: Repoint m3-slave to the new replica server db1059

https://gerrit.wikimedia.org/r/377701

Change 377701 merged by Jcrespo:
[operations/dns@master] misc dbs: Repoint m3-slave to the new replica server db1059

https://gerrit.wikimedia.org/r/377701

jcrespo renamed this task from Move m3 slave to db1059 to Decommission db1048 (was Move m3 slave to db1059). Sep 13 2017, 4:26 AM
jcrespo removed a project: Patch-For-Review.
jcrespo updated the task description.
jcrespo added a subscriber: Cmjohnson.

Change 377705 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] decommission: Set db1048 with a spare role

https://gerrit.wikimedia.org/r/377705

Change 377705 merged by Jcrespo:
[operations/puppet@production] decommission: Set db1048 with a spare role

https://gerrit.wikimedia.org/r/377705

db1048 is now ready to be decommissioned. It is set as spare, but it still needs to be fully deleted from the configuration and infrastructure (installer, site.pp).
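One way to confirm that no references to a decommissioned host remain is to scan the relevant configuration text for its name. The following is a minimal sketch only: the helper `find_host_references` and the sample file contents are hypothetical, and it is not the tooling actually used for this task.

```python
import re


def find_host_references(hostname, files):
    """Return (file name, line number, line) for every line mentioning hostname.

    `files` maps a file name to its text content (hypothetical input shape).
    """
    # Match the host name only as a whole token, so "db1048" does not
    # match "db10480" but still matches "db1048.eqiad.wmnet".
    pattern = re.compile(
        r"(?<![A-Za-z0-9-])" + re.escape(hostname) + r"(?![A-Za-z0-9-])"
    )
    hits = []
    for name, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((name, number, line.strip()))
    return hits


if __name__ == "__main__":
    # Made-up file contents, for illustration only.
    sample = {
        "site.pp": "node 'db1048.eqiad.wmnet' {\n    role(spare)\n}\n",
        "installer": "db1059 partman-recipe\n",
    }
    for name, number, line in find_host_references("db1048", sample):
        print(f"{name}:{number}: {line}")
```

An empty result for every file would suggest the host name has been fully removed from the scanned configuration; it says nothing about other systems such as monitoring or inventory.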

jcrespo lowered the priority of this task from Medium to Low. Sep 13 2017, 4:36 AM
jcrespo edited projects, added ops-eqiad; removed Patch-For-Review.

@jcrespo Any time will work for me. There is scheduled maintenance at midnight tonight (UTC), but if it's just a few seconds of downtime I think we can do it whenever.

Let's wait a bit more. I may have to talk to you about setting up TLS for PHP and changing passwords. Let's talk and aim for next week (but we shouldn't delay it much).

@jcrespo: OK, whenever works for you, I'll try to be available.

@mmodell This is still needed, but this week and next are going to be problematic. As a heads-up, we may need to merge some puppet changes simultaneously on the Phabricator database and all its application servers. I will try to send you a calendar proposal at some point.

@jcrespo: Thanks, I'll keep an eye out for it.

@Cmjohnson Sorry for the confusion: it is indeed OK to take down db1048. All other conversations were about a failover to db1059 to substitute db1043 (this cannot be done yet). We will handle that on a separate ticket.

All non-interruptible steps have been completed. It still needs wiping and removal from the rack.

It is still showing in servermon; it also seems that a "puppet node deactivate" is missing.

Change 377693 merged by Jcrespo:
[operations/puppet@production] phabricator/mariadb: Update database configuration for stretch/10.1

https://gerrit.wikimedia.org/r/377693