
Productionize 8 eqiad hosts
Closed, Resolved (Public)

Description

Productionize 8 eqiad hosts:

  • db1116: converted to a temporary sanitarium multi-instance host until the new hardware arrives
  • db1117: misc multi-instance
  • db1118: temporarily used as s1-test to evaluate MySQL 8.0
  • db1119: placed in s1 to replace db1066, which will be moved to become the s2 master, replacing db1054
  • db1120: converted to a temporary sanitarium multi-instance host until the new hardware arrives
  • db1121: added into s4 to replace db1064, which will then be moved to x1
  • db1122: added into s2
  • db1123: added into s3 to replace db1072, which will be moved to m3

Before closing:

  • Disable BBU auto-learn on all new hosts

Details

Repo                         Branch      Lines +/-
operations/puppet            production  +0 -1
operations/puppet            production  +2 -2
operations/puppet            production  +0 -1
operations/software          master      +6 -4
operations/puppet            production  +9 -8
operations/puppet            production  +4 -4
operations/dns               master      +1 -1
operations/puppet            production  +14 -0
operations/puppet            production  +0 -1
operations/puppet            production  +1 -1
operations/puppet            production  +385 -6
operations/puppet            production  +2 -1
operations/puppet            production  +17 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +5 -2
operations/mediawiki-config  master      +8 -12
operations/software          master      +2 -1
operations/puppet            production  +0 -1
operations/mediawiki-config  master      +4 -4
operations/puppet            production  +5 -1
operations/puppet            production  +1 -0
operations/puppet            production  +0 -2
operations/mediawiki-config  master      +3 -1
operations/mediawiki-config  master      +2 -0
operations/mediawiki-config  master      +2 -2
operations/software          master      +1 -0
operations/puppet            production  +4 -1
operations/software          master      +2 -1
operations/mediawiki-config  master      +14 -13
operations/mediawiki-config  master      +2 -0
operations/puppet            production  +7 -5
operations/mediawiki-config  master      +6 -6
operations/puppet            production  +0 -1
operations/mediawiki-config  master      +6 -10
operations/mediawiki-config  master      +6 -6
operations/puppet            production  +0 -1
operations/mediawiki-config  master      +4 -4
operations/mediawiki-config  master      +13 -6
operations/puppet            production  +2 -0
operations/mediawiki-config  master      +5 -5
operations/puppet            production  +1 -0
operations/mediawiki-config  master      +2 -2
operations/puppet            production  +9 -4
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +2 -1
operations/mediawiki-config  master      +2 -0
operations/puppet            production  +1 -1
operations/puppet            production  +3 -1
operations/puppet            production  +2 -2
operations/puppet            production  +31 -0

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 430848 merged by jenkins-bot:
[operations/software@master] s1.hosts: Add db1119 to s1

https://gerrit.wikimedia.org/r/430848

Change 430850 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1066

https://gerrit.wikimedia.org/r/430850

Change 430850 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1066

https://gerrit.wikimedia.org/r/430850

Mentioned in SAL (#wikimedia-operations) [2018-05-04T06:37:50Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1066 - T192979 (duration: 01m 11s)

Mentioned in SAL (#wikimedia-operations) [2018-05-04T06:42:16Z] <marostegui> Stop MySQL on db1066 to clone db1119 - T192979

Change 430852 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db1119 to the config

https://gerrit.wikimedia.org/r/430852

Change 430852 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db1119 to the config

https://gerrit.wikimedia.org/r/430852

Change 430856 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Slowly pool db1119 in s1

https://gerrit.wikimedia.org/r/430856

Change 430856 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Slowly pool db1119 in s1

https://gerrit.wikimedia.org/r/430856

Change 430919 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Reenable notifications on several core hosts

https://gerrit.wikimedia.org/r/430919

Change 430919 merged by Jcrespo:
[operations/puppet@production] mariadb: Reenable notifications on several core hosts

https://gerrit.wikimedia.org/r/430919

This is a proposal for the remaining decommission/setup:

m1: master: db1063 (C5) replica: db1117:3331 (A8)
m2: master: db1065 (D1) replica: db1117:3332 (A8)
m3: master: db1072 (B2) replica: db1117:3333 (A8)
m5: master: db1073 (B3) replica: db1117:3335 (A8)
s3: vslow: db1123 (D8)
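
As a rough illustration of the layout proposed above, the sketch below parses the four misc lines into a dictionary and checks that every db1117 instance gets its own port. The regular expression, the function name, and the reading of the parenthesised values (C5, A8, ...) as locations are assumptions made for this example, not part of any existing tooling.

```python
import re

# The misc section lines exactly as written in the proposal.
PROPOSAL = """\
m1: master: db1063 (C5) replica: db1117:3331 (A8)
m2: master: db1065 (D1) replica: db1117:3332 (A8)
m3: master: db1072 (B2) replica: db1117:3333 (A8)
m5: master: db1073 (B3) replica: db1117:3335 (A8)
"""

# Hypothetical pattern; the parenthesised values are assumed to be locations.
LINE_RE = re.compile(
    r"^(?P<section>\w+): master: (?P<master>\w+) \((?P<master_loc>\w+)\) "
    r"replica: (?P<replica>\w+):(?P<port>\d+) \((?P<replica_loc>\w+)\)$"
)


def parse_proposal(text):
    """Map each section name to its master, replica host and replica port."""
    layout = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            layout[match["section"]] = {
                "master": match["master"],
                "replica": match["replica"],
                "port": int(match["port"]),
            }
    return layout


layout = parse_proposal(PROPOSAL)
ports = [entry["port"] for entry in layout.values()]
# On a multi-instance host, no two instances may share a port.
assert len(ports) == len(set(ports))
```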

Change 432054 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move db1077 to ROW binlog_format

https://gerrit.wikimedia.org/r/432054

Change 432054 merged by Jcrespo:
[operations/puppet@production] mariadb: Move db1077 to ROW binlog_format

https://gerrit.wikimedia.org/r/432054

Change 432056 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1072 for maintenance

https://gerrit.wikimedia.org/r/432056

Change 432056 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1072 for maintenance

https://gerrit.wikimedia.org/r/432056

This looks good to me! Thanks for taking the time and putting up this proposal!

What is the state of db1116? I would like to move its master so I can clone db1072 away.

db1116 is not in use. It is just replicating from s3 (db1072)
db2092 is not in use. It is just replicating from s3 (db1072)

Blocked on depooling db1072 as snapshots are running on it.

jcrespo updated the task description.

Change 432068 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Setup db1123 into s3

https://gerrit.wikimedia.org/r/432068

Change 432068 merged by Jcrespo:
[operations/puppet@production] mariadb: Setup db1123 into s3

https://gerrit.wikimedia.org/r/432068

Change 432103 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] mariadb: Move db1072 to m3, db1123 to s4

https://gerrit.wikimedia.org/r/432103

Migrated db1072's replicas to db1077, with the following coordinates:

db1072: db1072-bin.000860:589708946 / db1075-bin.002777:647785363
db1077: db1077-bin.002831:968832211
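
For reference, coordinates in the `file:position` form above are what a replica needs in order to be repointed. The sketch below only builds the corresponding MariaDB `CHANGE MASTER TO` statement; it is not necessarily how the migration was done here. The helper names and the `db1077.eqiad.wmnet` FQDN are assumptions for illustration, and a real move would also stop and restart replication around the statement.

```python
def parse_coordinates(coord):
    """Split 'binlog_file:position' into (file, position)."""
    log_file, _, pos = coord.rpartition(":")
    if not log_file or not pos.isdigit():
        raise ValueError(f"not a binlog coordinate: {coord!r}")
    return log_file, int(pos)


def change_master_sql(master_host, coord):
    """Build a CHANGE MASTER TO statement for the given coordinates."""
    log_file, pos = parse_coordinates(coord)
    return (
        f"CHANGE MASTER TO MASTER_HOST='{master_host}', "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={pos};"
    )


# Hostname is assumed; the coordinates are the db1077 ones quoted above.
print(change_master_sql("db1077.eqiad.wmnet", "db1077-bin.002831:968832211"))
```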

Change 432357 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Add db1123 to mediawiki configuration, depooled

https://gerrit.wikimedia.org/r/432357

Change 432359 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Reenable notifications for db1123

https://gerrit.wikimedia.org/r/432359

Change 432357 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Add db1123 to mediawiki configuration, depooled

https://gerrit.wikimedia.org/r/432357

Change 432372 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Pool db1123 with low weight

https://gerrit.wikimedia.org/r/432372

Change 432359 merged by Jcrespo:
[operations/puppet@production] mariadb: Reenable notifications for db1123

https://gerrit.wikimedia.org/r/432359

Change 432372 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Pool db1123 with low weight

https://gerrit.wikimedia.org/r/432372

Change 432103 merged by jenkins-bot:
[operations/software@master] mariadb: Move db1072 to m3, db1123 to s4

https://gerrit.wikimedia.org/r/432103

Change 432409 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Fully pool db1123, remove db1072

https://gerrit.wikimedia.org/r/432409

Change 432409 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Fully pool db1123, remove db1072

https://gerrit.wikimedia.org/r/432409
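
The sequence followed for db1123 (added depooled, pooled with low weight, then fully pooled) can be sketched as a simple weight ramp. The function name, the fractions, and the full weight of 300 are all hypothetical; the actual weights used in the changes above are not shown in this task.

```python
def pooling_stages(full_weight, fractions=(0.0, 0.1, 0.5, 1.0)):
    """Return the weight to configure at each stage of a gradual pooling.

    Stage 0 stands for "present in the config but taking no traffic";
    the last stage is the host's full weight.
    """
    if full_weight < 0:
        raise ValueError("full_weight must be non-negative")
    return [round(full_weight * fraction) for fraction in fractions]


# Hypothetical full weight, for illustration only.
for stage, weight in enumerate(pooling_stages(300)):
    print(f"stage {stage}: weight {weight}")
```

Each stage would be a separate config change, with time in between to let the new host warm up its buffer pool before taking more traffic.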

Can you confirm whether db1120 will be free, or do you have plans for it? If it is free, I will use it to build a logical copy of db1102 so we can have both temporary sanitarium hosts in eqiad built, and once the new HW arrives we can clone them right away and without downtime.

You can use it with no problem, at least for the immediate future.

Yeah, it will be just temporary till the new hosts arrive.
Once they arrive we can stop db1116 and db1120 and copy the content to them, then fail over from db1095 and db1102 to the two new hosts, which frees up those 4 hosts.

Change 432959 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Convert db1120 to a temporary sanitarium

https://gerrit.wikimedia.org/r/432959

Change 432959 merged by Marostegui:
[operations/puppet@production] mariadb: Convert db1120 to a temporary sanitarium

https://gerrit.wikimedia.org/r/432959

Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

db1120.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201805140821_marostegui_18848_db1120_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['db1120.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2018-05-17T05:52:54Z] <marostegui> Disable BBU auto-learn on new hosts - T192979

This has been disabled everywhere:

db1116
  Auto-Learn Mode: Disabled

db1117
  Auto-Learn Mode: Disabled

db1118
  Auto-Learn Mode: Disabled

db1119
  Auto-Learn Mode: Disabled

db1120
  Auto-Learn Mode: Disabled

db1121
  Auto-Learn Mode: Disabled

db1122
  Auto-Learn Mode: Disabled

db1123
  Auto-Learn Mode: Disabled
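
A check like the one above can be automated by parsing the per-host output. The sketch below assumes the report has exactly the shape pasted in this comment (a hostname line followed by an indented `Auto-Learn Mode:` line); the function names are made up for this example, and how the output is collected from each host is left out.

```python
REPORT = """\
db1116
  Auto-Learn Mode: Disabled

db1117
  Auto-Learn Mode: Disabled

db1118
  Auto-Learn Mode: Disabled

db1119
  Auto-Learn Mode: Disabled

db1120
  Auto-Learn Mode: Disabled

db1121
  Auto-Learn Mode: Disabled

db1122
  Auto-Learn Mode: Disabled

db1123
  Auto-Learn Mode: Disabled
"""


def auto_learn_status(report):
    """Map each hostname to the Auto-Learn Mode value reported under it."""
    status = {}
    host = None
    for line in report.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("Auto-Learn Mode:"):
            if host is not None:
                status[host] = stripped.split(":", 1)[1].strip()
        else:
            # Any other non-empty line is treated as a hostname.
            host = stripped
    return status


def hosts_not_disabled(status, expected_hosts):
    """List expected hosts that are missing or not reported as Disabled."""
    return sorted(h for h in expected_hosts if status.get(h) != "Disabled")


EXPECTED = [f"db{n}" for n in range(1116, 1124)]
status = auto_learn_status(REPORT)
print(hosts_not_disabled(status, EXPECTED))  # prints []
```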

Change 434638 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Allow to setup a custom template for individual instances

https://gerrit.wikimedia.org/r/434638

Change 434639 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Setup db1117 as misc multiinstance

https://gerrit.wikimedia.org/r/434639

Change 434638 merged by Jcrespo:
[operations/puppet@production] mariadb: Allow to setup a custom template for individual instances

https://gerrit.wikimedia.org/r/434638

Change 434639 merged by Jcrespo:
[operations/puppet@production] mariadb: Setup db1117 as misc multiinstance

https://gerrit.wikimedia.org/r/434639

Change 434651 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Fix reference to misc_multiinstance template

https://gerrit.wikimedia.org/r/434651

Change 434651 merged by Jcrespo:
[operations/puppet@production] mariadb: Fix reference to misc_multiinstance template

https://gerrit.wikimedia.org/r/434651

Change 434655 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Fix duplicate definition of monitoring at misc_multiinstance role

https://gerrit.wikimedia.org/r/434655

Change 434655 merged by Jcrespo:
[operations/puppet@production] mariadb: Fix duplicate definition of monitoring at misc_multiinstance role

https://gerrit.wikimedia.org/r/434655

Change 434672 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] misc-mariadb-monitoring: Add db1117 replica instances to monitoring

https://gerrit.wikimedia.org/r/434672

Change 434675 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] mariadb-hosts: Add db1117 instances to m1,m2,m3 and m5

https://gerrit.wikimedia.org/r/434675

Change 434672 merged by Jcrespo:
[operations/puppet@production] misc-mariadb-monitoring: Add db1117 replica instances to monitoring

https://gerrit.wikimedia.org/r/434672

Change 434680 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switch m1-replica to db1117:3321

https://gerrit.wikimedia.org/r/434680

Change 434696 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] mariadb: remove db1065 as an m1 replica, set db1117 instead

https://gerrit.wikimedia.org/r/434696

Change 434698 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move db1065 from m1 to m2

https://gerrit.wikimedia.org/r/434698

Change 434696 merged by Jcrespo:
[operations/dns@master] mariadb: remove db1065 as an m1 replica, set db1117 instead

https://gerrit.wikimedia.org/r/434696

Change 434680 merged by Jcrespo:
[operations/puppet@production] mariadb: Switch m1-replica to db1117:3321

https://gerrit.wikimedia.org/r/434680

Change 434698 merged by Jcrespo:
[operations/puppet@production] mariadb: Move db1065 from m1 to m2

https://gerrit.wikimedia.org/r/434698

Change 434740 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Set up db1117:3325 as the backup host for m5 database section

https://gerrit.wikimedia.org/r/434740

db1117:m2 and db1065 are now loading m2 from a logical dump. The only things missing aside from that are the copy of m3 and m5 to db2078, and the required service switchovers.

Marostegui assigned this task to jcrespo.
Marostegui updated the task description.

Let's close this task. There are 4 hosts that will be freed up once we have moved to the definitive HW for the new sanitarium hosts (T194780).
Let's track those 4 in a different task once they are completely free.

Change 434675 merged by Jcrespo:
[operations/software@master] mariadb-hosts: Add db1117 instances to m1,m2,m3 and m5

https://gerrit.wikimedia.org/r/434675

Change 434885 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Reenable notifications on db2078

https://gerrit.wikimedia.org/r/434885

Change 434885 merged by Jcrespo:
[operations/puppet@production] mariadb: Reenable notifications on db2078

https://gerrit.wikimedia.org/r/434885

Change 434740 merged by Jcrespo:
[operations/puppet@production] mariadb: Set up db1117:3325 as the backup host for m5 database section

https://gerrit.wikimedia.org/r/434740

Change 434929 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Reenable notifications on db1117

https://gerrit.wikimedia.org/r/434929

Change 434929 merged by Jcrespo:
[operations/puppet@production] mariadb: Reenable notifications on db1117

https://gerrit.wikimedia.org/r/434929

Vvjjkkii renamed this task from Productionize 8 eqiad hosts to saeaaaaaaa. Jul 1 2018, 1:13 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed jcrespo as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
Marostegui renamed this task from saeaaaaaaa to Productionize 8 eqiad hosts. Jul 1 2018, 5:14 AM
Marostegui closed this task as Resolved.
Marostegui assigned this task to jcrespo.
Marostegui lowered the priority of this task from High to Medium.
Marostegui updated the task description.
Marostegui added subscribers: gerritbot, Aklapper.