Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers)
Open, Medium, Public

Description

The following hosts are scheduled to be decommissioned in Q2 and need to be refreshed:
db1074-db1095 (22 servers)

We have started to see issues on them, especially BBU-related ones: T258360

Replacement plan:

  • db1124 B1 (old sanitarium host, currently in use) to replace db1077
  • db1125 D1 (old sanitarium host, currently in use) to be placed on s7
  • db1156 A1 to replace db1074 (sanitarium master)
  • db1157 A5 to replace db1075
  • db1158 A5 to replace db1079 (sanitarium master)
  • db1159 A6 to replace db1080 (m1 master)
  • db1160 A6 to replace db1081
  • db1161 A8 to replace db1082 (sanitarium master)
  • db1162 B1 to replace db1076 (candidate master)
  • db1163 B3 to replace db1083 (s1 master) CURRENTLY pooled on s1 as stretch to substitute db1134 T274472
  • db1164 B5 to replace db1084
  • db1165 B6 to replace db1085 (sanitarium master)
  • db1166 C3 to replace db1078
  • db1167 C3 to replace db1087 (sanitarium master)
  • db1168 C5 to replace db1088
  • db1169 C5 to replace db1089
  • db1170 C6 to replace db1090 (multi-instance)
  • db1171 C6 to replace db1095
  • db1172 D1 to replace db1092
  • db1173 D3 to replace db1093 (candidate master)
  • db1174 D6 to replace db1094
  • db1175 D3 to be placed on s3
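
The plan above is regular enough to be read by a script, which can help when checking which old host maps to which new one. The following is a minimal sketch, not tooling used in this task: the regular expression, the `parse_plan` helper, and the field names are all assumptions made for illustration, and only a few lines of the plan are copied in. Lines with a parenthetical before "to replace" (db1124, db1125) are deliberately left out because this pattern does not handle them.

```python
import re
from collections import Counter

# A few lines copied from the replacement plan (bullets removed).
PLAN = """\
db1156 A1 to replace db1074 (sanitarium master)
db1157 A5 to replace db1075
db1159 A6 to replace db1080 (m1 master)
db1163 B3 to replace db1083 (s1 master) CURRENTLY pooled on s1 as stretch to substitute db1134 T274472
db1170 C6 to replace db1090 (multi-instance)
db1175 D3 to be placed on s3
"""

# Hypothetical pattern: new host, rack, then either the host being
# replaced or the section the new host joins, then an optional role.
LINE_RE = re.compile(
    r"^(?P<new>db\d+)\s+(?P<rack>[A-D]\d)\s+to\s+"
    r"(?:replace\s+(?P<old>db\d+)|be placed on\s+(?P<section>\w+))"
    r"(?:\s+\((?P<role>[^)]*)\))?"
)


def parse_plan(text):
    """Return one dict per plan line that matches LINE_RE."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


entries = parse_plan(PLAN)

# Old host -> new host, skipping entries that only add capacity.
replacements = {e["old"]: e["new"] for e in entries if e["old"]}

# Number of new hosts per datacenter row (first character of the rack).
hosts_per_row = Counter(e["rack"][0] for e in entries)

for old, new in sorted(replacements.items()):
    print(f"{old} -> {new}")
```

A mapping like `replacements` makes it easy to confirm that every host in the db1074-db1095 range has exactly one successor before any decommission subtask is opened.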

Decommissioning progress

Details

Project | Branch | Lines +/- | Subject
operations/puppet | production | +1 -0 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +3 -2 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +3 -2 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +2 -2 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +5 -2 |
operations/puppet | production | +1 -1 |
operations/puppet | production | +5 -2 |
operations/puppet | production | +2 -0 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +8 -4 |
operations/puppet | production | +2 -2 |
operations/puppet | production | +5 -2 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +5 -2 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +3 -2 |
operations/puppet | production | +3 -2 |
operations/puppet | production | +9 -2 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +1 -1 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +4 -2 |
operations/puppet | production | +4 -2 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +3 -2 |
operations/puppet | production | +1 -1 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +5 -2 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +1 -0 |

Related Objects

Status | Subtype | Assigned | Task
Open | | Marostegui |
Declined | | None |
Resolved | | Marostegui |
Resolved | | Jclark-ctr |
Resolved | | Marostegui |
Resolved | | Marostegui |
Resolved | Request | wiki_willy |
Resolved | | Marostegui |
Resolved | | Trizek-WMF |
Open | | Kormat |
Open | | None |
Resolved | | Cmjohnson |
Resolved | | RobH |
Resolved | | Marostegui |
Stalled | | Marostegui |
Resolved | | Cmjohnson |
Resolved | | dcaro |
Open | Request | Cmjohnson |
Resolved | Request | Cmjohnson |
Open | Request | Cmjohnson |
Open | Request | Cmjohnson |
Open | Request | Cmjohnson |
Open | Request | Cmjohnson |
Open | Request | Cmjohnson |
Stalled | Request | Marostegui |
Open | | Cmjohnson |
Open | Request | Marostegui |

Event Timeline


Mentioned in SAL (#wikimedia-operations) [2021-02-10T06:35:35Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Give more weight to db1170:3312, db1170:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14281 and previous config saved to /var/cache/conftool/dbconfig/20210210-063534-marostegui.json

Change 663107 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Productionize db1162

https://gerrit.wikimedia.org/r/663107

Change 663107 merged by Marostegui:
[operations/puppet@production] mariadb: Productionize db1162

https://gerrit.wikimedia.org/r/663107

Mentioned in SAL (#wikimedia-operations) [2021-02-10T06:43:30Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Fully pool db1170:3312, db1170:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14282 and previous config saved to /var/cache/conftool/dbconfig/20210210-064330-marostegui.json
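
The SAL entries in this timeline all follow the same shape, so they can be pulled apart mechanically when auditing which instances were touched and where the previous config was saved. Below is a rough sketch under that assumption: `SAL_RE` and `parse_sal` are hypothetical names, the pattern is inferred only from the entries visible in this task, and it would not match a commit message containing an apostrophe.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the dbctl SAL entries in this task (an assumption,
# not an official log format definition).
SAL_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+<(?P<user>[^@>]+)@(?P<host>[^>]+)>\s+"
    r"dbctl commit \(dc=(?P<dc>[^)]+)\):\s+'(?P<message>[^']*)',\s+"
    r"diff saved to (?P<diff_url>\S+) and previous config saved to (?P<backup>\S+)"
)

# Host names, optionally followed by a port for multi-instance hosts.
INSTANCE_RE = re.compile(r"db\d+(?::\d+)?")


def parse_sal(line):
    """Split one SAL line into its fields, or return None if it does not match."""
    match = SAL_RE.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    entry["instances"] = INSTANCE_RE.findall(entry["message"])
    return entry


line = (
    "Mentioned in SAL (#wikimedia-operations) [2021-02-10T06:43:30Z] "
    "<marostegui@cumin1001> dbctl commit (dc=all): "
    "'Fully pool db1170:3312, db1170:3317 T258361', "
    "diff saved to https://phabricator.wikimedia.org/P14282 "
    "and previous config saved to "
    "/var/cache/conftool/dbconfig/20210210-064330-marostegui.json"
)

entry = parse_sal(line)
print(entry["user"], entry["instances"], entry["backup"])
```

The `backup` field is the path to roll back to if a pooling change needs to be reverted, which is why it is kept alongside the instance list.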

Fully pooled:
db1170:3312
db1170:3317

Marostegui updated the task description. Wed, Feb 10, 6:44 AM
Marostegui updated the task description.

db1162 is now replicating, but I won't pool it until I'm back next week.

Marostegui updated the task description. Wed, Feb 10, 10:16 AM

Change 663181 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Do not reimage db1157

https://gerrit.wikimedia.org/r/663181

Change 663181 merged by Marostegui:
[operations/puppet@production] install_server: Do not reimage db1157

https://gerrit.wikimedia.org/r/663181

jcrespo claimed this task. Thu, Feb 11, 10:25 AM

I am taking db1163 to, at least temporarily, substitute db1134 due to T274472.

Change 663549 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] db1163: Reimage to stretch to potentially become s1 candidate master

https://gerrit.wikimedia.org/r/663549

Change 663549 merged by Jcrespo:
[operations/puppet@production] db1163: Reimage to stretch to potentially become s1 candidate master

https://gerrit.wikimedia.org/r/663549

Change 663570 had a related patch set uploaded (by LSobanski; owner: LSobanski):
[operations/puppet@production] instances.yaml: Add db1163 to dbctl

https://gerrit.wikimedia.org/r/663570

Change 663570 merged by LSobanski:
[operations/puppet@production] instances.yaml: Add db1163 to dbctl

https://gerrit.wikimedia.org/r/663570

Mentioned in SAL (#wikimedia-operations) [2021-02-11T14:44:46Z] <kormat@cumin1001> dbctl commit (dc=all): 'Add db1163 to s1 T258361', diff saved to https://phabricator.wikimedia.org/P14318 and previous config saved to /var/cache/conftool/dbconfig/20210211-144445-kormat.json

Mentioned in SAL (#wikimedia-operations) [2021-02-11T15:45:01Z] <kormat@cumin1001> dbctl commit (dc=all): 'Pool db1163 at 1% T258361', diff saved to https://phabricator.wikimedia.org/P14320 and previous config saved to /var/cache/conftool/dbconfig/20210211-154501-kormat.json

Mentioned in SAL (#wikimedia-operations) [2021-02-11T16:13:08Z] <kormat@cumin1001> dbctl commit (dc=all): 'Pool db1163 at 1%, again T258361', diff saved to https://phabricator.wikimedia.org/P14323 and previous config saved to /var/cache/conftool/dbconfig/20210211-161308-kormat.json

jcrespo reassigned this task from jcrespo to Marostegui. Thu, Feb 11, 4:22 PM
jcrespo updated the task description.

Change 663608 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Reenable notifications for db1163 once it has been pooled

https://gerrit.wikimedia.org/r/663608

Change 663608 merged by Jcrespo:
[operations/puppet@production] mariadb: Reenable notifications for db1163 once it has been pooled

https://gerrit.wikimedia.org/r/663608

> I am taking db1163 to, at least temporarily, substitute db1134 due to T274472.

Thanks. I am going to leave db1163 in s1, as it was needed to replace db1083 (s1) anyway. I am going to compare data between db1163 and db1083 and, if it is fine, I will use db1163 as the future master and reimage db1134 as buster.
Once db1083 is no longer master (to be scheduled for this quarter), I will reimage db1118 to Stretch and make it candidate master (for real).

Change 664087 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1162: Enable notifications

https://gerrit.wikimedia.org/r/664087

Change 664087 merged by Marostegui:
[operations/puppet@production] db1162: Enable notifications

https://gerrit.wikimedia.org/r/664087

Change 664088 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] instances.yaml: Add db1162 to dbctl

https://gerrit.wikimedia.org/r/664088

Change 664088 merged by Marostegui:
[operations/puppet@production] instances.yaml: Add db1162 to dbctl

https://gerrit.wikimedia.org/r/664088

Mentioned in SAL (#wikimedia-operations) [2021-02-15T06:40:02Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1162 to dbctl - depooled T258361', diff saved to https://phabricator.wikimedia.org/P14339 and previous config saved to /var/cache/conftool/dbconfig/20210215-064001-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-02-15T06:46:28Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db1162 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14340 and previous config saved to /var/cache/conftool/dbconfig/20210215-064628-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-02-15T07:02:06Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db1162 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14341 and previous config saved to /var/cache/conftool/dbconfig/20210215-070206-marostegui.json

Marostegui updated the task description. Mon, Feb 15, 8:26 AM

db1162 is fully pooled

> > I am taking db1163 to, at least temporarily, substitute db1134 due to T274472.
>
> Thanks. I am going to leave db1163 in s1, as it was needed to replace db1083 (s1) anyway. I am going to compare data between db1163 and db1083 and, if it is fine, I will use db1163 as the future master and reimage db1134 as buster.
> Once db1083 is no longer master (to be scheduled for this quarter), I will reimage db1118 to Stretch and make it candidate master (for real).

db1163 was compared against current master (db1083) and it is ok. So db1163 will be the new s1 master, db1134 will be reimaged to buster and will be a slave, and db1118 will be reimaged to stretch and will be the candidate master.
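
The thread does not say how the comparison between db1163 and db1083 was run. As an illustration only, one generic way to compare two copies of a table is to hash rows in primary-key ranges and report the ranges whose digests differ. The sketch below works on in-memory tuples rather than real database connections; the function names, the bucket size, and the sample rows are all made up for the example.

```python
import hashlib


def bucket_digests(rows, bucket_size):
    """Hash rows grouped by primary-key range.

    rows: iterable of tuples whose first element is an integer primary key.
    Returns {bucket_index: hex digest of all rows in that key range}.
    """
    hashers = {}
    for row in sorted(rows):
        bucket = row[0] // bucket_size
        hashers.setdefault(bucket, hashlib.sha256()).update(repr(row).encode("utf-8"))
    return {bucket: h.hexdigest() for bucket, h in hashers.items()}


def differing_ranges(rows_a, rows_b, bucket_size=2):
    """Return inclusive (first_key, last_key) ranges whose contents differ."""
    a = bucket_digests(rows_a, bucket_size)
    b = bucket_digests(rows_b, bucket_size)
    buckets = sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
    return [(k * bucket_size, k * bucket_size + bucket_size - 1) for k in buckets]


# Made-up sample data standing in for the same table on two hosts.
master = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
replica_same = list(master)
replica_changed = [(1, "a"), (2, "b"), (3, "c"), (4, "X")]
replica_missing = [(1, "a"), (3, "c"), (4, "d")]

print(differing_ranges(master, replica_same))
print(differing_ranges(master, replica_changed))
print(differing_ranges(master, replica_missing))
```

Bucketing by key range rather than by row position means a single missing row only flags its own range instead of shifting every later chunk.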

Marostegui updated the task description. Tue, Feb 16, 6:45 AM

Mentioned in SAL (#wikimedia-operations) [2021-02-16T06:46:03Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1092 to clone db1172 T258361', diff saved to https://phabricator.wikimedia.org/P14365 and previous config saved to /var/cache/conftool/dbconfig/20210216-064602-marostegui.json

Change 664483 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Productionize db1172

https://gerrit.wikimedia.org/r/664483

Change 664483 merged by Marostegui:
[operations/puppet@production] mariadb: Productionize db1172

https://gerrit.wikimedia.org/r/664483

db1172 is now replicating on s8; I will start pooling it tomorrow.

Change 664710 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1172: Disable notifications

https://gerrit.wikimedia.org/r/664710

Change 664710 merged by Marostegui:
[operations/puppet@production] db1172: Disable notifications

https://gerrit.wikimedia.org/r/664710

Change 664723 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] instances.yaml: Add db1172 to dbctl

https://gerrit.wikimedia.org/r/664723

Change 664723 merged by Marostegui:
[operations/puppet@production] instances.yaml: Add db1172 to dbctl

https://gerrit.wikimedia.org/r/664723

Mentioned in SAL (#wikimedia-operations) [2021-02-17T06:39:16Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1172 to dbctl, but not pooled yet T258361', diff saved to https://phabricator.wikimedia.org/P14385 and previous config saved to /var/cache/conftool/dbconfig/20210217-063915-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-02-17T07:21:32Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db1172 in s8 for the first time - T258361', diff saved to https://phabricator.wikimedia.org/P14386 and previous config saved to /var/cache/conftool/dbconfig/20210217-072131-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-02-17T07:41:08Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly pool db1172 in s8 - T258361', diff saved to https://phabricator.wikimedia.org/P14387 and previous config saved to /var/cache/conftool/dbconfig/20210217-074107-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-02-17T08:41:21Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly pool db1172 in s8 - T258361', diff saved to https://phabricator.wikimedia.org/P14388 and previous config saved to /var/cache/conftool/dbconfig/20210217-084120-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-02-17T11:24:22Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly pool db1172 in s8 - T258361', diff saved to https://phabricator.wikimedia.org/P14389 and previous config saved to /var/cache/conftool/dbconfig/20210217-112422-marostegui.json

Marostegui updated the task description. Wed, Feb 17, 12:40 PM

db1172 is now being automatically pooled into s8
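
The SAL entries for db1172 show the host being pooled for the first time and then raised in several commits before the automatic pooling took over. The log does not record the percentages used, so the numbers below are purely illustrative. This is a small sketch of how a ramp of that kind could be computed; `ramp_schedule` and its defaults are assumptions, and nothing here calls dbctl.

```python
def ramp_schedule(steps=5, first_pct=1, final_pct=100):
    """Return `steps` pool percentages growing geometrically to final_pct.

    Geometric growth keeps the early steps small, so a cold host warms its
    caches on a trickle of traffic before taking a full share.
    """
    if steps < 2:
        raise ValueError("need at least two steps")
    ratio = final_pct / first_pct
    schedule = [round(first_pct * ratio ** (i / (steps - 1))) for i in range(steps)]
    schedule[-1] = final_pct  # guard against rounding on the last step
    return schedule


for pct in ramp_schedule():
    print(f"pool db1172 at {pct}% of its target weight")
```

Each step would correspond to one commit like the 'Slowly pool db1172 in s8' entries above, with a pause in between to watch for errors or replication lag.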

Marostegui updated the task description. Wed, Feb 17, 12:43 PM
Marostegui updated the task description. Thu, Feb 18, 6:30 AM
RhinosF1 updated the task description. Sun, Feb 21, 3:55 PM

> db1162 is fully pooled

This just went down.

Marostegui changed the status of subtask T274752: decommission db1076.eqiad.wmnet from Open to Stalled. Sun, Feb 21, 4:15 PM
Marostegui updated the task description. Wed, Feb 24, 6:31 AM

Change 666786 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Productionize db1168

https://gerrit.wikimedia.org/r/666786

Mentioned in SAL (#wikimedia-operations) [2021-02-25T06:50:19Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1088 to clone db1168 T258361', diff saved to https://phabricator.wikimedia.org/P14474 and previous config saved to /var/cache/conftool/dbconfig/20210225-065018-marostegui.json

The transfer from db1088 to db1168 is ongoing. I will also install 10.4.18 on db1168.

Change 666786 merged by Marostegui:
[operations/puppet@production] mariadb: Productionize db1168

https://gerrit.wikimedia.org/r/666786

db1168 is now replicating.

Marostegui updated the task description. Mon, Mar 1, 6:35 AM

Change 667428 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1168: Enable notifications

https://gerrit.wikimedia.org/r/667428

Change 667428 merged by Marostegui:
[operations/puppet@production] db1168: Enable notifications

https://gerrit.wikimedia.org/r/667428

Change 667429 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] instances.yaml: Add db1168 to dbctl

https://gerrit.wikimedia.org/r/667429

Change 667429 merged by Marostegui:
[operations/puppet@production] instances.yaml: Add db1168 to dbctl

https://gerrit.wikimedia.org/r/667429

Mentioned in SAL (#wikimedia-operations) [2021-03-01T06:46:04Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1168 to dbctl T258361!', diff saved to https://phabricator.wikimedia.org/P14519 and previous config saved to /var/cache/conftool/dbconfig/20210301-064603-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-03-01T06:47:05Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db1168 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14520 and previous config saved to /var/cache/conftool/dbconfig/20210301-064704-marostegui.json

db1168 is now slowly being pooled into s6 running 10.4.18

Marostegui updated the task description. Mon, Mar 1, 7:48 AM