A6 and D3 have 3 db masters each
Closed, Resolved, Public

Description

Right now, racks A6 and D3 each host three masters (https://fault-tolerance.toolforge.org/map?cluster=db-masters):

  • A6: s4, s8, and m3
  • D3: s6, s2, and s3

This is suboptimal: losing a single rack would take out three masters at once.

For codfw, we have a couple of racks with two masters:

  • B4: s2 and s3
  • D6: s4 and s7
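
The check behind the lists above can be sketched in a few lines. This is only an illustration, not how the fault-tolerance tool works: the rack-to-section mapping is copied from this description, while the function name `crowded_racks` and the `max_per_rack` threshold are hypothetical names introduced for the example.

```python
# Rack -> sections whose master currently lives in that rack.
# Values are copied from the task description; rack names happen to be
# unique here, so the datacenter is not part of the key.
MASTERS_BY_RACK = {
    "A6": ["s4", "s8", "m3"],
    "D3": ["s6", "s2", "s3"],
    "B4": ["s2", "s3"],
    "D6": ["s4", "s7"],
}


def crowded_racks(masters_by_rack, max_per_rack=1):
    """Return the racks hosting more than max_per_rack masters.

    The result is ordered with the most crowded racks first, ties broken
    by rack name. Both the name and the threshold are hypothetical.
    """
    crowded = {
        rack: sections
        for rack, sections in masters_by_rack.items()
        if len(sections) > max_per_rack
    }
    return dict(sorted(crowded.items(), key=lambda kv: (-len(kv[1]), kv[0])))


if __name__ == "__main__":
    for rack, sections in crowded_racks(MASTERS_BY_RACK, max_per_rack=2).items():
        print(f"{rack}: {', '.join(sections)}")
```

With `max_per_rack=2` only A6 and D3 are reported, which matches the task title; lowering the threshold to 1 also reports B4 and D6.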

Event Timeline

Marostegui renamed this task from ERR_TOO_MANY_MASTERS to A6 and D3 have 3 db masters each. Jul 30 2024, 11:52 AM
Marostegui triaged this task as Medium priority.

I am going to start with s6. I need to change the candidate master.

Marostegui moved this task from Triage to In progress on the DBA board.

Change #1058139 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1201: Make it s6 candidate

https://gerrit.wikimedia.org/r/1058139

Change #1058139 merged by Marostegui:

[operations/puppet@production] db1201: Make it s6 candidate

https://gerrit.wikimedia.org/r/1058139

Mentioned in SAL (#wikimedia-operations) [2024-07-30T12:08:06Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1201 T371361', diff saved to https://phabricator.wikimedia.org/P67064 and previous config saved to /var/cache/conftool/dbconfig/20240730-120805-root.json

Change #1058140 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1231: Remove it from candidate master

https://gerrit.wikimedia.org/r/1058140

Change #1058140 merged by Marostegui:

[operations/puppet@production] db1231: Remove it from candidate master

https://gerrit.wikimedia.org/r/1058140

Mentioned in SAL (#wikimedia-operations) [2024-07-30T12:15:01Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1231 T371361', diff saved to https://phabricator.wikimedia.org/P67066 and previous config saved to /var/cache/conftool/dbconfig/20240730-121500-root.json

Change #1058144 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1193: Make it candidate master for s8

https://gerrit.wikimedia.org/r/1058144

Mentioned in SAL (#wikimedia-operations) [2024-07-30T12:22:44Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1193 T371361', diff saved to https://phabricator.wikimedia.org/P67068 and previous config saved to /var/cache/conftool/dbconfig/20240730-122243-root.json

Change #1058144 merged by Marostegui:

[operations/puppet@production] db1193: Make it candidate master for s8

https://gerrit.wikimedia.org/r/1058144

Change #1058148 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1192: Remove from candidate master

https://gerrit.wikimedia.org/r/1058148

Change #1058148 merged by Marostegui:

[operations/puppet@production] db1192: Remove from candidate master

https://gerrit.wikimedia.org/r/1058148

I think A6 is fine now: having s4 and m3 there is fine. In any case, m3 will need to be switched for T356240.

D3 is now down to 2 masters only

Change #1058291 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2209: Make it s3 candidate master

https://gerrit.wikimedia.org/r/1058291

Change #1058291 merged by Marostegui:

[operations/puppet@production] db2209: Make it s3 candidate master

https://gerrit.wikimedia.org/r/1058291

Mentioned in SAL (#wikimedia-operations) [2024-07-31T05:46:53Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2209 T371361', diff saved to https://phabricator.wikimedia.org/P67108 and previous config saved to /var/cache/conftool/dbconfig/20240731-054653-root.json

Mentioned in SAL (#wikimedia-operations) [2024-07-31T05:50:04Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Make db2127 vslow and remove it as candidate master T371361', diff saved to https://phabricator.wikimedia.org/P67109 and previous config saved to /var/cache/conftool/dbconfig/20240731-055004-marostegui.json

Change #1058292 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2127: No longer s3 master

https://gerrit.wikimedia.org/r/1058292

Change #1058292 merged by Marostegui:

[operations/puppet@production] db2127: No longer s3 master

https://gerrit.wikimedia.org/r/1058292

Change #1058568 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2220: Make it candidate master for s7

https://gerrit.wikimedia.org/r/1058568

Change #1058568 merged by Marostegui:

[operations/puppet@production] db2220: Make it candidate master for s7

https://gerrit.wikimedia.org/r/1058568

Mentioned in SAL (#wikimedia-operations) [2024-07-31T09:14:51Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2220 T371361', diff saved to https://phabricator.wikimedia.org/P67139 and previous config saved to /var/cache/conftool/dbconfig/20240731-091450-root.json

Mentioned in SAL (#wikimedia-operations) [2024-07-31T09:17:06Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Move db2121 to vslow T371361', diff saved to https://phabricator.wikimedia.org/P67140 and previous config saved to /var/cache/conftool/dbconfig/20240731-091706-root.json

Change #1058570 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2121: Remove from candidate master

https://gerrit.wikimedia.org/r/1058570

Change #1058570 merged by Marostegui:

[operations/puppet@production] db2121: Remove from candidate master

https://gerrit.wikimedia.org/r/1058570

@Ladsgroup I think we are in a better state now. I will work on s2 or s3 when I get back from holidays; I don't want to do so many switchovers before going off for a week :)

Change #1062379 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Change s3 candidate master

https://gerrit.wikimedia.org/r/1062379

Change #1062379 merged by Marostegui:

[operations/puppet@production] mariadb: Change s3 candidate master

https://gerrit.wikimedia.org/r/1062379

Marostegui updated the task description.

s3 switched.