Reorganize our redis rdb1/rdb2 clusters
Closed, Resolved · Public

Description

These clusters are *mostly* unused, and need to be reorganized.

EQIAD (rdb1*):

Our end goal is to have two master/slave pairs, rdb1009/1010 and rdb1005/1006, and to decommission the remaining servers.

  • Check which clients exist and where they connect. We are aware of change-propagation connecting to rdb1001/1003 via nutcracker
  • Create a new Puppet role, redis::misc::master, which should use profile::redis::master
  • Create a corresponding role, redis::misc::slave
  • Install rdb1009 as master and rdb1010 as slave; define their Puppet cluster variable as redis_main (do the same for all servers below)
  • Reimage rdb1005 with stretch as master, rdb1006 as slave
  • Migrate existing clients to rdb1005/1009
  • Decommission rdb1001-1004 and rdb1007-1008
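
For the first step (finding out which clients connect where), one option is to run the Redis CLIENT LIST command on each instance and group the connections by source address. The following is a minimal sketch of that grouping, not the procedure actually used on this task; the sample addresses are made up, and clients that go through nutcracker will appear under the address of the host running the proxy.

```python
from collections import Counter


def clients_by_host(client_list_output):
    """Count connections per client host from Redis CLIENT LIST text."""
    counts = Counter()
    for line in client_list_output.splitlines():
        # Each line is a series of space-separated key=value fields.
        fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
        addr = fields.get("addr")
        if addr:
            # Drop the trailing :port; rpartition splits on the last colon.
            counts[addr.rpartition(":")[0]] += 1
    return counts


# Hypothetical sample output, for illustration only.
sample = (
    "id=3 addr=10.0.0.5:52555 fd=8 name= age=855 idle=0 flags=N db=0\n"
    "id=4 addr=10.0.0.5:52556 fd=9 name= age=10 idle=1 flags=N db=0\n"
    "id=5 addr=10.0.0.7:40000 fd=10 name= age=3 idle=0 flags=N db=0\n"
)
print(clients_by_host(sample).most_common())
# prints [('10.0.0.5', 2), ('10.0.0.7', 1)]
```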

CODFW (rdb2*):

Our goal here is to reconfigure rdb2003-2006 as two master/slave pairs and return rdb2001/2002 to the spares list.

  • Check which clients exist and where they connect. We are aware of change-propagation connecting to rdb2001/2003 via nutcracker
  • Reimage rdb2005/2006 with the same configuration as rdb1005/1006
  • Migrate all clients to use those
  • Reimage rdb2003/2004 in a similar fashion
  • Redistribute clients across the two groups
  • Reimage rdb2001/2002 as spares (they were not reimaged, since this will be done by DC ops)
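
After each master/slave pair is reimaged, it can help to confirm that replication is healthy before moving clients over. The sketch below parses the text returned by the Redis INFO replication command on both hosts and checks the role and link-status fields. It is an illustration only: the helper names and the sample address are hypothetical, and master_host will hold whatever name or IP the slave was configured with.

```python
def parse_info(info_output):
    """Turn Redis INFO text ("key:value" lines) into a dict."""
    fields = {}
    for line in info_output.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replication_ok(master_info, slave_info, master_host):
    """True if the pair reports the expected roles and a live link."""
    master = parse_info(master_info)
    slave = parse_info(slave_info)
    return (
        master.get("role") == "master"
        and slave.get("role") == "slave"
        and slave.get("master_host") == master_host
        and slave.get("master_link_status") == "up"
    )


# Hypothetical INFO replication output, for illustration only.
master_sample = "# Replication\nrole:master\nconnected_slaves:1\n"
slave_sample = (
    "# Replication\nrole:slave\nmaster_host:10.0.0.9\n"
    "master_port:6379\nmaster_link_status:up\n"
)
print(replication_ok(master_sample, slave_sample, "10.0.0.9"))
```
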
Joe created this task. Oct 8 2018, 9:30 AM
Restricted Application added a subscriber: Aklapper. Oct 8 2018, 9:30 AM
jijiki moved this task from Backlog to In Progress on the User-jijiki board. Oct 12 2018, 9:35 AM

Change 467734 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP: Added new role::redis::misc for general purposes redis servers

https://gerrit.wikimedia.org/r/467734

Change 468310 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[labs/private@master] Added dummy pass for role redis::misc::master

https://gerrit.wikimedia.org/r/468310

Change 468310 merged by Effie Mouzeli:
[labs/private@master] Added dummy pass for role redis::misc::master

https://gerrit.wikimedia.org/r/468310

Change 467734 merged by Effie Mouzeli:
[operations/puppet@production] Added new role::redis::misc for general purposes redis servers

https://gerrit.wikimedia.org/r/467734

Change 468586 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] redis::misc Fixed typos

https://gerrit.wikimedia.org/r/468586

Change 468586 merged by Effie Mouzeli:
[operations/puppet@production] redis::misc Fixed typos

https://gerrit.wikimedia.org/r/468586

jijiki updated the task description. Oct 19 2018, 3:02 PM
jijiki triaged this task as Normal priority. Oct 22 2018, 3:55 PM

Change 470615 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] deployment-prep: added hieradata for deployment-rd3 host

https://gerrit.wikimedia.org/r/470615

Change 470615 merged by Effie Mouzeli:
[operations/puppet@production] deployment-prep: added hieradata for deployment-rd3 host

https://gerrit.wikimedia.org/r/470615

Change 470623 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] deployment-prep: fixed suffix for deployment-rd3-cptest-master01

https://gerrit.wikimedia.org/r/470623

Change 470623 merged by Effie Mouzeli:
[operations/puppet@production] deployment-prep: fixed suffix for deployment-rd3-cptest-master01

https://gerrit.wikimedia.org/r/470623

Change 471745 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] prometheus: added redis_misc metrics

https://gerrit.wikimedia.org/r/471745

Change 471745 merged by Effie Mouzeli:
[operations/puppet@production] prometheus: added redis_misc metrics

https://gerrit.wikimedia.org/r/471745

Change 471757 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] prometheus: fix for redis_misc metrics

https://gerrit.wikimedia.org/r/471757

Change 471757 merged by Effie Mouzeli:
[operations/puppet@production] prometheus: fix for redis_misc metrics

https://gerrit.wikimedia.org/r/471757

Change 471929 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] prometheus: fix redis misc role

https://gerrit.wikimedia.org/r/471929

Change 471929 merged by Effie Mouzeli:
[operations/puppet@production] prometheus: fix redis misc role

https://gerrit.wikimedia.org/r/471929

Change 471959 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] role::eqiad::scb: switch rdb1001:6382 with rdb1009:6379

https://gerrit.wikimedia.org/r/471959

Change 471959 merged by Effie Mouzeli:
[operations/puppet@production] role::eqiad::scb: switch rdb1001:6382 with rdb1009:6379

https://gerrit.wikimedia.org/r/471959

Change 472240 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] install_server: Reimage rdb1005 to stretch

https://gerrit.wikimedia.org/r/472240

Change 472240 merged by Effie Mouzeli:
[operations/puppet@production] install_server: Reimage rdb1005 to stretch

https://gerrit.wikimedia.org/r/472240

Change 472251 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Change rdb1005 to spare:system

https://gerrit.wikimedia.org/r/472251

Change 472251 merged by Effie Mouzeli:
[operations/puppet@production] Change rdb1005 to spare:system

https://gerrit.wikimedia.org/r/472251

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

rdb1005.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201811072242_jiji_158394_rdb1005_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['rdb1005.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2018-11-07T23:21:21Z] <jiji> Disabled nagios checks on rdb1006 and rdb2005 due to rdb1005 reimaging - T206450

Mentioned in SAL (#wikimedia-operations) [2018-11-08T10:52:36Z] <jiji> Reimaging rdb1006 to stretch - T206450

Change 472412 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Change rdb1005 and rdb1006 to redis::misc master/slave

https://gerrit.wikimedia.org/r/472412

Change 472412 merged by Effie Mouzeli:
[operations/puppet@production] Change rdb1005 and rdb1006 to redis::misc master/slave

https://gerrit.wikimedia.org/r/472412

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

rdb1006.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201811081152_jiji_140192_rdb1006_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['rdb1006.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2018-11-08T12:41:23Z] <jiji> Shutdown and reimage rdb200[56] - T206450

Mentioned in SAL (#wikimedia-operations) [2018-11-08T13:38:40Z] <jiji> Done reimaging rdb1006 - T206450

Change 472449 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Reimage rdb2005/rdb2006

https://gerrit.wikimedia.org/r/472449

Change 472454 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] role::eqiad::scb: switch rdb1003:6382 with rdb1005:6379

https://gerrit.wikimedia.org/r/472454

Change 472449 merged by Effie Mouzeli:
[operations/puppet@production] Reimage rdb2005/rdb2006

https://gerrit.wikimedia.org/r/472449

jijiki updated the task description. Nov 8 2018, 6:13 PM

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['rdb2005.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201811081821_jiji_1886.log.

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['rdb2006.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201811081823_jiji_2571.log.

Completed auto-reimage of hosts:

['rdb2005.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['rdb2006.codfw.wmnet']

and were ALL successful.

Change 472454 merged by Effie Mouzeli:
[operations/puppet@production] role::eqiad::scb: switch rdb1003:6382 with rdb1005:6379

https://gerrit.wikimedia.org/r/472454

Change 472669 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] role::codfw::scb: switch rdb2003:6382 with rdb2005:6379

https://gerrit.wikimedia.org/r/472669

Change 472669 merged by Effie Mouzeli:
[operations/puppet@production] role::codfw::scb: switch rdb2003:6382 with rdb2005:6379

https://gerrit.wikimedia.org/r/472669

jijiki updated the task description. Nov 9 2018, 7:07 PM
jijiki updated the task description.

Change 472714 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Reimage rdb2003/rdb2004

https://gerrit.wikimedia.org/r/472714

Change 472714 merged by Effie Mouzeli:
[operations/puppet@production] Reimage rdb2003/rdb2004, switch rdb100[123478] to spare::system

https://gerrit.wikimedia.org/r/472714

Change 472729 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Reimage rdb2003/rdb2004 to stretch

https://gerrit.wikimedia.org/r/472729

Change 472729 merged by Effie Mouzeli:
[operations/puppet@production] Reimage rdb2003/rdb2004 to stretch

https://gerrit.wikimedia.org/r/472729

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['rdb2003.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201811092114_jiji_244778.log.

Mentioned in SAL (#wikimedia-operations) [2018-11-09T21:16:24Z] <jiji> Reimaging rdb2003, rdb2004 - T206450

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['rdb2004.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201811092116_jiji_245283.log.

Completed auto-reimage of hosts:

['rdb2004.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['rdb2003.codfw.wmnet']

and were ALL successful.

Change 472964 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] role::codfw::scb: switch rdb2001:6382 with rdb2003:6379

https://gerrit.wikimedia.org/r/472964

jijiki updated the task description. Nov 12 2018, 11:39 AM

Change 472964 merged by Effie Mouzeli:
[operations/puppet@production] role::codfw::scb: switch rdb2001:6382 with rdb2003:6379

https://gerrit.wikimedia.org/r/472964

Mentioned in SAL (#wikimedia-operations) [2018-11-12T12:15:16Z] <jiji> Restarting nutcracker on scb200[1-6] - T206450
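
The four endpoint switches merged above can be summarised as a simple lookup, which also gives a way to check that no client configuration still points at a retired instance. This is a sketch only: the old and new endpoints come from the change titles on this task, while the function names and the flat list of "host:port" strings are assumptions and do not reflect how the Puppet roles actually store the nutcracker server list.

```python
# Old endpoint -> new endpoint, taken from the change titles on this task.
SWAPS = {
    "rdb1001:6382": "rdb1009:6379",
    "rdb1003:6382": "rdb1005:6379",
    "rdb2003:6382": "rdb2005:6379",
    "rdb2001:6382": "rdb2003:6379",
}


def rewrite_endpoints(endpoints, swaps=SWAPS):
    """Replace each retired endpoint with its successor in one pass."""
    return [swaps.get(endpoint, endpoint) for endpoint in endpoints]


def stale_endpoints(endpoints, swaps=SWAPS):
    """Return the endpoints that still point at a retired instance."""
    return [endpoint for endpoint in endpoints if endpoint in swaps]


# Hypothetical client configuration, for illustration only.
before = ["rdb2001:6382", "rdb2003:6382"]
after = rewrite_endpoints(before)
print(after)                   # prints ['rdb2003:6379', 'rdb2005:6379']
print(stale_endpoints(after))  # prints []
```

Because the lookup matches on the full host:port string and runs in a single pass, rdb2003:6379 (a new target) is not mistaken for rdb2003:6382 (a retired one).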

Change 472970 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] install_server: reimage rdb2001, rdb2002 to stretch

https://gerrit.wikimedia.org/r/472970

Change 472970 merged by Effie Mouzeli:
[operations/puppet@production] install_server: reimage rdb2001, rdb2002 to stretch

https://gerrit.wikimedia.org/r/472970

jijiki moved this task from In Progress to Done on the User-jijiki board. Mon, Nov 19, 9:10 AM
jijiki updated the task description. Mon, Nov 19, 6:10 PM
jijiki closed this task as Resolved.