
Reorganize our redis rdb1/rdb2 clusters
Closed, Resolved, Public

Description

These clusters are *mostly* unused, and need to be reorganized.

EQIAD (rdb1*):

Our end goal is to have two master/slave couples, rdb1009/1010 and rdb1005/1006, and decommission the remaining servers.

  • Check which clients exist and where they connect. We are aware of change-propagation connecting to rdb1001/1003 via nutcracker.
  • Create a new puppet role for redis::misc::master, which should use profile::redis::master
  • Create a corresponding role for redis::misc::slave
  • Install rdb1009 as master and rdb1010 as slave; define their puppet cluster variable as redis_main (do the same for all servers below)
  • Reimage rdb1005 with stretch as master, rdb1006 as slave
  • Migrate existing clients to rdb1005/1009
  • Decommission rdb1001-1004 and rdb1007-1008
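
The first step (finding which clients exist and where they connect) can be approached with the Redis `CLIENT LIST` command, which prints one line of `key=value` fields per connection. The following is a minimal sketch, not the procedure used on this task: it assumes the command output has already been captured as text, and the sample addresses are made up. Because change-propagation reaches these hosts via nutcracker, the addresses seen would likely be those of the proxying hosts rather than of the application itself.

```python
from collections import Counter


def clients_by_host(client_list_output: str) -> Counter:
    """Count connections per client IP from Redis CLIENT LIST output."""
    counts = Counter()
    for line in client_list_output.splitlines():
        # Each line is a run of space-separated key=value fields.
        fields = dict(
            item.split("=", 1) for item in line.split() if "=" in item
        )
        addr = fields.get("addr")
        if not addr:
            continue
        # addr is "ip:port"; keep only the IP part.
        host, _, _port = addr.rpartition(":")
        counts[host] += 1
    return counts


# Hypothetical sample output; real output carries more fields per line.
SAMPLE = """\
id=11 addr=10.0.0.5:50112 fd=8 name= age=120 idle=0 db=0 cmd=get
id=12 addr=10.0.0.5:50113 fd=9 name= age=118 idle=3 db=0 cmd=set
id=13 addr=10.0.0.7:41200 fd=10 name= age=30 idle=1 db=0 cmd=client
"""

if __name__ == "__main__":
    for host, n in clients_by_host(SAMPLE).most_common():
        print(host, n)
```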

CODFW (rdb2*):

Our goal here is to reconfigure rdb2003-2006 as two master/slave couples, and return 2001/2002 to the spares list.

  • Check which clients exist and where they connect. We are aware of change-propagation connecting to rdb2001/2003 via nutcracker.
  • Reimage rdb2005/2006 with the same configuration as 1005/1006
  • Migrate all clients to use those
  • Reimage rdb2003/2004 in a similar fashion
  • Redistribute clients across the two groups
  • Reimage rdb2001/2002 as spares (they were not reimaged, since this will be done by DC Ops)
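
The target layout described for both datacenters can be summarized as a small lookup table. This is only an illustrative sketch: the names `PAIRS`, `RETIRED`, and `role_of` are hypothetical, and treating the first host of each codfw couple as the master is an assumption drawn from "the same configuration as 1005/1006".

```python
# (master, slave) couples per datacenter, as stated in the description.
# The master/slave order for codfw is an assumption (see lead-in).
PAIRS = {
    "eqiad": [("rdb1009", "rdb1010"), ("rdb1005", "rdb1006")],
    "codfw": [("rdb2005", "rdb2006"), ("rdb2003", "rdb2004")],
}

# Hosts that leave the clusters, keyed by what happens to them.
RETIRED = {
    "decommission": [
        "rdb1001", "rdb1002", "rdb1003", "rdb1004", "rdb1007", "rdb1008",
    ],
    "spare": ["rdb2001", "rdb2002"],
}


def role_of(host: str) -> str:
    """Return the planned end state for a host short name."""
    for pairs in PAIRS.values():
        for master, slave in pairs:
            if host == master:
                return "master"
            if host == slave:
                return "slave"
    for fate, hosts in RETIRED.items():
        if host in hosts:
            return fate
    raise KeyError(host)


if __name__ == "__main__":
    for name in ("rdb1009", "rdb1006", "rdb1003", "rdb2001"):
        print(name, role_of(name))
```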

Details

Project            Branch      Lines +/-   Subject
operations/puppet  production  +3 -15
operations/puppet  production  +2 -2
operations/puppet  production  +0 -4
operations/puppet  production  +7 -7
operations/puppet  production  +2 -2
operations/puppet  production  +2 -2
operations/puppet  production  +8 -7
operations/puppet  production  +3 -9
operations/puppet  production  +6 -1
operations/puppet  production  +0 -2
operations/puppet  production  +2 -2
operations/puppet  production  +2 -2
operations/puppet  production  +2 -2
operations/puppet  production  +20 -0
operations/puppet  production  +1 -1
operations/puppet  production  +12 -0
operations/puppet  production  +2 -2
operations/puppet  production  +60 -3
labs/private       master      +1 -0

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 8 2018, 9:30 AM

Change 467734 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP: Added new role::redis::misc for general purposes redis servers

https://gerrit.wikimedia.org/r/467734

Change 468310 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[labs/private@master] Added dummy pass for role redis::misc::master

https://gerrit.wikimedia.org/r/468310

Change 468310 merged by Effie Mouzeli:
[labs/private@master] Added dummy pass for role redis::misc::master

https://gerrit.wikimedia.org/r/468310

Change 467734 merged by Effie Mouzeli:
[operations/puppet@production] Added new role::redis::misc for general purposes redis servers

https://gerrit.wikimedia.org/r/467734

Change 468586 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] redis::misc Fixed typos

https://gerrit.wikimedia.org/r/468586

Change 468586 merged by Effie Mouzeli:
[operations/puppet@production] redis::misc Fixed typos

https://gerrit.wikimedia.org/r/468586

jijiki triaged this task as Medium priority. Oct 22 2018, 3:55 PM

Change 470615 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] deployment-prep: added hieradata for deployment-rd3 host

https://gerrit.wikimedia.org/r/470615

Change 470615 merged by Effie Mouzeli:
[operations/puppet@production] deployment-prep: added hieradata for deployment-rd3 host

https://gerrit.wikimedia.org/r/470615

Change 470623 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] deployment-prep: fixed suffix for deployment-rd3-cptest-master01

https://gerrit.wikimedia.org/r/470623

Change 470623 merged by Effie Mouzeli:
[operations/puppet@production] deployment-prep: fixed suffix for deployment-rd3-cptest-master01

https://gerrit.wikimedia.org/r/470623

Change 471745 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] prometheus: added redis_misc metrics

https://gerrit.wikimedia.org/r/471745

Change 471745 merged by Effie Mouzeli:
[operations/puppet@production] prometheus: added redis_misc metrics

https://gerrit.wikimedia.org/r/471745

Change 471757 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] prometheus: fix for redis_misc metrics

https://gerrit.wikimedia.org/r/471757

Change 471757 merged by Effie Mouzeli:
[operations/puppet@production] prometheus: fix for redis_misc metrics

https://gerrit.wikimedia.org/r/471757

Change 471929 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] prometheus: fix redis misc role

https://gerrit.wikimedia.org/r/471929

Change 471929 merged by Effie Mouzeli:
[operations/puppet@production] prometheus: fix redis misc role

https://gerrit.wikimedia.org/r/471929

Change 471959 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] role::eqiad::scb: switch rdb1001:6382 with rdb1009:6379

https://gerrit.wikimedia.org/r/471959

Change 471959 merged by Effie Mouzeli:
[operations/puppet@production] role::eqiad::scb: switch rdb1001:6382 with rdb1009:6379

https://gerrit.wikimedia.org/r/471959

Change 472240 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] install_server: Reimage rdb1005 to stretch

https://gerrit.wikimedia.org/r/472240

Change 472240 merged by Effie Mouzeli:
[operations/puppet@production] install_server: Reimage rdb1005 to stretch

https://gerrit.wikimedia.org/r/472240

Change 472251 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Change rdb1005 to spare:system

https://gerrit.wikimedia.org/r/472251

Change 472251 merged by Effie Mouzeli:
[operations/puppet@production] Change rdb1005 to spare:system

https://gerrit.wikimedia.org/r/472251

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

rdb1005.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201811072242_jiji_158394_rdb1005_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['rdb1005.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2018-11-07T23:21:21Z] <jiji> Disabled nagios checks on rdb1006 and rdb2005 due to rdb1005 reimaging - T206450

Mentioned in SAL (#wikimedia-operations) [2018-11-08T10:52:36Z] <jiji> Reimaging rdb1006 to stretch - T206450

Change 472412 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Change rdb1005 and rdb1006 to redis::misc master/slave

https://gerrit.wikimedia.org/r/472412

Change 472412 merged by Effie Mouzeli:
[operations/puppet@production] Change rdb1005 and rdb1006 to redis::misc master/slave

https://gerrit.wikimedia.org/r/472412

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

rdb1006.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201811081152_jiji_140192_rdb1006_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['rdb1006.eqiad.wmnet']

and were ALL successful.
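
After a master/slave couple such as rdb1005/rdb1006 is reimaged, one plausible sanity check is to read the replica's `INFO replication` section and confirm it follows the expected master. The sketch below is an assumption about how such a check could look, not something recorded on this task; the sample text and the helper names are hypothetical, while `role`, `master_host`, and `master_link_status` are standard fields of that section.

```python
def parse_info(info_text: str) -> dict:
    """Parse Redis INFO output ("key:value" lines) into a dict."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# Section" headers.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def replica_is_healthy(info_text: str, expected_master: str) -> bool:
    """True if the instance is a slave of expected_master with its link up."""
    info = parse_info(info_text)
    return (
        info.get("role") == "slave"
        and info.get("master_host") == expected_master
        and info.get("master_link_status") == "up"
    )


# Hypothetical sample of what the replica might report.
SAMPLE = """\
# Replication
role:slave
master_host:rdb1005.eqiad.wmnet
master_port:6379
master_link_status:up
"""

if __name__ == "__main__":
    print(replica_is_healthy(SAMPLE, "rdb1005.eqiad.wmnet"))
```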

Mentioned in SAL (#wikimedia-operations) [2018-11-08T12:41:23Z] <jiji> Shutdown and reimage rdb200[56] - T206450

Change 472449 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Reimage rdb2005/rdb2006

https://gerrit.wikimedia.org/r/472449

Change 472454 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] role::eqiad::scb: switch rdb1003:6382 with rdb1005:6379

https://gerrit.wikimedia.org/r/472454

Change 472449 merged by Effie Mouzeli:
[operations/puppet@production] Reimage rdb2005/rdb2006

https://gerrit.wikimedia.org/r/472449

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['rdb2005.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201811081821_jiji_1886.log.

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['rdb2006.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201811081823_jiji_2571.log.

Completed auto-reimage of hosts:

['rdb2005.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['rdb2006.codfw.wmnet']

and were ALL successful.

Change 472454 merged by Effie Mouzeli:
[operations/puppet@production] role::eqiad::scb: switch rdb1003:6382 with rdb1005:6379

https://gerrit.wikimedia.org/r/472454

Change 472669 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] role::codfw::scb: switch rdb2003:6382 with rdb2005:6379

https://gerrit.wikimedia.org/r/472669

Change 472669 merged by Effie Mouzeli:
[operations/puppet@production] role::codfw::scb: switch rdb2003:6382 with rdb2005:6379

https://gerrit.wikimedia.org/r/472669

jijiki updated the task description.

Change 472714 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Reimage rdb2003/rdb2004

https://gerrit.wikimedia.org/r/472714

Change 472714 merged by Effie Mouzeli:
[operations/puppet@production] Reimage rdb2003/rdb2004, switch rdb100[123478] to spare::system

https://gerrit.wikimedia.org/r/472714

Change 472729 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Reimage rdb2003/rdb2004 to stretch

https://gerrit.wikimedia.org/r/472729

Change 472729 merged by Effie Mouzeli:
[operations/puppet@production] Reimage rdb2003/rdb2004 to stretch

https://gerrit.wikimedia.org/r/472729

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['rdb2003.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201811092114_jiji_244778.log.

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['rdb2004.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201811092116_jiji_245283.log.

Completed auto-reimage of hosts:

['rdb2004.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['rdb2003.codfw.wmnet']

and were ALL successful.

Change 472964 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] role::codfw::scb: switch rdb2001:6382 with rdb2003:6379

https://gerrit.wikimedia.org/r/472964

Change 472964 merged by Effie Mouzeli:
[operations/puppet@production] role::codfw::scb: switch rdb2001:6382 with rdb2003:6379

https://gerrit.wikimedia.org/r/472964
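
Taken together, the four scb changes above replace one backend endpoint with another in the client-side server list. As a rough illustration only, the swaps can be modeled as ordered replacements over a list of `host:port` strings. The starting lists and the function name are hypothetical, and the real puppet data is not shown on this task.

```python
# Endpoint swaps in the order the changes were uploaded, taken from
# the change titles on this task.
SWITCHES = [
    ("rdb1001:6382", "rdb1009:6379"),
    ("rdb1003:6382", "rdb1005:6379"),
    ("rdb2003:6382", "rdb2005:6379"),
    ("rdb2001:6382", "rdb2003:6379"),
]


def apply_switches(servers, switches):
    """Apply each (old, new) swap in order and return a new list."""
    result = list(servers)
    for old, new in switches:
        result = [new if entry == old else entry for entry in result]
    return result


if __name__ == "__main__":
    # Hypothetical starting server lists for each datacenter.
    eqiad = ["rdb1001:6382", "rdb1003:6382"]
    codfw = ["rdb2001:6382", "rdb2003:6382"]
    print(apply_switches(eqiad, SWITCHES))
    print(apply_switches(codfw, SWITCHES))
```

Because the new endpoints use port 6379 and the old ones 6382, a later swap never rewrites an entry produced by an earlier one, even though rdb2003 appears on both sides.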

Mentioned in SAL (#wikimedia-operations) [2018-11-12T12:15:16Z] <jiji> Restarting nutcracker on scb200[1-6] - T206450

Change 472970 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] install_server: reimage rdb2001, rdb2002 to stretch

https://gerrit.wikimedia.org/r/472970

Change 472970 merged by Effie Mouzeli:
[operations/puppet@production] install_server: reimage rdb2001, rdb2002 to stretch

https://gerrit.wikimedia.org/r/472970

jijiki updated the task description.