The conversion to multi-instance is now complete in the eqiad datacenter, and is on track for completion in codfw RSN. Our current baseline is an instance count of 2 per host, with the exception of restbase200[1-2].codfw.wmnet, which are already running 3 instances each.
Back-of-napkin: if each instance in eqiad is currently ~1T in size, bumping the instance count from 2 to 3 should reduce node density to ~682G (based on present storage levels). My expectation is that this will improve read latency by reducing the SSTables/read, put us in a more favorable position to begin incremental repairs, and give the aggressive memory configurations that have been proposed in T125906 a better chance of succeeding.
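For reference, the napkin math above can be reproduced with a few lines of Python. This is only a sketch: it assumes 1T = 1024G, that the total data per host stays constant, and that data spreads evenly across instances. The function name is made up for illustration.

```python
# Rough per-instance size after adding instances to a host.
# Assumptions (not measured): 1T = 1024G, total data per host is unchanged,
# and data ends up evenly distributed across the instances.
G_PER_T = 1024


def per_instance_size_g(current_size_g, current_instances, new_instances):
    """Return the estimated size (in G) of each instance after the change."""
    host_total_g = current_size_g * current_instances
    return host_total_g / new_instances


# Two instances of ~1T each, going to three instances per host.
size_g = per_instance_size_g(1 * G_PER_T, 2, 3)
print(f"~{int(size_g)}G per instance")  # ~682G per instance
```

Actual sizes will drift from this estimate as storage grows and as compaction and cleanup run.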
Based on the outcome of T130540, we can move forward in eqiad without the need to serialize with the ongoing expansions in codfw.
See:
- T130540: Figure out if nodes in different DCs can be bootstrapped in parallel
- T95253: Finish conversion to multiple Cassandra instances per hardware node
Instances to bootstrap
- 1007-c
- 1008-c
- 1009-c
- 1010-c
- 1011-c
- 1012-c
- 1013-c
- 1014-c
- 1015-c
- 2003-b
- 2003-c
- 2004-b
- 2004-c
- 2005-b
- 2005-c
- 2006-b
- 2006-c
- 2007-c
- 2008-c
- 2009-c
Some cleanup activity has occurred as the expansion has progressed, but one final sweep will be needed on each rack, once all range movements have completed.
Instances to clean up
- Eqiad
- Rack A
- 1007-a
- 1007-b
- 1007-c
- 1010-a
- 1010-b
- 1010-c
- 1011-a
- 1011-b
- 1011-c
- Rack B
- 1008-a
- 1008-b
- 1008-c
- 1012-a
- 1012-b
- 1012-c
- 1013-a
- 1013-b
- 1013-c
- Rack D
- 1009-a
- 1009-b
- 1009-c
- 1014-a
- 1014-b
- 1014-c
- 1015-a
- 1015-b
- 1015-c
- Codfw
- Rack B
- 2001-a
- 2001-b
- 2001-c
- 2002-a
- 2002-b
- 2002-c
- 2007-a
- 2007-b
- 2007-c
- Rack C
- 2003-a
- 2003-b
- 2003-c
- 2004-a
- 2004-b
- 2004-c
- 2008-a
- 2008-b
- 2008-c
- Rack D
- 2005-a
- 2005-b
- 2005-c
- 2006-a
- 2006-b
- 2006-c
- 2009-a
- 2009-b
- 2009-c