
RESTBase cluster expansion
Closed, Resolved (Public)

Description

6 additional hosts have been procured (3 in codfw and 3 in eqiad) to expand the capacity of the RESTBase cluster.

This task will track the Services team work needed to complete the expansion.

Cassandra bootstraps

eqiad

  • 1016-a
  • 1016-b
  • 1016-c
  • 1017-a
  • 1017-b
  • 1017-c
  • 1018-a
  • 1018-b
  • 1018-c

codfw

  • 2010-a
  • 2010-b
  • 2010-c
  • 2011-a
  • 2011-b
  • 2011-c
  • 2012-a
  • 2012-b
  • 2012-c
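
The checklist above abbreviates each Cassandra instance as host number plus instance letter. As a minimal sketch, assuming every instance follows the `restbase<host>-<letter>.<datacenter>.wmnet` pattern seen in the patch titles on this task (e.g. restbase2010-b.codfw.wmnet), the full instance names can be enumerated in checklist order. The names `NEW_HOSTS`, `INSTANCES`, and `instance_fqdns` are hypothetical and exist only for illustration; they are not part of any tooling referenced here.

```python
from itertools import product

# Host numbers copied from the bootstrap checklist above.
NEW_HOSTS = {
    "eqiad": [1016, 1017, 1018],
    "codfw": [2010, 2011, 2012],
}
# Each host runs three Cassandra instances, per the checklist.
INSTANCES = ["a", "b", "c"]


def instance_fqdns(datacenter, hosts, instances=INSTANCES):
    """Return instance names like restbase2010-b.codfw.wmnet, host by host."""
    return [
        f"restbase{host}-{inst}.{datacenter}.wmnet"
        for host, inst in product(hosts, instances)
    ]


if __name__ == "__main__":
    for dc, hosts in NEW_HOSTS.items():
        for fqdn in instance_fqdns(dc, hosts):
            print(fqdn)
```

Running the script prints 18 names, nine per datacenter, in the same order as the checklist.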

Cassandra cleanups (post-bootstraps)

eqiad

  • 1007-a
  • 1007-b
  • 1007-c
  • 1010-a
  • 1010-b
  • 1010-c
  • 1011-a
  • 1011-b
  • 1011-c
  • 1016-a
  • 1016-b
  • 1008-a
  • 1008-b
  • 1008-c
  • 1012-a
  • 1012-b
  • 1012-c
  • 1013-a
  • 1013-b
  • 1013-c
  • 1009-a
  • 1009-b
  • 1009-c
  • 1014-a
  • 1014-b
  • 1014-c
  • 1015-a
  • 1015-b
  • 1015-c
  • 1017-a
  • 1017-b
  • 1018-a
  • 1018-b

codfw

  • 2001-a
  • 2001-b
  • 2001-c
  • 2002-a
  • 2002-b
  • 2002-c
  • 2007-a
  • 2007-b
  • 2007-c
  • 2010-a
  • 2010-b
  • 2003-a
  • 2003-b
  • 2003-c
  • 2004-a
  • 2004-b
  • 2004-c
  • 2011-a
  • 2011-b
  • 2008-a
  • 2008-b
  • 2008-c
  • 2005-a
  • 2005-b
  • 2005-c
  • 2006-a
  • 2006-b
  • 2006-c
  • 2009-a
  • 2009-b
  • 2009-c
  • 2012-a
  • 2012-b

Details

Related Gerrit Patches:
operations/puppet : production : Conftool: Add restbase101[678] and restbase201[012]
operations/puppet : production : enable instance restbase1018-c.eqiad.wmnet
operations/puppet : production : enable instance restbase1018-b.eqiad.wmnet
operations/puppet : production : enable instance restbase1018-a.eqiad.wmnet
operations/puppet : production : enable instance restbase1017-c.eqiad.wmnet
operations/puppet : production : enable instance restbase1017-b.eqiad.wmnet
operations/puppet : production : enable instance restbase1017-a.codfw.wmnet
operations/puppet : production : enable instance restbase1016-c.codfw.wmnet
operations/puppet : production : enable instance restbase1016-b.codfw.wmnet
operations/puppet : production : enable instance restbase2012-c.codfw.wmnet
operations/puppet : production : enable instance restbase2012-b.codfw.wmnet
operations/puppet : production : enable instance restbase2012-a.codfw.wmnet
operations/puppet : production : enable instance restbase2011-c.codfw.wmnet
operations/puppet : production : enable instance restbase2011-b.codfw.wmnet
operations/puppet : production : enable instance restbase2011-a.codfw.wmnet
operations/puppet : production : enable instance restbase2010-c.codfw.wmnet
operations/puppet : production : bootstrap restbase2010-b.codfw.wmnet

Event Timeline

Eevans created this task. Nov 18 2016, 8:40 PM
Restricted Application added a subscriber: Aklapper. Nov 18 2016, 8:40 PM
Eevans triaged this task as Medium priority. Nov 18 2016, 8:41 PM
Eevans edited projects, added Services (doing); removed Services.

Change 322310 had a related patch set uploaded (by Eevans):
bootstrap restbase2010-b.codfw.wmnet

https://gerrit.wikimedia.org/r/322310

Change 322310 merged by Filippo Giunchedi:
bootstrap restbase2010-b.codfw.wmnet

https://gerrit.wikimedia.org/r/322310

Change 322698 had a related patch set uploaded (by Eevans):
enable instance restbase2010-c.codfw.wmnet

https://gerrit.wikimedia.org/r/322698

Change 322698 merged by Filippo Giunchedi:
enable instance restbase2010-c.codfw.wmnet

https://gerrit.wikimedia.org/r/322698

Eevans moved this task from Backlog to In-Progress on the Cassandra board. Nov 21 2016, 7:48 PM
Eevans updated the task description. Nov 21 2016, 7:55 PM
Eevans updated the task description. Nov 21 2016, 9:12 PM

Change 322807 had a related patch set uploaded (by Eevans):
enable instance restbase2011-a.codfw.wmnet

https://gerrit.wikimedia.org/r/322807

Change 322807 merged by Dzahn:
enable instance restbase2011-a.codfw.wmnet

https://gerrit.wikimedia.org/r/322807

Eevans updated the task description. Nov 22 2016, 1:35 AM
Eevans updated the task description. Nov 22 2016, 1:38 AM

Mentioned in SAL (#wikimedia-operations) [2016-11-22T01:40:16Z] <urandom> T151086: RESTBase: Starting 'a' instance Cassandra cleanups, rack 'b', codfw

Eevans updated the task description. Nov 22 2016, 2:49 PM

Change 322896 had a related patch set uploaded (by Eevans):
enable instance restbase2011-b.codfw.wmnet

https://gerrit.wikimedia.org/r/322896

Change 322896 merged by Dzahn:
enable instance restbase2011-b.codfw.wmnet

https://gerrit.wikimedia.org/r/322896

Change 322958 had a related patch set uploaded (by Eevans):
enable instance restbase2011-c.codfw.wmnet

https://gerrit.wikimedia.org/r/322958

Change 322958 merged by Dzahn:
enable instance restbase2011-c.codfw.wmnet

https://gerrit.wikimedia.org/r/322958

Eevans updated the task description. Nov 23 2016, 2:34 PM

Change 323159 had a related patch set uploaded (by Eevans):
enable instance restbase2012-a.codfw.wmnet

https://gerrit.wikimedia.org/r/323159

Eevans updated the task description. Nov 23 2016, 2:48 PM
Eevans updated the task description.
Eevans added a project: RESTBase.

Change 323159 merged by Filippo Giunchedi:
enable instance restbase2012-a.codfw.wmnet

https://gerrit.wikimedia.org/r/323159

Mentioned in SAL (#wikimedia-operations) [2016-11-28T19:42:49Z] <urandom> T151086: bootstrap of restbase2012-a.codfw.wmnet starting...

Eevans updated the task description. Nov 28 2016, 7:50 PM

restbase2012-a.codfw.wmnet is now bootstrapping, but it wasn't easy. Numerous attempts ended with it either timing out during the gossip shadow round or refusing to bootstrap for consistency reasons after determining that one or more nodes were down (they were not). We've seen this in the past, but then it would usually start bootstrapping after one or two failures.

In the past I hypothesized that the size of the seed list might be a factor, but I live-hacked the list down to a single node while troubleshooting this, to no avail.

Eventually, I enabled TRACE logging on the single configured seed in an attempt to see what this looked like from the other end of the exchange, only for the bootstrap to finally succeed.

Change 324238 had a related patch set uploaded (by Eevans):
enable instance restbase2012-b.codfw.wmnet

https://gerrit.wikimedia.org/r/324238

Change 324238 merged by Filippo Giunchedi:
enable instance restbase2012-b.codfw.wmnet

https://gerrit.wikimedia.org/r/324238

Eevans updated the task description. Nov 29 2016, 7:35 PM

Change 324469 had a related patch set uploaded (by Eevans):
enable instance restbase2012-c.codfw.wmnet

https://gerrit.wikimedia.org/r/324469

Eevans updated the task description. Nov 30 2016, 3:48 PM

Change 324469 merged by Dzahn:
enable instance restbase2012-c.codfw.wmnet

https://gerrit.wikimedia.org/r/324469

Eevans updated the task description. Nov 30 2016, 11:00 PM
Eevans updated the task description. Dec 1 2016, 3:54 PM
Eevans updated the task description. Dec 1 2016, 3:56 PM
Eevans updated the task description. Dec 2 2016, 4:25 PM
Eevans updated the task description. Dec 2 2016, 4:28 PM
Eevans updated the task description. Dec 5 2016, 7:43 PM
Eevans changed the task status from Open to Stalled. Dec 7 2016, 3:33 PM

Completion of this issue is pending completion of T150964: eqiad: Rack and setup new restbase nodes

Eevans changed the task status from Stalled to Open. Dec 14 2016, 3:43 PM
Eevans updated the task description.

Change 327218 had a related patch set uploaded (by Eevans):
enable instance restbase1016-b.codfw.wmnet

https://gerrit.wikimedia.org/r/327218

Change 327218 merged by Filippo Giunchedi:
enable instance restbase1016-b.codfw.wmnet

https://gerrit.wikimedia.org/r/327218

Change 327260 had a related patch set uploaded (by Eevans):
enable instance restbase1016-c.codfw.wmnet

https://gerrit.wikimedia.org/r/327260

Change 327260 merged by Dzahn:
enable instance restbase1016-c.codfw.wmnet

https://gerrit.wikimedia.org/r/327260

Eevans updated the task description. Dec 15 2016, 3:26 AM

Change 327520 had a related patch set uploaded (by Eevans):
enable instance restbase1017-a.codfw.wmnet

https://gerrit.wikimedia.org/r/327520

Change 327520 merged by Elukey:
enable instance restbase1017-a.codfw.wmnet

https://gerrit.wikimedia.org/r/327520

Eevans updated the task description. Dec 15 2016, 4:32 PM
Eevans updated the task description.

Change 327560 had a related patch set uploaded (by Eevans):
enable instance restbase1017-b.eqiad.wmnet

https://gerrit.wikimedia.org/r/327560

Change 327560 merged by Dzahn:
enable instance restbase1017-b.eqiad.wmnet

https://gerrit.wikimedia.org/r/327560

Eevans updated the task description. Dec 16 2016, 1:08 AM

Change 327745 had a related patch set uploaded (by Eevans):
enable instance restbase1017-c.eqiad.wmnet

https://gerrit.wikimedia.org/r/327745

Change 327745 merged by Elukey:
enable instance restbase1017-c.eqiad.wmnet

https://gerrit.wikimedia.org/r/327745

Eevans updated the task description. Dec 16 2016, 3:57 PM

Change 327847 had a related patch set uploaded (by Eevans):
enable instance restbase1018-a.eqiad.wmnet

https://gerrit.wikimedia.org/r/327847

Change 327847 merged by Dzahn:
enable instance restbase1018-a.eqiad.wmnet

https://gerrit.wikimedia.org/r/327847

Eevans updated the task description. Dec 16 2016, 10:03 PM

Change 328059 had a related patch set uploaded (by Mobrovac):
Conftool: Add restbase101[678] and restbase201[012]

https://gerrit.wikimedia.org/r/328059

Change 328192 had a related patch set uploaded (by Eevans):
enable instance restbase1018-b.eqiad.wmnet

https://gerrit.wikimedia.org/r/328192

Eevans updated the task description. Dec 19 2016, 4:43 PM

Change 328192 merged by Elukey:
enable instance restbase1018-b.eqiad.wmnet

https://gerrit.wikimedia.org/r/328192

Eevans updated the task description. Dec 19 2016, 7:52 PM
Eevans updated the task description.

Change 328213 had a related patch set uploaded (by Eevans):
enable instance restbase1018-c.eqiad.wmnet

https://gerrit.wikimedia.org/r/328213

Change 328213 merged by Filippo Giunchedi:
enable instance restbase1018-c.eqiad.wmnet

https://gerrit.wikimedia.org/r/328213

Change 328059 merged by Giuseppe Lavagetto:
Conftool: Add restbase101[678] and restbase201[012]

https://gerrit.wikimedia.org/r/328059

Eevans updated the task description. Dec 20 2016, 4:31 PM
Eevans updated the task description.
Eevans updated the task description. Dec 23 2016, 5:16 PM
Eevans updated the task description.
Eevans updated the task description. Dec 23 2016, 5:18 PM
Eevans updated the task description.
Eevans updated the task description. Jan 3 2017, 3:04 PM
Eevans updated the task description. Jan 7 2017, 2:59 AM
Eevans closed this task as Resolved. Jan 7 2017, 3:16 AM
Eevans updated the task description.

Complete.