rack/setup/install rdb10[09|10].eqiad.wmnet
Closed, Resolved · Public

Description

This task will track the racking, setup, and installation of two new rdb systems ordered on T183426. These should replace some out-of-warranty rdb systems, but @RobH is not yet certain which ones.

Racking proposal: The older rdb systems (warranties expired in 2016) are rdb1001-rdb1004. For the purposes of this refresh, their location is less important than the location of rdb1005-rdb1008. All of them are listed below, and if possible the two new systems should not be racked in any of those racks. The two new systems also should not share a rack with one another.

Other rdb systems are in: rdb1001/c4, rdb1002/c7, rdb1003/a4, rdb1004/b4, rdb1005/a3, rdb1006/d3, rdb1007/c4, rdb1008/c5. Avoid placing in the same racks as rdb1005-1008.
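The racking constraint above can be checked mechanically. The following is a minimal sketch in Python, assuming only the rack locations listed in this description. The function name `racking_violations` and the sample placements (racks `a6`, `b6`) are hypothetical and are not part of any existing tooling or of the actual racking decision.

```python
# Rack locations copied from the task description (host -> rack).
EXISTING = {
    "rdb1001": "c4", "rdb1002": "c7", "rdb1003": "a4", "rdb1004": "b4",
    "rdb1005": "a3", "rdb1006": "d3", "rdb1007": "c4", "rdb1008": "c5",
}
# Hosts whose racks must be avoided according to the description.
KEEP = ("rdb1005", "rdb1006", "rdb1007", "rdb1008")


def racking_violations(proposed, existing=EXISTING, keep=KEEP):
    """Return a list of problems with a proposed {host: rack} placement."""
    problems = []

    # Map each rack to the kept hosts that already live in it.
    occupied = {}
    for host in keep:
        occupied.setdefault(existing[host], []).append(host)

    # Rule 1: a new host must not land in a rack holding a kept host.
    for host, rack in sorted(proposed.items()):
        if rack in occupied:
            holders = ", ".join(occupied[rack])
            problems.append(f"{host}: rack {rack} already holds {holders}")

    # Rule 2: the new hosts must not share a rack with one another.
    seen = {}
    for host, rack in sorted(proposed.items()):
        if rack in seen:
            problems.append(f"{host}: shares rack {rack} with {seen[rack]}")
        else:
            seen[rack] = host

    return problems


if __name__ == "__main__":
    # Hypothetical placements, for illustration only.
    print(racking_violations({"rdb1009": "a6", "rdb1010": "b6"}))
    print(racking_violations({"rdb1009": "a3", "rdb1010": "b6"}))
```

An empty list means the proposal satisfies both rules; each string in a non-empty list names the host and the rack that conflicts.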

rdb1009:

  • receive in system on procurement task T183426
  • rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • network port setup (description, enable, vlan)
    • end on-site specific steps
  • production dns entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run
  • handoff for service implementation

rdb1010:

  • receive in system on procurement task T183426
  • rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • network port setup (description, enable, vlan)
    • end on-site specific steps
  • production dns entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run
  • handoff for service implementation

Details

Related Gerrit Patches:
operations/puppet : production : fixing typo in rdb1010 entry
operations/puppet : production : setup new rdb10(09|10).eqiad.wmnet
operations/dns : master : Adding mgmt/prodcution dns rdb1009/10

Event Timeline

RobH triaged this task as Normal priority. · Jun 7 2018, 7:36 PM
RobH created this task.
Restricted Application added a subscriber: Aklapper. · Jun 7 2018, 7:36 PM
RobH added a comment. · Jun 7 2018, 7:41 PM

I've emailed both @Joe and @elukey regarding the racking locations of these, email below:

Giuseppe/Luca,
You are both following this order, so I'm assuming you are the guys to ask about this.
https://phabricator.wikimedia.org/T196685
So in racking these, I've assumed we're decommissioning rdb1001-1004 (all expired warranty in 2016) and keeping rdb1005-1008. If that is so, then we need to avoid racking these 2 new machines in the same racks as rdb1005-1008, but the 1001-1004 are immaterial in terms of racking (since they will decom).
Can one of you confirm/correct/suggest if that is correct? If not, please let us know so we can get these two new systems racked. (via email or via task update!)
Thanks!

RobH updated the task description. · Jun 7 2018, 7:41 PM
elukey added a comment. · Edited · Jun 8 2018, 2:12 PM

Hi Rob! So as far as I can see, only rdb100[56] have unexpired warranties; even rdb100[78] are super old. I think the plan is to keep rdb100[56] and the two new servers, for a total of 4 redis hosts. So in theory we could simply avoid racking the new hosts in the same racks as 100[56] and that's it (let's also wait for Giuseppe's opinion before pulling the trigger).

EDIT: Had a chat with Giuseppe, it seems that the above is what we'd like to do. He suggested that we should also try to spread those servers among rows if possible.
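The revised plan in this comment (keep rdb100[56], avoid their racks, and spread hosts among rows where possible) can be expressed as a small check. This is a sketch only: it assumes the rack locations from the task description (rdb1005 in a3, rdb1006 in d3) and assumes that the first letter of a rack name identifies its row. The function names and the sample racks `b6`, `c2`, and `a6` are hypothetical.

```python
# Racks of the two hosts that stay in service, from the task description.
KEPT = {"rdb1005": "a3", "rdb1006": "d3"}


def row_of(rack):
    """Assumption: the first character of a rack name is its row."""
    return rack[0].upper()


def check_revised_plan(proposed, kept=KEPT):
    """Return a list of findings for a proposed {host: rack} placement."""
    findings = []

    # Hard rule: do not reuse a rack that holds a kept host.
    kept_racks = set(kept.values())
    for host, rack in sorted(proposed.items()):
        if rack in kept_racks:
            findings.append(f"{host}: rack {rack} is used by a kept host")

    # Soft rule ("if possible"): at most one redis host per row.
    by_row = {}
    for host, rack in sorted({**kept, **proposed}.items()):
        by_row.setdefault(row_of(rack), []).append(host)
    for row, hosts in sorted(by_row.items()):
        if len(hosts) > 1:
            findings.append(f"row {row} holds {', '.join(hosts)}")

    return findings


if __name__ == "__main__":
    # Hypothetical placements, for illustration only.
    print(check_revised_plan({"rdb1009": "b6", "rdb1010": "c2"}))
    print(check_revised_plan({"rdb1009": "a6", "rdb1010": "c2"}))
```

Because the row spread is only a preference, a row finding is better read as a warning than as a blocker.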

Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. · Jun 11 2018, 3:38 PM
Cmjohnson moved this task from Up next to Racking Tasks on the ops-eqiad board. · Jun 26 2018, 3:57 PM
Cmjohnson updated the task description. · Jun 28 2018, 1:36 PM
Vvjjkkii renamed this task from rack/setup/install rdb10[09|10].eqiad.wmnet to ufbaaaaaaa. · Jul 1 2018, 1:05 AM
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
elukey renamed this task from ufbaaaaaaa to rack/setup/install rdb10[09|10].eqiad.wmnet. · Jul 2 2018, 6:19 AM
elukey lowered the priority of this task from High to Normal.
elukey updated the task description.
Joe moved this task from Backlog to Blocking others on the User-Joe board. · Jul 4 2018, 8:42 AM

Change 449223 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt/prodcution dns rdb1009/10

https://gerrit.wikimedia.org/r/449223

Change 449223 merged by Cmjohnson:
[operations/dns@master] Adding mgmt/prodcution dns rdb1009/10

https://gerrit.wikimedia.org/r/449223

Cmjohnson updated the task description. · Jul 30 2018, 3:43 PM
Cmjohnson assigned this task to RobH. · Aug 2 2018, 7:43 PM
Cmjohnson updated the task description.
Cmjohnson moved this task from Racking Tasks to Blocked on the ops-eqiad board.

Assigning to @RobH to help with final stage of installation.

faidon added a comment. · Aug 3 2018, 9:35 AM

I'm investigating unrelated issues in asw2-b-eqiad and this port is flapping (probably boot-looping into PXE), so I disabled it. @RobH, feel free to un-disable when you're about to install.

Change 451201 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] setup new rdb10(09|10).eqiad.wmnet

https://gerrit.wikimedia.org/r/451201

Change 451201 merged by RobH:
[operations/puppet@production] setup new rdb10(09|10).eqiad.wmnet

https://gerrit.wikimedia.org/r/451201

Change 451203 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] fixing typo in rdb1010 entry

https://gerrit.wikimedia.org/r/451203

Change 451203 merged by RobH:
[operations/puppet@production] fixing typo in rdb1010 entry

https://gerrit.wikimedia.org/r/451203

RobH added a comment. · Aug 7 2018, 11:16 PM
This comment was removed by RobH.
RobH reassigned this task from RobH to elukey. · Aug 8 2018, 12:10 AM
RobH removed projects: Patch-For-Review, ops-eqiad.
RobH updated the task description.

So this should likely get assigned to either @elukey or @Joe, and since Luca commented, to him it goes!

These can now be pressed into service; I left them as role spare.

elukey removed elukey as the assignee of this task. · Oct 12 2018, 12:52 PM
jijiki closed this task as Resolved. · Nov 9 2018, 8:25 PM
jijiki updated the task description.
jijiki added a subscriber: jijiki.

Both rdb1009 and rdb1010 are in production.