rack/setup/install dbstore100[3-5].eqiad.wmnet
Closed, ResolvedPublic

Description

This task will track the racking, setup, and installation of the 3 new hosts ordered as dbstore1002 replacements.

This will use the standard database configuration: RAID10 with 256K stripe and writeback. https://wikitech.wikimedia.org/wiki/Raid_setup

Hostname Proposal: dbstore100[3-5].eqiad.wmnet, since this is a dbstore1002 replacement order.
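
As an illustration of the bracket shorthand used in the hostname proposal, the following minimal sketch expands `dbstore100[3-5].eqiad.wmnet` into individual hostnames. The helper name `expand_hosts` is hypothetical and not part of any tooling referenced in this task. It assumes a single `[start-end]` numeric group without zero padding.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one numeric bracket group, e.g. 'dbstore100[3-5].eqiad.wmnet'.

    Hypothetical helper for illustration only: it handles a single
    [start-end] group and returns the pattern unchanged if none is found.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    # Inclusive range: [3-5] yields 3, 4, 5.
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


hosts = expand_hosts("dbstore100[3-5].eqiad.wmnet")
for host in hosts:
    print(host)
```

Under these assumptions the pattern covers the three hosts tracked below: dbstore1003, dbstore1004, and dbstore1005.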

Racking Proposal: rack in any 1G rack other than d1-eqiad (as it has dbstore1001 and dbstore1002)

dbstore1003:

  • - receive in system on procurement task T198174
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

dbstore1004:

  • - receive in system on procurement task T198174
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

dbstore1005:

  • - receive in system on procurement task T198174
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

Event Timeline

RobH triaged this task as High priority. Nov 15 2018, 6:28 PM
RobH created this task.
RobH added a subscriber: Cmjohnson.

@elukey: Can you confirm the racking and hostname details?

Hostname Proposal: dbstore100[3-5].eqiad.wmnet, since this is a dbstore1002 replacement order.

Racking Proposal: rack in any 1G rack other than d1-eqiad (as it has dbstore1001 and dbstore1002)

If that is good, please comment and assign to @Cmjohnson for implementation. If it needs changes, comment and assign back to me!

Even though the name dbstore1001 suggests otherwise, that host has nothing to do with the future usage of the dbstore1003-5 hosts, so it can be ignored in the racking plan.
So yes, I think any rack where dbstore1002 isn't present is a good idea.

elukey reassigned this task from elukey to Cmjohnson. Nov 16 2018, 6:31 AM

Even though the name dbstore1001 suggests otherwise, that host has nothing to do with the future usage of the dbstore1003-5 hosts, so it can be ignored in the racking plan.
So yes, I think any rack where dbstore1002 isn't present is a good idea.

+1

elukey moved this task from Backlog to In Progress on the User-Elukey board. Nov 20 2018, 1:42 PM

@RobH @elukey @Marostegui Do you need hardware RAID? I do not see it on the ticket.

jcrespo added a comment. Edited Nov 26 2018, 6:33 PM

Yes, RAID 10 with 256K stripe is our default setup, sorry for not specifying it. Only in very specific setups (parsercaches or other special hosts) would we not want that.

jcrespo updated the task description. Nov 26 2018, 6:35 PM

Change 475812 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns for dbstore1003-5

https://gerrit.wikimedia.org/r/475812

jcrespo updated the task description. Nov 26 2018, 6:36 PM

Thanks @jcrespo! I didn't want to assume anything.

We are cool! Better to ask than make you work twice :-D

Change 475812 merged by RobH:
[operations/dns@master] Adding mgmt dns for dbstore1003-5

https://gerrit.wikimedia.org/r/475812

Cmjohnson reassigned this task from Cmjohnson to RobH. Nov 26 2018, 7:22 PM
Cmjohnson updated the task description.

@RobH these are ready for installs.

RobH moved this task from Backlog to High Priority Task on the ops-eqiad board. Nov 26 2018, 9:56 PM

Change 475883 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] settting dbstore100[5-67].eqiad.wmnet production dns entries

https://gerrit.wikimedia.org/r/475883

Change 475883 merged by RobH:
[operations/dns@master] settting dbstore100[5-67].eqiad.wmnet production dns entries

https://gerrit.wikimedia.org/r/475883

Change 475888 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] dbstore100[345].eqiad.wmnet base isntall params

https://gerrit.wikimedia.org/r/475888

Change 475888 merged by RobH:
[operations/puppet@production] dbstore100[345].eqiad.wmnet base isntall params

https://gerrit.wikimedia.org/r/475888

RobH updated the task description.
RobH reassigned this task from RobH to Cmjohnson. Nov 26 2018, 10:32 PM

@Cmjohnson
dbstore1004 seems to have a production network cable issue?

Switch shows it is admin enabled but no link:

ge-1/0/13 up down dbstore1004

Then PXE boot fails:

Broadcom UNDI PXE-2.1 v20.6.50
Copyright (C) 2000-2017 Broadcom Corporation
Copyright (C) 1997-2000 Intel Corporation
All rights reserved.
PXE-E61: Media test failure, check cable
PXE-M0F: Exiting Broadcom PXE ROM.

Can you check the physical cable?
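
The diagnosis in the comment above (port admin-enabled, no link, PXE media test failure) can be sketched as a small check. This is a minimal illustration only: the column order assumed for the switch line (interface, admin state, link state, description) is inferred from the quoted output, and the names `PortStatus`, `parse_port_line`, and `diagnose` are hypothetical.

```python
from typing import NamedTuple


class PortStatus(NamedTuple):
    interface: str
    admin: str
    link: str
    description: str


def parse_port_line(line: str) -> PortStatus:
    # Assumed column order: interface, admin state, link state, description.
    interface, admin, link, *rest = line.split()
    return PortStatus(interface, admin, link, " ".join(rest))


def diagnose(status: PortStatus, console_output: str) -> str:
    # Admin "up" with link "down" means the port is enabled but sees no carrier.
    if status.admin == "up" and status.link == "down":
        if "PXE-E61" in console_output:
            return "no link and PXE media test failed: check the physical cable"
        return "port enabled but no link"
    return "link looks fine at the switch"


status = parse_port_line("ge-1/0/13 up down dbstore1004")
print(diagnose(status, "PXE-E61: Media test failure, check cable"))
```

Both symptoms point the same way: the switch side reports no link and the NIC's PXE ROM fails its media test, which is why the request is to check the cable rather than the port configuration.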

RobH added a comment. Nov 26 2018, 10:33 PM

dbstore1003 & dbstore1005 are fully installed and now online, standing by with role:spare applied.

RobH reassigned this task from Cmjohnson to elukey. Nov 28 2018, 5:51 PM
RobH updated the task description.
RobH removed a project: ops-eqiad.

These are now ready for @elukey to take over. Assigning this task to him.

You can resolve this as you see fit!

elukey closed this task as Resolved. Nov 28 2018, 5:52 PM

Thanks a lot! We are going to follow up in https://phabricator.wikimedia.org/T210478