rack/setup/install dbproxy101[2-7].eqiad.wmnet
Closed, Resolved (Public)

Description

This task will track the racking, setup, and installation of six new dbproxy hosts (dbproxy1012 through dbproxy1017). These will REPLACE the use of dbproxy1001-1009, as noted on the procurement task T191595.

Racking proposal: These new systems should NOT share a rack with one another (if possible), and should also not share a rack with the existing dbproxy1010 and dbproxy1011. Both of those are in c5, which looks pretty full anyhow, so don't place any of the new hosts in c5. Otherwise, place them in any available 1G racks, and try to spread them evenly between rows A, B, and D.
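The racking constraints above can be sketched as a small check. This is only an illustration, not tooling used on this task: the function name `check_racking`, the rack-name format (row letter plus rack number, e.g. `a3`), and the example plan are all assumptions, and "evenly" is interpreted here as per-row counts differing by at most one.

```python
from collections import Counter

# Assumption: rack names look like "a3" or "d5" (row letter + rack number).
EXCLUDED_RACKS = {"c5"}           # c5 already holds dbproxy1010 and dbproxy1011
PREFERRED_ROWS = ("a", "b", "d")  # rows named in the racking proposal


def check_racking(plan):
    """Return a list of problems found in a hostname -> rack mapping."""
    problems = []
    racks = Counter(rack.lower() for rack in plan.values())
    for rack, count in sorted(racks.items()):
        if count > 1:
            problems.append(f"rack {rack} holds {count} of the new hosts")
        if rack in EXCLUDED_RACKS:
            problems.append(f"rack {rack} is excluded")

    # Count hosts per row (first character of the rack name).
    rows = Counter(rack.lower()[0] for rack in plan.values())
    for row in sorted(set(rows) - set(PREFERRED_ROWS)):
        problems.append(f"row {row} is not one of the preferred rows")

    # Assumed meaning of "evenly": per-row counts differ by at most one.
    per_row = [rows.get(row, 0) for row in PREFERRED_ROWS]
    if max(per_row) - min(per_row) > 1:
        problems.append("hosts are not spread evenly across rows a, b, d")
    return problems


# Hypothetical plan; these are not the racks actually used.
example_plan = {
    "dbproxy1012": "a1", "dbproxy1013": "a2",
    "dbproxy1014": "b1", "dbproxy1015": "b2",
    "dbproxy1016": "d1", "dbproxy1017": "d2",
}
print(check_racking(example_plan))
```

An empty list means the plan satisfies every constraint as modelled here.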

dbproxy1012:

  • - receive in system on procurement task T191595
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation T202367

dbproxy1013:

  • - receive in system on procurement task T191595
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation T202367

dbproxy1014:

  • - receive in system on procurement task T191595
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation T202367

dbproxy1015:

  • - receive in system on procurement task T191595
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing - needs ip fix
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation T202367

dbproxy1016:

  • - receive in system on procurement task T191595
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing - needs testing after dbproxy1015 mgmt ip fix
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation T202367

dbproxy1017:

  • - receive in system on procurement task T191595
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation T202367

Event Timeline

RobH triaged this task as Medium priority. Jun 7 2018, 8:11 PM
RobH created this task.
Marostegui moved this task from Done to Blocked external/Not db team on the DBA board.
Vvjjkkii renamed this task from rack/setup/install dbproxy101[2-7].eqiad.wmnet to pfbaaaaaaa. Jul 1 2018, 1:05 AM
Vvjjkkii removed Cmjohnson as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
Marostegui renamed this task from pfbaaaaaaa to rack/setup/install dbproxy101[2-7].eqiad.wmnet. Jul 2 2018, 5:14 AM
Marostegui assigned this task to Cmjohnson.
Marostegui lowered the priority of this task from High to Medium.
Marostegui updated the task description.

Change 449726 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt/production dns dbproxy101[27]

https://gerrit.wikimedia.org/r/449726

Change 449726 merged by Cmjohnson:
[operations/dns@master] Adding mgmt/production dns dbproxy101[27]

https://gerrit.wikimedia.org/r/449726

Cmjohnson updated the task description.
Cmjohnson moved this task from Racking Tasks to Blocked on the ops-eqiad board.

assigning to @RobH to help complete the installation.

Change 452981 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] dbproxy host installations

https://gerrit.wikimedia.org/r/452981

Change 452981 merged by RobH:
[operations/puppet@production] dbproxy host installations

https://gerrit.wikimedia.org/r/452981

So dbproxy1015's drac isn't responsive on the network, and dbproxy1017 has a media check failure when attempting to PXE boot.

Odd issue attempting to PXE boot dbproxy1016. It gets no free leases from dhcp, so it cannot be served the tftp image, since it's not getting an IP address assignment.

The forward/reverse dns for dbproxy1016 all match:

But we have the following:

Aug 15 20:11:11 install1002 dhcpd: DHCPDISCOVER from d0:94:66:5e:c7:ea via 10.64.16.3: network 10.64.16.0/22: no free leases
Aug 15 20:11:11 install1002 dhcpd: DHCPDISCOVER from d0:94:66:5e:c7:ea via 10.64.16.2: network 10.64.16.0/22: no free leases

This is typical when the dns isn't set properly for private1-d-eqiad (it is set properly in this case) or when the vlan isn't set properly:

robh@asw2-d-eqiad> show interfaces descriptions | grep dbproxy 
ge-1/0/3        up    up   dbproxy1016
ge-3/0/4        up    down dbproxy1017

{master:2}
robh@asw2-d-eqiad> edit 
Entering configuration mode

{master:2}[edit]
robh@asw2-d-eqiad# show interfaces ge-1/0/3 | display inheritance 
description dbproxy1016;
##
## '9192' was expanded from interface-range 'vlan-private1-d-eqiad'
##
mtu 9192;
##
## '0' was expanded from interface-range 'vlan-private1-d-eqiad'
##
unit 0 {
    ##
    ## 'ethernet-switching' was expanded from interface-range 'vlan-private1-d-eqiad'
    ##
    family ethernet-switching {
        ##
        ## 'access' was inherited from group 'access-port'
        ##
        interface-mode access;
        ##
        ## 'vlan' was expanded from interface-range 'vlan-private1-d-eqiad'
        ##
        vlan {
            ##
            ## 'private1-d-eqiad' was expanded from interface-range 'vlan-private1-d-eqiad'
            ##
            members private1-d-eqiad;
        }
    }
}
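For triage, the `no free leases` lines from install1002 quoted earlier in this comment can be pulled out of a dhcpd log with a short script. This is a minimal sketch: the regular expression is written against the two log lines shown in this task only, and the function name `no_free_lease_clients` is hypothetical.

```python
import re

# Pattern written against the dhcpd lines quoted in this task; other dhcpd
# versions or message variants may need a different expression.
NO_LEASE = re.compile(
    r"DHCPDISCOVER from (?P<mac>[0-9a-f:]{17}) via (?P<relay>\S+): "
    r"network (?P<network>\S+): no free leases"
)


def no_free_lease_clients(lines):
    """Map each client MAC to the set of relay addresses that forwarded it."""
    clients = {}
    for line in lines:
        match = NO_LEASE.search(line)
        if match:
            clients.setdefault(match["mac"], set()).add(match["relay"])
    return clients


sample = [
    "Aug 15 20:11:11 install1002 dhcpd: DHCPDISCOVER from d0:94:66:5e:c7:ea "
    "via 10.64.16.3: network 10.64.16.0/22: no free leases",
    "Aug 15 20:11:11 install1002 dhcpd: DHCPDISCOVER from d0:94:66:5e:c7:ea "
    "via 10.64.16.2: network 10.64.16.0/22: no free leases",
]
print(no_free_lease_clients(sample))
```

A MAC that shows up here is reaching the DHCP server through the relays but has no matching host entry or lease, which narrows the problem to DNS, the host declaration, or the VLAN rather than cabling.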
RobH added a subscriber: ayounsi.

Ok, @ayounsi and I tracked down this issue.

@Cmjohnson: dbproxy1015 and dbproxy1016 have the same IP assigned for mgmt; they are both using dbproxy1016's IP. Please fix dbproxy1015, then test connecting to both dbproxy1015 and dbproxy1016, polling service tags, and ensuring they match up.

The production network cable for dbproxy1017 also appears disconnected, giving a media check failure when attempting to PXE boot.
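A duplicate management IP like the one found on dbproxy1015 and dbproxy1016 can be caught by grouping hosts by their configured address. The sketch below assumes the addresses have already been collected into a dictionary by some other means; the function name `duplicate_ips` is hypothetical, and the addresses are placeholders from the 192.0.2.0/24 documentation range, not the real mgmt IPs.

```python
from collections import defaultdict


def duplicate_ips(host_to_ip):
    """Return {ip: [hosts]} for every IP configured on more than one host."""
    by_ip = defaultdict(list)
    for host, ip in host_to_ip.items():
        by_ip[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in by_ip.items() if len(hosts) > 1}


# Placeholder addresses (192.0.2.0/24 is reserved for documentation).
mgmt_ips = {
    "dbproxy1014": "192.0.2.14",
    "dbproxy1015": "192.0.2.16",  # wrongly set to dbproxy1016's address
    "dbproxy1016": "192.0.2.16",
}
print(duplicate_ips(mgmt_ips))
```

This only detects the collision; confirming which host is which still requires comparing service tags, as requested above.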

dbproxy1015 had the same IP in the iDRAC. Fixed.

RobH removed RobH as the assignee of this task. Aug 17 2018, 4:55 PM
RobH removed a project: ops-eqiad.

These systems are now ready for the DBA team to take over and press into service. This can be taken over by @jcrespo or @Marostegui. I've not assigned to either since the DBA team triages their DBA tag.

Thank you guys! We'll take it from here

Marostegui updated the task description.