
rack/setup/install db209[45].codfw.wmnet (sanitarium expansion)
Closed, Resolved (Public)

Description

This task will track the receiving, racking, setup, and installation of two new sanitarium cluster hosts for codfw, named db209[45].codfw.wmnet.

Racking Proposal: There are no other sanitarium hosts in codfw. Just place these two new systems in different rows from one another, in 1G racks.

db2094:

  • - receive in system on procurement task T193812
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - Create HW RAID 10
  • - mgmt dns entries added for both asset tag and hostname https://gerrit.wikimedia.org/r/#/c/434830/
  • - network port setup (description, enable, private vlan)
    • end on-site specific steps
  • - production private dns entries added https://gerrit.wikimedia.org/r/#/c/434830/
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch)
  • - puppet accept/initial run
  • - handoff to DBA team for service implementation

db2095:

  • - receive in system on procurement task T193812
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - Create HW RAID 10
  • - mgmt dns entries added for both asset tag and hostname https://gerrit.wikimedia.org/r/#/c/434830/
  • - network port setup (description, enable, private vlan)
    • end on-site specific steps
  • - production private dns entries added https://gerrit.wikimedia.org/r/#/c/434830/
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch)
  • - puppet accept/initial run
  • - handoff to DBA team for service implementation

Event Timeline

RobH triaged this task as Medium priority. May 15 2018, 4:44 PM
RobH created this task.

Change 433537 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Allow install the new sanitarium hosts

https://gerrit.wikimedia.org/r/433537

Change 433537 merged by Marostegui:
[operations/puppet@production] install_server: Allow install the new sanitarium hosts

https://gerrit.wikimedia.org/r/433537

@Marostegui let me know if this racking proposal works for you

db2094 row A rack A6
db2095 row C rack C6

@Papaul, that will work; the only requirement is that the hosts are on separate rows.
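
The separate-rows requirement can be checked mechanically. The following is a minimal sketch, assuming the proposal is written one host per line in the `<host> row <row> rack <rack>` form used above; the function names `parse_racking` and `rows_are_distinct` are hypothetical and not part of any existing tooling.

```python
import re

# Matches lines such as "db2094 row A rack A6" (format assumed from the proposal above).
RACKING_RE = re.compile(r"^(\S+)\s+row\s+(\S+)\s+rack\s+(\S+)$")


def parse_racking(lines):
    """Return {hostname: (row, rack)} for every line that matches the assumed format."""
    placement = {}
    for line in lines:
        match = RACKING_RE.match(line.strip())
        if match:
            host, row, rack = match.groups()
            placement[host] = (row, rack)
    return placement


def rows_are_distinct(placement):
    """True when no two hosts share a row."""
    rows = [row for row, _rack in placement.values()]
    return len(rows) == len(set(rows))


proposal = ["db2094 row A rack A6", "db2095 row C rack C6"]
placement = parse_racking(proposal)
print(placement, rows_are_distinct(placement))
```

For the proposal above the check passes, since rows A and C differ.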

Change 434830 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add production and mgmt DNS entries for db209[45]

https://gerrit.wikimedia.org/r/434830

Change 434830 merged by Marostegui:
[operations/dns@master] DNS: Add production and mgmt DNS entries for db209[45]

https://gerrit.wikimedia.org/r/434830

Change 434863 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Add the new sanitarium hosts to the config

https://gerrit.wikimedia.org/r/434863

Change 434863 merged by Marostegui:
[operations/puppet@production] mariadb: Add the new sanitarium hosts to the config

https://gerrit.wikimedia.org/r/434863

Change 434960 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address entries for db209[4-5]

https://gerrit.wikimedia.org/r/434960

Change 434960 merged by Dzahn:
[operations/puppet@production] DHCP: Add MAC address entries for db209[4-5]

https://gerrit.wikimedia.org/r/434960

I cannot PXE boot either server:

Start PXE over IPv4.

Station IP address is 10.192.0.101

Server IP address is 208.80.153.53
NBP filename is lpxelinux.0
NBP filesize is 75607 Bytes

Downloading NBP file...

Succeed to download NBP file.

Boot Failed: PXE Device 1: Embedded NIC 1 Port 1 Partition 1

Start PXE over IPv4.

Station IP address is 10.192.32.6

Server IP address is 208.80.153.53
NBP filename is lpxelinux.0
NBP filesize is 75607 Bytes

Downloading NBP file...

Succeed to download NBP file.

Boot Failed: PXE Device 1: Embedded NIC 1 Port 1 Partition 1

I can see the requests arriving fine (this is db2094), but it looks like it is not going past that:

May 24 18:25:51 install2002 dhcpd: DHCPDISCOVER from d0:94:66:5a:04:3b via 10.192.0.2
May 24 18:25:51 install2002 dhcpd: DHCPOFFER on 10.192.0.101 to d0:94:66:5a:04:3b via 10.192.0.2
May 24 18:25:51 install2002 dhcpd: DHCPDISCOVER from d0:94:66:5a:04:3b via 10.192.0.3
May 24 18:25:51 install2002 dhcpd: DHCPOFFER on 10.192.0.101 to d0:94:66:5a:04:3b via 10.192.0.3
May 24 18:25:55 install2002 dhcpd: DHCPREQUEST for 10.192.0.101 (208.80.153.53) from d0:94:66:5a:04:3b via 10.192.0.2
May 24 18:25:55 install2002 dhcpd: DHCPACK on 10.192.0.101 to d0:94:66:5a:04:3b via 10.192.0.2
May 24 18:25:55 install2002 dhcpd: DHCPREQUEST for 10.192.0.101 (208.80.153.53) from d0:94:66:5a:04:3b via 10.192.0.3
May 24 18:25:55 install2002 dhcpd: DHCPACK on 10.192.0.101 to d0:94:66:5a:04:3b via 10.192.0.3
May 24 18:25:55 install2002 atftpd[513]: Serving lpxelinux.0 to 10.192.0.101:1318
May 24 18:25:55 install2002 atftpd[513]: Serving lpxelinux.0 to 10.192.0.101:1319

Papaul subscribed.
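
One way to read the install2002 log above: the DHCP exchange (DISCOVER, OFFER, REQUEST, ACK) completes and atftpd serves `lpxelinux.0` to the leased address, so the server side appears to get as far as delivering the NBP. The sketch below summarizes that from the log text. It is an illustration only: the regular expressions are assumptions based on the lines pasted in this task, and the `summarize` helper is hypothetical.

```python
import re
from collections import defaultdict

# A subset of the install2002 log lines quoted above, verbatim.
LOG = """\
May 24 18:25:51 install2002 dhcpd: DHCPDISCOVER from d0:94:66:5a:04:3b via 10.192.0.2
May 24 18:25:51 install2002 dhcpd: DHCPOFFER on 10.192.0.101 to d0:94:66:5a:04:3b via 10.192.0.2
May 24 18:25:55 install2002 dhcpd: DHCPREQUEST for 10.192.0.101 (208.80.153.53) from d0:94:66:5a:04:3b via 10.192.0.2
May 24 18:25:55 install2002 dhcpd: DHCPACK on 10.192.0.101 to d0:94:66:5a:04:3b via 10.192.0.2
May 24 18:25:55 install2002 atftpd[513]: Serving lpxelinux.0 to 10.192.0.101:1318
"""

HANDSHAKE = ("DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK")
DHCP_RE = re.compile(r"dhcpd: (DHCP[A-Z]+)\b")
MAC_RE = re.compile(r"[0-9a-f]{2}(?::[0-9a-f]{2}){5}", re.IGNORECASE)
ACK_IP_RE = re.compile(r"DHCPACK on (\S+) to")
TFTP_RE = re.compile(r"atftpd\[\d+\]: Serving (\S+) to (\S+):(\d+)")


def summarize(log_text):
    """Per MAC: whether all four DHCP messages were seen, the leased IP, files served."""
    stages = defaultdict(set)   # MAC -> DHCP message types seen
    leased_ip = {}              # MAC -> IP from the DHCPACK line
    served = defaultdict(set)   # client IP -> filenames served over TFTP
    for line in log_text.splitlines():
        dhcp = DHCP_RE.search(line)
        if dhcp:
            mac = MAC_RE.search(line)
            if mac:
                stages[mac.group(0)].add(dhcp.group(1))
                ack = ACK_IP_RE.search(line)
                if ack:
                    leased_ip[mac.group(0)] = ack.group(1)
            continue
        tftp = TFTP_RE.search(line)
        if tftp:
            served[tftp.group(2)].add(tftp.group(1))
    return {
        mac: {
            "handshake_complete": set(HANDSHAKE) <= seen,
            "ip": leased_ip.get(mac),
            "files_served": sorted(served.get(leased_ip.get(mac), ())),
        }
        for mac, seen in stages.items()
    }


summary = summarize(LOG)
print(summary)
```

If the handshake is complete and the NBP was served, as here, the failure is more likely on the client side after the download than in DHCP or TFTP, which matches the "Succeed to download NBP file" followed by "Boot Failed" seen on the consoles.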

@Marostegui it is all yours. The only thing left to do is to add both servers into racktables. I am missing the HW type: PowerEdge R440.
@RobH can you please add PowerEdge R440 into racktables? Thanks

The RAID isn't done apparently:

root@db2095:~# megacli -LDPDInfo -aAll

Adapter #0

Number of Virtual Disks: 0

Exit Code: 0x00

root@db2095:~# df -hT /srv
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   1.8T  1.8G  1.8T   1% /srv

This is the output on the eqiad hosts:

root@db1124:~# megacli -LDPDInfo -aAll

Adapter #0

Number of Virtual Disks: 1
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 8.729 TB
Sector Size         : 512
Is VD emulated      : Yes
Mirror Data         : 8.729 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 10
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No
Number of Spans: 1
Span: 0 - Number of PDs: 10

root@db1124:~# df -hT /srv
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   8.7T  290G  8.5T   4% /srv

Can you set up RAID 10 there, as was done in eqiad?
Thanks!
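
The comparison above (db2095 versus the eqiad host db1124) can also be done programmatically. This is a rough sketch that assumes the `Key: value` layout of the `megacli -LDPDInfo -aAll` output pasted in this task; `parse_megacli` and `diff_against_reference` are hypothetical helper names, and the set of fields compared is an illustrative choice.

```python
# Output pasted above for db2095 (no virtual disk) and an excerpt for db1124 (reference).
DB2095 = """\
Adapter #0

Number of Virtual Disks: 0

Exit Code: 0x00
"""

DB1124 = """\
Adapter #0

Number of Virtual Disks: 1
Virtual Drive: 0 (Target Id: 0)
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 8.729 TB
State               : Optimal
Number Of Drives    : 10
"""

# Fields compared here are a choice made for this sketch.
KEYS = ("Number of Virtual Disks", "RAID Level", "Number Of Drives")


def parse_megacli(text):
    """Collect 'Key: value' pairs, keeping the first occurrence of each key."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def diff_against_reference(host, reference, keys=KEYS):
    """Return {key: (host_value, reference_value)} for every key that differs."""
    return {
        key: (host.get(key), reference.get(key))
        for key in keys
        if host.get(key) != reference.get(key)
    }


mismatches = diff_against_reference(parse_megacli(DB2095), parse_megacli(DB1124))
for key, (got, expected) in mismatches.items():
    print(f"{key}: got {got!r}, expected {expected!r}")
```

An empty result would mean the host reports the same virtual disk count, RAID level string, and drive count as the reference host.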

db2094 looking good!

Number of Virtual Disks: 1
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 8.729 TB
Sector Size         : 512
Is VD emulated      : Yes
Mirror Data         : 8.729 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 10
Span Depth          : 1


root@db2094:~# df -hT /srv
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   8.7T  9.3G  8.7T   1% /srv

Thanks!

db2095 looks good now!

Number of Virtual Disks: 1
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 8.729 TB
Sector Size         : 512
Is VD emulated      : Yes
Mirror Data         : 8.729 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 10
Span Depth          : 1


root@db2095:~# df -hT /srv/
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   8.7T  9.3G  8.7T   1% /srv
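
As a sanity check on the sizes: RAID 10 mirrors every block, so usable capacity is half the raw capacity. Assuming ten equal-size drives (an assumption, the drive model is not stated in this task), the reported 8.729 TB works out to roughly 1.75 TB per drive. That is close to the 1.8T that `/srv` showed on db2095 before the virtual disk existed, which may mean the installer was only seeing about one drive's worth of space at that point. The helper name below is hypothetical.

```python
def raid10_per_drive_tb(usable_tb, n_drives):
    """Per-drive size implied by a RAID 10 array's usable size (equal drives assumed)."""
    if n_drives < 4 or n_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    # Half of the drives hold mirror copies, so usable = (n_drives / 2) * drive_size.
    return usable_tb / (n_drives // 2)


per_drive = raid10_per_drive_tb(8.729, 10)
print(f"{per_drive:.3f} TB per drive")
```

The figures use the units exactly as megacli and df report them, and df rounds its output, so only approximate agreement should be expected.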

I am going to close this task and follow up on the setup in T190704.
Thanks a lot for your help!

Vvjjkkii renamed this task from rack/setup/install db209[45].codfw.wmnet (sanitarium expansion) to qwcaaaaaaa. Jul 1 2018, 1:08 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Marostegui as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
Marostegui renamed this task from qwcaaaaaaa to rack/setup/install db209[45].codfw.wmnet (sanitarium expansion). Jul 1 2018, 6:46 PM
Marostegui closed this task as Resolved.
Marostegui claimed this task.
Marostegui lowered the priority of this task from High to Medium.
Marostegui updated the task description.