(Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of es20[26-34].codfw.wmnet

Hostname / Racking / Installation Details

Hostnames:
es2026.codfw.wmnet
es2027.codfw.wmnet
es2028.codfw.wmnet
es2029.codfw.wmnet
es2030.codfw.wmnet
es2031.codfw.wmnet
es2032.codfw.wmnet
es2033.codfw.wmnet
es2034.codfw.wmnet
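
The bracket notation in the task title stands for the nine hostnames listed above. A minimal sketch of that expansion follows; the helper name `expand_host_range` is hypothetical and is not part of any tooling mentioned in this task.

```python
import re


def expand_host_range(pattern: str) -> list[str]:
    """Expand a single 'prefix[lo-hi]suffix' range into individual hostnames."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No bracket range present: treat the input as one plain hostname.
        return [pattern]
    prefix, low, high, suffix = match.groups()
    width = len(low)  # keep zero padding if the range uses it
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(low), int(high) + 1)
    ]


hosts = expand_host_range("es20[26-34].codfw.wmnet")
print(len(hosts), hosts[0], hosts[-1])
```

The count of nine matches the racking proposal, which places 2 + 2 + 2 + 1 + 2 hosts.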

Racking Proposal:
2 hosts at B1
2 hosts at C1
2 hosts at D1
1 host at A1
2 hosts at A6
Networking/Subnet/VLAN/IP: 1G. Same VLAN internal as the rest of existing es hosts (es2019 for instance)
Partitioning/Raid: RAID10 strip size 256k (@Marostegui will take care of adding them to the correct recipe in puppet)
OS Distro: Buster

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

es2026: Row A rack A1

  • - receive in system on procurement task T257786 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

es2027: Row A rack A6 ge-6/0/24

  • - receive in system on procurement task T257786 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

es2028: Row A rack A6 ge-6/0/25

  • - receive in system on procurement task T257786 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

es2029: Row B rack B1 ge-1/0/7

  • - receive in system on procurement task T257786 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

es2030: Row B rack B1 ge-1/0/8

  • - receive in system on procurement task T257786 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

es2031: Row C rack C1 ge-1/0/23

  • - receive in system on procurement task T257786 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

es2032: Row C rack C1 ge-1/0/25

  • - receive in system on procurement task T257786 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

es2033: Row D rack D1 ge-1/0/17

  • - receive in system on procurement task T257786 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

es2034: Row D rack D1 ge-1/0/19

  • - receive in system on procurement task T257786 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

Restricted Application added a project: SRE. Aug 13 2020, 4:39 PM
Restricted Application added a subscriber: Aklapper.
RobH added a parent task: Unknown Object (Task). Aug 13 2020, 4:40 PM
RobH moved this task from Backlog to Acknowledged on the SRE board.
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH removed a subscriber: RobH.

Change 620881 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Allow install new es hosts in eqiad/codfw

https://gerrit.wikimedia.org/r/620881

Change 620881 merged by Marostegui:
[operations/puppet@production] mariadb: Allow install new es hosts in eqiad/codfw

https://gerrit.wikimedia.org/r/620881

I have merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881 so the hosts will get installed with RAID10, notifications disabled and spare role.
Pending from DC-Ops are the usual DNS and DHCP commits.

Marostegui updated the task description. Aug 18 2020, 9:40 AM

@Marostegui I can only get you 1 server at A1.

Papaul updated the task description. Aug 21 2020, 1:28 AM

@Papaul (Manuel is on vacation until Monday), what about 1 on A1 and 2 on A6? They are in the same row, but it looks like they could fit.

Thanks Jaime for taking this on. That would work too if @Papaul can make it.

Papaul updated the task description. Aug 24 2020, 4:09 PM
Papaul updated the task description.

The delivery ETA for these servers is 08/31/20, so it is not possible to have them ready by 2020-08-31.

Papaul updated the task description. Aug 31 2020, 3:45 PM
Papaul updated the task description. Aug 31 2020, 9:07 PM
Papaul updated the task description.

@Papaul any chance we can place es2034 into A4 or A8 instead of A6?

@Marostegui the only way to place it in A4 is to use a 10G port, since A4 has a 10G switch. For A8 I need to check whether I have available ports on the PDU.

Thanks, let's see if there are available spaces on A8, if not, let's leave it on A6.

Papaul added a comment. Sep 1 2020, 3:00 PM

The only chance to have it in A8 is if heze is decommissioned before I get the server onsite.

Let's go for A6 then :)
Thanks for checking!

Papaul added a comment. Sep 4 2020, 1:43 AM
papaul@asw-a-codfw# show | compare
[edit interfaces interface-range vlan-private1-a-codfw]
     member ge-6/0/13 { ... }
+    member ge-1/0/7;
[edit interfaces interface-range disabled]
-    member ge-1/0/7;
[edit interfaces]
+   ge-1/0/7 {
+       description es2026;
+   }
papaul@asw-a-codfw# run show interfaces descriptions | match es2026
ge-1/0/7        up    up   es2026
Papaul updated the task description. Sep 4 2020, 1:43 AM

Change 624411 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add prod DNS for es2026

https://gerrit.wikimedia.org/r/624411

Change 624411 merged by Papaul:
[operations/dns@master] DNS: Add prod DNS for es2026

https://gerrit.wikimedia.org/r/624411

Papaul updated the task description. Sep 4 2020, 1:52 AM

Change 624416 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for es2026

https://gerrit.wikimedia.org/r/624416

Change 624416 merged by Papaul:
[operations/puppet@production] DHCP: Add MAC address for es2026

https://gerrit.wikimedia.org/r/624416

@Papaul do you want me to attempt to get es2026 installed?

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['es2026.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202009071502_marostegui_20879.log.

@Papaul can you check the cable/switch/interface?

PXE-E61: Media test failure, check cable
Papaul added a comment. Sep 7 2020, 3:25 PM

@Marostegui it is a holiday today in the U.S., so I am not at the DC. It is not a cable problem:

papaul@asw-a-codfw# run show interfaces descriptions | match es2026
ge-1/0/7        up    up   es2026

Try disabling the 10G NIC in the BIOS.
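
The "not a cable problem" conclusion rests on the switch reporting the port as both administratively up and link up. A minimal sketch of reading that row, assuming the usual column order of `show interfaces descriptions` (interface, admin state, link state, description); the helper name `parse_interface_line` is hypothetical.

```python
def parse_interface_line(line: str) -> dict:
    """Split one 'show interfaces descriptions' row into named fields."""
    # Columns are whitespace separated; the description may contain spaces,
    # so split at most three times and keep the remainder together.
    name, admin, link, description = line.split(None, 3)
    return {
        "interface": name,
        "admin": admin,
        "link": link,
        "description": description.strip(),
    }


row = parse_interface_line("ge-1/0/7        up    up   es2026")
link_is_up = row["admin"] == "up" and row["link"] == "up"
print(row["interface"], row["description"], link_is_up)
```

A link that is up at the switch only rules out the physical layer; it says nothing about which NIC the BIOS is trying to PXE boot from.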

@Papaul sure, no need to get it done today - you shouldn't be checking phab even! :-)
Enjoy your day off! :)

Change 625706 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Change es2026 MAC

https://gerrit.wikimedia.org/r/625706

Change 625706 merged by Marostegui:
[operations/puppet@production] install_server: Change es2026 MAC

https://gerrit.wikimedia.org/r/625706

No luck there @Papaul, things I have noticed:

The fact that PXE doesn't even attempt to start makes me think this is not MAC related, as otherwise it would at least show which MAC it is trying to start DHCP from. So the problem must be somewhere else: interface connectivity or similar.
Can you check if the switch has learned any MAC for this particular host?

Per the BIOS
10G MAC: BC:97:E1:57:BA:9A
1G MAC: BC:97:E1:57:BA:98

Completed auto-reimage of hosts:

['es2026.codfw.wmnet']

Of which those FAILED:

['es2026.codfw.wmnet']
Papaul added a comment. Sep 8 2020, 3:17 AM

@Marostegui the next time you have this problem, open the first 1G NIC and change the setting from None to PXE, then open the 10G NIC and change its setting from PXE to None (see image below). You can proceed with the install now; I made all the changes.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['es2026.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202009080515_marostegui_32677.log.

Thanks @Papaul, that was indeed the missing bit! The host is getting installed now. And so far so good :-)
Thank you for finding the issue on a day off - you shouldn't have worked!
Much appreciated.

Completed auto-reimage of hosts:

['es2026.codfw.wmnet']

and were ALL successful.

es2026 got installed correctly:

root@es2026:~# free -g ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            251           0         250           0           0         249
Swap:             7           0           7
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   9.1T   11G  9.1T   1% /srv

root@es2026:~# megacli -LDInfo -Lall -aALL


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 10.913 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 10.913 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 12
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
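
As a sanity check on the figures above, here is a minimal sketch of the RAID10 capacity arithmetic, assuming a standard RAID10 layout built from equal-size drives (every drive is mirrored once, so half of the raw capacity is usable). The function names are hypothetical, and sizes use the same unit megacli prints.

```python
def raid10_usable(drive_count: int, drive_size: float) -> float:
    """Usable capacity of a RAID10 set: half of the drives hold mirror copies."""
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID10 needs an even number of drives, at least 4")
    return (drive_count // 2) * drive_size


def implied_drive_size(drive_count: int, usable: float) -> float:
    """Invert the formula: per-drive size implied by a reported usable size."""
    return usable * 2 / drive_count


# Values taken from the megacli output: 12 drives, 10.913 TB virtual drive.
per_drive = implied_drive_size(12, 10.913)
print(round(per_drive, 3))
print(round(raid10_usable(12, per_drive), 3))
```

The implied per-drive size is about 1.82 in megacli's units, and feeding it back through the formula reproduces the reported 10.913.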

I have given /srv most of the remaining free space in the VG:

root@es2026:~# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda3  tank lvm2 a--  <10.87t 1.77t
root@es2026:~# lvextend -L+1600G /dev/mapper/tank-data
  Size of logical volume tank/data changed from 9.09 TiB (2384188 extents) to <10.66 TiB (2793788 extents).
  Logical volume tank/data successfully resized.
root@es2026:~# xfs_growfs /srv
meta-data=/dev/mapper/tank-data  isize=512    agcount=10, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=2441408512, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 2441408512 to 2860838912
root@es2026:~# df -hT /srv
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs    11T   12G   11T   1% /srv
root@es2026:~# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda3  tank lvm2 a--  <10.87t <217.06g
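
The numbers in the output above can be cross-checked: the growth reported by lvextend (in extents) and by xfs_growfs (in filesystem blocks) should both equal the requested 1600G. A minimal sketch follows; the 4 MiB extent size is an assumption (the LVM default, consistent with 2384188 extents being reported as 9.09 TiB), while the 4096-byte block size comes from the `bsize=4096` field.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumption: the volume group uses the default 4 MiB physical extent size.
EXTENT_SIZE = 4 * MIB
extents_before, extents_after = 2384188, 2793788
lv_growth_gib = (extents_after - extents_before) * EXTENT_SIZE / GIB

# Block size taken from "bsize=4096" in the xfs_growfs output.
XFS_BLOCK_SIZE = 4096
blocks_before, blocks_after = 2441408512, 2860838912
fs_growth_gib = (blocks_after - blocks_before) * XFS_BLOCK_SIZE / GIB

print(lv_growth_gib, fs_growth_gib)
```

Both values come out to 1600 GiB, so the filesystem grew by exactly the amount added to the logical volume.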
Marostegui updated the task description. Sep 8 2020, 6:05 AM
Papaul updated the task description. Sep 8 2020, 5:57 PM
Papaul updated the task description. Sep 8 2020, 10:34 PM
Papaul updated the task description. Sep 9 2020, 2:47 AM
Papaul updated the task description. Sep 9 2020, 3:28 PM
Papaul updated the task description. Sep 9 2020, 3:35 PM
Papaul updated the task description. Sep 9 2020, 3:45 PM
Papaul updated the task description. Sep 9 2020, 4:49 PM
Papaul updated the task description. Sep 9 2020, 6:11 PM

Change 626387 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add production DNS for es2027-es2034

https://gerrit.wikimedia.org/r/626387

Change 626387 merged by Papaul:
[operations/dns@master] DNS: Add production DNS for es2027-es2034

https://gerrit.wikimedia.org/r/626387

Papaul updated the task description. Sep 10 2020, 2:10 PM

Change 626404 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for es2027-es2034

https://gerrit.wikimedia.org/r/626404

Change 626404 merged by Papaul:
[operations/puppet@production] DHCP: Add MAC address for es2027-es2034

https://gerrit.wikimedia.org/r/626404

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

es2027.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202009101604_pt1979_4360_es2027_codfw_wmnet.log.

Completed auto-reimage of hosts:

['es2027.codfw.wmnet']

Of which those FAILED:

['es2027.codfw.wmnet']

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

es2027.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202009101624_pt1979_8604_es2027_codfw_wmnet.log.

Completed auto-reimage of hosts:

['es2027.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

es2028.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202009101807_pt1979_25637_es2028_codfw_wmnet.log.

Papaul updated the task description. Sep 10 2020, 6:09 PM
Papaul updated the task description. Sep 10 2020, 6:28 PM

Completed auto-reimage of hosts:

['es2028.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

es2029.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202009102005_pt1979_13731_es2029_codfw_wmnet.log.

Completed auto-reimage of hosts:

['es2029.codfw.wmnet']

and were ALL successful.

Papaul updated the task description. Sep 10 2020, 8:47 PM

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

es2030.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202009102051_pt1979_23945_es2030_codfw_wmnet.log.

Papaul updated the task description. Sep 10 2020, 9:11 PM

Completed auto-reimage of hosts:

['es2030.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

es2031.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202009102133_pt1979_32214_es2031_codfw_wmnet.log.

Completed auto-reimage of hosts:

['es2031.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

es2032.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202009102213_pt1979_8632_es2032_codfw_wmnet.log.

Papaul updated the task description. Sep 10 2020, 10:50 PM

Completed auto-reimage of hosts:

['es2032.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

es2033.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202009102255_pt1979_15332_es2033_codfw_wmnet.log.

Completed auto-reimage of hosts:

['es2033.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

es2034.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202009102344_pt1979_25423_es2034_codfw_wmnet.log.

Completed auto-reimage of hosts:

['es2034.codfw.wmnet']

and were ALL successful.

Papaul closed this task as Resolved. Sep 11 2020, 12:25 AM
Papaul updated the task description.

@Marostegui all yours

Thank you @Papaul - they all look good to me.