
(Need By: TBD) rack/setup/install backup1002 + array
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of <enter the FQDN/hostname of the hosts being set up here>

Hostname / Racking / Installation Details

Hostname:
backup1002
array backup1002-array1

Racking Proposal:
Rack it on 10G-available racks, and try to avoid, if possible, proximity to backup1001

Networking/Subnet/VLAN/IP:
10G, single connection

Partitioning/Raid:
The 2 SSDs will be in software RAID1. The array disks will be in hardware RAID6 (on the RAID controller). The DBAs will take care of the HW RAID partitioning (this is a special host), as long as management access is available.

Partman: backup-format.cfg (got renamed from raid1-lvm-ext4-srv-plus-hwraid.cfg)
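
As a rough illustration of what the two RAID levels above mean for usable space, here is a minimal sketch. The function names and the disk counts and sizes in the usage comment are hypothetical and are not taken from this host's actual hardware; only the standard capacity rules (a mirror yields one member's worth of space, RAID6 gives up two disks' worth to parity) are assumed.

```python
def raid1_usable(disk_sizes):
    """Usable capacity of a RAID1 mirror: limited by the smallest member."""
    if len(disk_sizes) < 2:
        raise ValueError("RAID1 needs at least two disks")
    return min(disk_sizes)


def raid6_usable(disk_count, disk_size):
    """Usable capacity of RAID6: two disks' worth of space holds parity."""
    if disk_count < 4:
        raise ValueError("RAID6 needs at least four disks")
    return (disk_count - 2) * disk_size


# Hypothetical example values, in GB (NOT this host's real configuration):
#   raid1_usable([480, 480])
#   raid6_usable(12, 10000)
```

Both results ignore filesystem and controller overhead, so real figures would be somewhat lower.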

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

backup1002:

  • - receive in system on procurement task <enter task # here>
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
RobH added a subscriber: Jclark-ctr.

We did not have the racking info for this before it arrived, so I've made the above task. Can you confirm racking and hostname details and then assign to @Jclark-ctr for implementation? Thanks!

Can you confirm racking and hostname details

Cannot they be copied from the ones I gave for backup2002? T248934

Sure.

jcrespo updated the task description.

backup1002: rack C7, U8, asset WMF4805, port 27
backup1002-array: rack C7, U6, asset WMF4806

Jclark-ctr updated the task description.
Jclark-ctr added a subscriber: jcrespo.

Small correction:

backup1002-array1. Please note the 1 at the end: while it is unlikely that we will add a second one, it is not completely impossible 0:-D More than anything, it is for consistency with its sibling on codfw :-D.

Change 591389 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns for backup1002

https://gerrit.wikimedia.org/r/591389

Change 591389 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns for backup1002

https://gerrit.wikimedia.org/r/591389

Change 592903 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Disallow reimage of db2102, reimage new backup1002

https://gerrit.wikimedia.org/r/592903

Change 592903 merged by Jcrespo:
[operations/puppet@production] mariadb: Disallow reimage of db2102, reimage new backup1002

https://gerrit.wikimedia.org/r/592903

Change 593596 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding production dns (ipv4 & ipv6) backup1002

https://gerrit.wikimedia.org/r/593596

Change 593606 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding backup1002 mac to dhcpd file

https://gerrit.wikimedia.org/r/593606

Change 593596 merged by Cmjohnson:
[operations/dns@master] Adding production dns (ipv4 & ipv6) backup1002

https://gerrit.wikimedia.org/r/593596

Change 593606 merged by Cmjohnson:
[operations/puppet@production] Adding backup1002 mac to dhcpd file

https://gerrit.wikimedia.org/r/593606

Cmjohnson subscribed.

@jcrespo This server is just about ready for install. Can you do the RAID config and update netboot.cfg? After that, it's ready for install. Feel free to do it yourself or pass it back to me.

Thanks, will take it from here, I should be able to handle this on my own unless unexpected issues arise.

Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts:

['backup1002.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202005061556_jynus_250721.log.
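
For anyone correlating the reimage attempts in this task, the log file names can be split into their parts. This is a minimal sketch that assumes the name follows the pattern `<YYYYMMDDHHMM>_<user>_<number>.log`, which is inferred only from the paths quoted in this task and is not a documented format; `parse_reimage_log` is a hypothetical helper, not part of wmf-auto-reimage.

```python
import re
from datetime import datetime

# Assumed layout: 12-digit timestamp, user name, numeric suffix.
LOG_RE = re.compile(r"(?P<stamp>\d{12})_(?P<user>[^_/]+)_(?P<suffix>\d+)\.log$")


def parse_reimage_log(path):
    """Return (start time, user, numeric suffix) parsed from a log path."""
    match = LOG_RE.search(path)
    if match is None:
        raise ValueError(f"unrecognised log name: {path}")
    started = datetime.strptime(match["stamp"], "%Y%m%d%H%M")
    return started, match["user"], int(match["suffix"])


# Usage with the path quoted above:
#   parse_reimage_log("/var/log/wmf-auto-reimage/202005061556_jynus_250721.log")
```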

Change 594746 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] backup1002: Update NIC address for card with link

https://gerrit.wikimedia.org/r/594746

Change 594746 merged by Jcrespo:
[operations/puppet@production] backup1002: Update NIC address for card with link

https://gerrit.wikimedia.org/r/594746

@Cmjohnson I can POST the server and get to BIOS without issue. However, I did the above change^ because the server didn't boot into PXE, and saw the second interface with link from the device manager (not the first one, the :90). I am still unable to boot it into PXE. I get a blank screen after:

PowerEdge R440
BIOS Version: 2.5.4
Console Redirection Enabled Requested by iDRAC

Attempting PXE Boot
iDRAC IPV4:  10.65.3.136

Initializing Firmware Interfaces...

Enumerating Boot options... Done
iDRAC IPV4:  10.65.3.136

Lifecycle Controller: Done
Booting...

DNS seems to be working:
$ dig +short backup1002.eqiad.wmnet
10.64.32.107

Although IPv6 seems to be taking preference:

$ ping backup1002.eqiad.wmnet
PING backup1002.eqiad.wmnet(backup1002.eqiad.wmnet (2620:0:861:103:10:64:32:107)) 56 data bytes

vs

$ ping backup2002.codfw.wmnet
PING backup2002.codfw.wmnet (10.192.0.190) 56(84) bytes of data
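
The address-family preference seen in the two ping outputs above can also be inspected directly. The following is a minimal sketch, assuming Python 3 on the machine doing the lookup; `family_order` and `preferred_family` are hypothetical helper names. On most systems `socket.getaddrinfo` returns results in the order chosen by the local address selection policy, so the first entry is typically what tools such as ping will try first.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def family_order(addrinfo):
    """Return unique (family name, address) pairs in resolver order."""
    seen = []
    for family, _type, _proto, _canon, sockaddr in addrinfo:
        entry = (FAMILY_NAMES.get(family, str(family)), sockaddr[0])
        if entry not in seen:
            seen.append(entry)
    return seen


def preferred_family(host):
    """Name of the address family the resolver lists first for host."""
    info = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    order = family_order(info)
    return order[0][0] if order else None


# Usage (needs working DNS for the name being checked):
#   preferred_family("backup1002.eqiad.wmnet")
```

A host with only an A record, as backup2002 appears to have in the output above, would report IPv4 here regardless of local policy.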

Papaul had similar issues getting this machine's twin (backup2002) to boot into PXE on codfw; maybe he has a tip?

Pls hlp

@jynus I moved the DAC cable to the correct network port now. You should be good to go.

Change 595464 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Revert "backup1002: Update NIC address for card with link"

https://gerrit.wikimedia.org/r/595464

Change 595464 merged by Jcrespo:
[operations/puppet@production] Revert "backup1002: Update NIC address for card with link"

https://gerrit.wikimedia.org/r/595464

Thanks, that made it boot. Thank you!

Now I am only blocked by the pending update of the buster installer to the latest point release.

Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts:

['backup1002.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202005110736_jynus_243793.log.

Completed auto-reimage of hosts:

['backup1002.eqiad.wmnet']

Of which those FAILED:

['backup1002.eqiad.wmnet']

Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts:

['backup1002.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202005111144_jynus_9957.log.

Completed auto-reimage of hosts:

['backup1002.eqiad.wmnet']

Of which those FAILED:

['backup1002.eqiad.wmnet']

Change 595509 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] backups: Add backup1002 as a spare system, enough to prepare RAID

https://gerrit.wikimedia.org/r/595509

Change 595509 merged by Jcrespo:
[operations/puppet@production] backups: Add backup1002 as a spare system, enough to prepare RAID

https://gerrit.wikimedia.org/r/595509

Change 595517 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] insetup: Disable notifications to "in setup" hosts

https://gerrit.wikimedia.org/r/595517

Change 595519 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] backups: Set backup1002 as an "in setup" system and disable notif.

https://gerrit.wikimedia.org/r/595519

Change 595517 merged by Jcrespo:
[operations/puppet@production] insetup: Disable notifications for "in setup" hosts

https://gerrit.wikimedia.org/r/595517

Change 595519 merged by Jcrespo:
[operations/puppet@production] backups: Set backup1002 as an "in setup" system and disable notif.

https://gerrit.wikimedia.org/r/595519

Cmjohnson updated the task description.
Cmjohnson added a subscriber: Marostegui.

The ops-eqiad portion of this task has been completed. Thank you for finishing the install, @jcrespo/@Marostegui.