rack/setup/install backup1001
Open, Normal, Public

Description

This task will track the racking, setup, installation, and deployment of the new backup1001.eqiad.wmnet. This host is a direct replacement of helium.eqiad.wmnet. It does have more storage capacity (shelves).

Racking Proposal: This needs to be in a 10G networked rack, but can be in ANY 10G rack. Its location in relation to helium is immaterial, since heze will be decommissioned when this is fully online. Just put it in whichever 10G rack has the most power/space/network/access available.

Disk Shelf Cabling: The shelves should be wired in series (daisy-chained), taking up only one of the two ports of the external SAS controller. This leaves the other port open for additional shelves at a later date.

backup1001 + backup1001-array1 + backup1001-array2:

  • - receive in system on procurement task T186816
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

Event Timeline

RobH created this task. Jun 5 2018, 4:42 PM
RobH triaged this task as Normal priority.
Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. Jun 5 2018, 6:16 PM
Cmjohnson updated the task description. Jun 5 2018, 6:34 PM

Change 437786 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding dns entries for backup1001

https://gerrit.wikimedia.org/r/437786

Change 437788 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] adding mac address backup1001

https://gerrit.wikimedia.org/r/437788

Cmjohnson updated the task description. Jun 6 2018, 4:38 PM

@RobH this is racked and mostly set up. I did not set up the preferred partman recipe, as I am not sure which one it is. Also, the disk arrays have not arrived yet, so the RAID has not been set up either.

Change 437786 merged by Cmjohnson:
[operations/dns@master] Adding dns entries for backup1001

https://gerrit.wikimedia.org/r/437786

Change 437788 merged by Cmjohnson:
[operations/puppet@production] adding mac address backup1001

https://gerrit.wikimedia.org/r/437788

RobH added a comment. Jun 6 2018, 4:42 PM

Ok, I'd hold off on the OS install until AFTER we get the shelves, so that if there are any issues with them we see them at that point.

Cmjohnson moved this task from Up next to Racking Tasks on the ops-eqiad board. Jun 26 2018, 3:57 PM
Cmjohnson updated the task description. Jun 28 2018, 2:39 PM

Disk arrays are racked in D2.

Vvjjkkii renamed this task from rack/setup/install backup1001 to llbaaaaaaa. Jul 1 2018, 1:05 AM
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii removed Cmjohnson as the assignee of this task.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot assigned this task to Cmjohnson.
CommunityTechBot lowered the priority of this task from High to Normal.
CommunityTechBot renamed this task from llbaaaaaaa to rack/setup/install backup1001.
CommunityTechBot added subscribers: gerritbot, Aklapper.

Change 443995 had a related patch set uploaded (by Volans; owner: Volans):
[operations/dns@master] Fix typo for backup1001 entries

https://gerrit.wikimedia.org/r/443995

Change 443995 merged by Alexandros Kosiaris:
[operations/dns@master] Fix typo for backup1001 entries

https://gerrit.wikimedia.org/r/443995

akosiaris changed the task status from Open to Stalled. Oct 4 2018, 3:26 PM
akosiaris added a subtask: Unknown Object (Task).
RobH closed subtask Unknown Object (Task) as Resolved. Nov 13 2018, 4:20 PM

Change 476032 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Set backup2001's 10G interface in DHCP/PXE

https://gerrit.wikimedia.org/r/476032

Change 476032 merged by Alexandros Kosiaris:
[operations/puppet@production] Set backup2001's 10G interface in DHCP/PXE

https://gerrit.wikimedia.org/r/476032

akosiaris updated the task description. Edited Nov 27 2018, 3:52 PM
akosiaris closed this task as Resolved.
This comment has been deleted.
akosiaris reopened this task as Open. Nov 27 2018, 3:54 PM

Resolved the wrong task. I meant to resolve T196477.

@Cmjohnson, I think we can proceed with this. I just tried to reimage the server, but mgmt is not responding:

akosiaris@bast1002:~$ ping backup1001.mgmt.eqiad.wmnet
PING backup1001.mgmt.eqiad.wmnet (10.65.2.161) 56(84) bytes of data.
^C
--- backup1001.mgmt.eqiad.wmnet ping statistics ---
7 packets transmitted, 0 received, 100% packet loss, time 6132ms

Any ideas?
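
A check like the one above could be scripted when polling a mgmt interface repeatedly. The following is only a minimal sketch, assuming the summary-line format that iputils `ping` printed in the output above; the function names `parse_ping_summary` and `is_reachable` are hypothetical and not part of any Wikimedia tooling.

```python
import re
from typing import Tuple

# Matches the summary line of iputils ping, for example:
# "7 packets transmitted, 0 received, 100% packet loss, time 6132ms"
_SUMMARY = re.compile(
    r"(?P<sent>\d+) packets transmitted, (?P<received>\d+) (?:packets )?received"
)


def parse_ping_summary(output: str) -> Tuple[int, int]:
    """Return (sent, received) parsed from captured ping output.

    Raises ValueError if no summary line is present.
    """
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return int(match.group("sent")), int(match.group("received"))


def is_reachable(output: str) -> bool:
    """Treat the host as reachable if at least one reply came back."""
    _sent, received = parse_ping_summary(output)
    return received > 0
```

Fed the output pasted in this comment, `is_reachable` would report the mgmt interface as unreachable, since 0 of the 7 packets were answered.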

@akosiaris Sorry for the really late response to this....the task got buried. No, I don't know why mgmt would not be working now unless it's disconnected or the cable is bad. I will check it next week after all hands.

@akosiaris Sorry for the really late response to this....the task got buried. No, I don't know why mgmt would not be working now unless it's disconnected or the cable is bad. I will check it next week after all hands.

No worries. Let me know when you get to it! Thanks

@Cmjohnson Any news on this?

backup1001 is all connected now. However, I notice that the RAID card is not picking up any of the disk arrays.

Change 504873 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] backup1001: Use 10g NIC in DHCP requests

https://gerrit.wikimedia.org/r/504873

Change 504873 merged by Alexandros Kosiaris:
[operations/puppet@production] backup1001: Use 10g NIC in DHCP requests

https://gerrit.wikimedia.org/r/504873

Change 504879 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Update autoinstall params for backup1001

https://gerrit.wikimedia.org/r/504879

Change 504879 merged by Alexandros Kosiaris:
[operations/puppet@production] Update autoinstall params for backup1001

https://gerrit.wikimedia.org/r/504879

Host is up and running but, as @Cmjohnson points out in T196478#4976375, the RAID controller does not see any physical disks:

akosiaris@backup1001:~$ sudo megacli -PDList -a0
                                     
Adapter #0


Exit Code: 0x00

@Cmjohnson Could it be some cabling issue? Either between the shelf and the host or power to the shelf?
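
To confirm from a script that the controller really reports zero drives, the `-PDList` output can be counted. This is a rough sketch only: it assumes that each physical drive entry in the MegaCli listing contains one `Slot Number:` line, and the function name `count_physical_drives` is hypothetical.

```python
def count_physical_drives(pdlist_output: str) -> int:
    """Count drives in captured `megacli -PDList -a0` output.

    Assumption: every physical drive entry includes exactly one
    line starting with "Slot Number:".
    """
    return sum(
        1
        for line in pdlist_output.splitlines()
        if line.strip().startswith("Slot Number:")
    )
```

With the output pasted above (only the `Adapter #0` header and the exit code), the count is 0, which matches the controller seeing no drives from either shelf.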

Change 504976 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] setting backup1001 to spare for now

https://gerrit.wikimedia.org/r/504976

Change 504976 merged by RobH:
[operations/puppet@production] setting backup1001 to spare for now

https://gerrit.wikimedia.org/r/504976