
rack/setup/install ps[12]-60[34]-eqsin
Closed, Resolved · Public

Description

This task will track the racking and setup of the new ServerTech PDU towers for eqsin, ordered via T223443.

These are due to arrive at Equinix Singapore on January 13, 2020. @wiki_willy has already opened an inbound shipment ticket; details are on the order task T223443.

Both PDU towers in each rack need to be swapped. The towers are on opposite sides of the rack, so downtime caused by unplugging the wrong cord is unlikely. Because this is remote-hands work (Jin will do it for us; he also handled the CP issues and other remote-hands work for us in eqsin), we may want to take the caching site offline during the maintenance window.
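The task title uses shell-style bracket notation. Reading `ps1`/`ps2` as the two towers in a rack and `603`/`604` as the two eqsin racks (an interpretation based on the naming convention, not spelled out in this task), the title covers four PDU towers. Below is a minimal Python sketch that expands such a pattern. It assumes only literal character lists inside the brackets (no ranges such as `[1-3]`), and `expand_brackets` is a hypothetical helper, not an existing tool.

```python
import itertools
import re


def expand_brackets(pattern):
    """Expand single-character classes like [12] in a hostname pattern.

    Hypothetical helper: only literal character lists are handled,
    ranges such as [1-3] are not interpreted.
    """
    # Split while keeping the bracketed groups, e.g.
    # "ps[12]-60[34]-eqsin" -> ["ps", "[12]", "-60", "[34]", "-eqsin"]
    parts = re.split(r"(\[[^\]]+\])", pattern)
    # Each bracket group offers its characters as choices; literal text offers itself.
    choices = [list(p[1:-1]) if p.startswith("[") else [p] for p in parts]
    # Cartesian product of all choices gives every concrete hostname.
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for host in expand_brackets("ps[12]-60[34]-eqsin"):
        print(host)
```

Applied to the title pattern, it yields ps1-603-eqsin, ps1-604-eqsin, ps2-603-eqsin and ps2-604-eqsin: two towers in each of two racks.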

Start time: 2020-02-06 16:00 Pacific / 2020-02-07 00:00 GMT / 2020-02-07 08:00 Singapore time
Expected window: 2-4 hours, possibly 4-8 hours if there are issues during PDU setup.
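To cross-check the three start times and see when the window would close, here is a minimal Python sketch using fixed UTC offsets. It assumes US Pacific is on standard time (PST, UTC-8) on Feb 6, 2020 (DST began on March 8, 2020) and that Singapore is UTC+8 with no DST; `in_zones` and `window_end` are hypothetical helpers for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assuming standard time: US Pacific observes PST (UTC-8)
# in early February, and Singapore (SGT, UTC+8) has no DST.
PST = timezone(timedelta(hours=-8), "PST")
SGT = timezone(timedelta(hours=8), "SGT")

# Scheduled start of the maintenance window, in Pacific time.
START = datetime(2020, 2, 6, 16, 0, tzinfo=PST)


def in_zones(moment):
    """Render the same instant in Pacific, GMT and Singapore time."""
    fmt = "%Y-%m-%d %H:%M"
    return {
        "Pacific": moment.astimezone(PST).strftime(fmt),
        "GMT": moment.astimezone(timezone.utc).strftime(fmt),
        "Singapore": moment.astimezone(SGT).strftime(fmt),
    }


def window_end(hours):
    """Instant at which the window closes after the given number of hours."""
    return START + timedelta(hours=hours)


if __name__ == "__main__":
    print("start:", in_zones(START))
    # 2-4 hours expected, up to 8 hours if PDU setup runs into issues.
    for hours in (2, 4, 8):
        print(f"end after {hours}h:", in_zones(window_end(hours)))
```

With the 8-hour worst case, the window would close at 2020-02-07 16:00 Singapore time (2020-02-07 08:00 GMT), which keeps the work within a single Singapore business day.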

Steps BEFORE downtime:

  • Traffic to determine whether we will take the site offline for the window. Confirmed yes via IRC chat with @BBlack and @RobH.
  • Traffic and DC-Ops to work out a maintenance window (must coordinate with Jin, ideally during business hours in Singapore). Will do once the PDUs arrive onsite.
  • @wiki_willy filed the inbound shipment task and should let us know when the PDUs arrive.
  • @RobH to pre-stage DNS for the new PDUs: https://gerrit.wikimedia.org/r/562894
  • @RobH to type up directions for Jin to follow when installing the new PDUs.
  • #dc-ops (@RobH) confirmed via email thread with Jin that the PDU brackets in the racks will work with the ServerTech PDUs.

Steps for PDU swap:

  • PDUs deployed per the directions sent to Jin
  • LibreNMS updated to monitor the PDUs
  • Puppet updated to monitor the PDUs
  • Updated cable mapping imported into Netbox
  • PDU outlets labeled directly on the PDUs

Details

Related Gerrit Patches:
operations/puppet (production): adding monitoring for eqsin pdus
operations/dns (master): setting new eqsin PDUs dns entries


Event Timeline

RobH triaged this task as Medium priority. Jan 8 2020, 5:59 PM
RobH created this task.
Restricted Application added a project: Operations. Jan 8 2020, 5:59 PM
Restricted Application added a subscriber: Aklapper.
RobH added a parent task: Unknown Object (Task). Jan 8 2020, 5:59 PM
RobH updated the task description. Jan 8 2020, 6:05 PM

Change 562894 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] setting new eqsin PDUs dns entries

https://gerrit.wikimedia.org/r/562894

Change 562894 merged by RobH:
[operations/dns@master] setting new eqsin PDUs dns entries

https://gerrit.wikimedia.org/r/562894

RobH updated the task description. Jan 8 2020, 6:21 PM
RobH removed a project: Patch-For-Review.
RobH moved this task from Backlog to Racking Tasks on the ops-eqsin board. Jan 8 2020, 6:23 PM
ema moved this task from Triage to Hardware on the Traffic board. Jan 13 2020, 1:11 PM
faidon added a subscriber: faidon. Jan 22 2020, 3:44 PM

Hey, this was a Q2 task, but it hasn't seen an update in a while. What's the status?

RobH added a comment. Jan 22 2020, 4:09 PM

We only got confirmation of delivery of the PDUs yesterday via email. I'll be dispatching directions to Jin after we determine what date works best.

@BBlack: Do you have a preference on when this work takes place? We want to ensure it is done when someone from Traffic is around. Jin works Singapore hours, and it takes a day or two to set up the work.

Will next week work for an eqsin downtime and PDU swap, or will this need to wait until after all-hands?

RobH added a comment. Jan 23 2020, 6:51 PM

I've not seen @BBlack in IRC since posting the above comment, I suspect due to the pre-all-hands rush. We have SRE meeting time set aside during all-hands, so I'll sync up with @BBlack about this.

Per a chat with @wiki_willy earlier this week, the current plan is likely to take eqsin offline in DNS (no user traffic) sometime in the week of Feb. 3 (the week after all-hands), or possibly the week of Feb. 10.

RobH moved this task from Backlog to Acknowledged on the Operations board. Jan 23 2020, 6:51 PM
RobH added a comment. Mon, Feb 3, 11:36 PM

Please note this is now confirmed as likely to occur on Feb 6th (GMT). Jin has confirmed he can work during that window; we still need confirmation from @BBlack that this is OK for Traffic.

RobH added a comment (edited). Wed, Feb 5, 5:53 PM

Update:

Start time: 2020-02-06 16:00 Pacific / 2020-02-07 00:00 GMT / 2020-02-07 08:00 Singapore time
Expected window: 2-4 hours, possibly 4-8 hours if there are issues during PDU setup.

I've coordinated with Jin via Google Hangouts messages; he has reviewed the rack and made sure he has all the cables needed. I sent him the directions by email, and he followed up immediately and went onsite last evening (my time; early morning his time) to pre-stage as much as he could.

The email includes ticket numbers, Google Sheets links, and other details I'd rather not put in this task, so I'm only updating it with generalities. Specifics (open tickets) won't be appended until they are resolved/closed.

RobH updated the task description. Wed, Feb 5, 5:53 PM

Mentioned in SAL (#wikimedia-operations) [2020-02-07T01:24:46Z] <robh> eqsin pdu work ongoing starting now. ps1-603 swapping per T242250

@RobH please let me know once the PDUs are SNMP-accessible; they'll need to be added to puppet/monitoring.

Change 571522 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] adding monitoring for eqsin pdus

https://gerrit.wikimedia.org/r/571522

Change 571522 merged by RobH:
[operations/puppet@production] adding monitoring for eqsin pdus

https://gerrit.wikimedia.org/r/571522

RobH updated the task description.
RobH removed a project: Traffic.
RobH updated the task description. Tue, Feb 11, 5:58 PM
RobH closed this task as Resolved. Tue, Feb 11, 6:01 PM

T244900 was created to apply asset tags at a later date. This task is now resolved, and the PDUs are set up for monitoring in both LibreNMS and Icinga.

RobH updated the task description. Tue, Feb 11, 6:01 PM