
(Need by: 2019-09-30) upgrade msw1-eqiad from EX4200 to EX4300
Closed, Resolved | Public | 0 Estimated Story Points

Description

This task will track the scheduling and swap out of the central mgmt switch in eqiad, msw1-eqiad.

Current msw1/ex4200: https://netbox.wikimedia.org/dcim/devices/50/
Future msw1-EX4300: https://netbox.wikimedia.org/dcim/devices/2269/

The old/existing switch is an EX4200, and will be replaced with the new EX4300 ordered on T221883.

Items to confirm/update:

  • netops confirms on task whether they want the new EX4300 racked for configuration in advance of migration
  • migration window is scheduled
  • racking and labeling of EX4300 (pending an answer on whether this needs racking prior to migration, or whether it can just go where the existing EX4200 is presently)
  • configuration of EX4300
  • migration of cables from EX4200 to EX4300
  • wipe/decommission old EX4200
  • test ALL mgmt uplinks by connecting to one server in every rack
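
The final item (checking one server per rack) could be scripted roughly as below. This is a minimal sketch and not the procedure actually used on this task: the rack names, the management hostnames, and the choice of probing TCP port 22 are all assumptions made for illustration.

```python
import socket
from typing import Callable, Dict, List


def tcp_probe(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def failed_racks(
    sample_hosts: Dict[str, str],
    probe: Callable[[str], bool] = tcp_probe,
) -> List[str]:
    """Probe one mgmt host per rack and return the racks that did not answer."""
    return sorted(rack for rack, host in sample_hosts.items() if not probe(host))


# Hypothetical mapping of rack name -> one management hostname in that rack.
SAMPLE_HOSTS = {
    "a1": "example1001.mgmt.example.org",
    "a2": "example1002.mgmt.example.org",
}
```

Calling `failed_racks(SAMPLE_HOSTS)` returns the sorted list of racks whose sample host did not accept a connection. An empty list would suggest every mgmt uplink is passing traffic. The `probe` argument is injectable so that a different check can be swapped in.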

Event Timeline

RobH triaged this task as Medium priority. Jun 5 2019, 5:00 PM
RobH created this task.
Restricted Application added a project: SRE. Jun 5 2019, 5:00 PM
RobH added a parent task: Unknown Object (Task). Jun 5 2019, 5:00 PM
RobH renamed this task from upgrade mr1-eqiad from EX4200 to EX4300 to upgrade msw1-eqiad from EX4200 to EX4300. Jun 5 2019, 5:02 PM
RobH changed the task status from Open to Stalled.
RobH updated the task description. (Show Details)
RobH added subscribers: Papaul, Cmjohnson.

Please note @Papaul is working with @ayounsi to upgrade the codfw msw1 on T224250. The current plan is to allow that to complete, and then replicate its work for eqiad.

At that time, @Papaul can work with @Cmjohnson directly to replicate the setup.

Cmjohnson moved this task from Backlog to Racking Tasks on the ops-eqiad board. Jun 27 2019, 4:28 PM
wiki_willy renamed this task from upgrade msw1-eqiad from EX4200 to EX4300 to (Need By: Sept 30) upgrade msw1-eqiad from EX4200 to EX4300. Jul 2 2019, 10:37 PM
RobH mentioned this in Unknown Object (Task). Jul 15 2019, 8:01 PM
ayounsi reassigned this task from ayounsi to Papaul. Aug 2 2019, 3:36 PM
ayounsi added a subscriber: ayounsi.

codfw is done. @Papaul let me know if you need help to prepare the eqiad one.

Papaul added a comment. Aug 6 2019, 5:37 AM

@Cmjohnson I put together a "How to" at the link below on how to upgrade the switch. Please let me know if you have any questions.

https://wikitech.wikimedia.org/wiki/Juniper_switch_upgrade

wiki_willy reassigned this task from Papaul to Cmjohnson. Aug 30 2019, 6:19 PM
Cmjohnson updated the task description. (Show Details) Oct 1 2019, 4:59 PM
ayounsi mentioned this in Unknown Object (Task). Oct 16 2019, 7:53 PM
faidon changed the task status from Stalled to Open. Oct 17 2019, 8:56 AM
faidon raised the priority of this task from Medium to High.
faidon added a subscriber: faidon.

What's the status of this? It seems like this migration is in some limbo state :)

As far as I understand it:

  • Old msw1-eqiad, EX4200, is still in production. It has been renamed to "msw1-eqiad-spare" in Netbox, but is not actually a spare.
  • New msw1-eqiad, EX4300, has been received (circa June 2019). It's not currently in Netbox at all, which has resulted in various issues; among others: we told Juniper that this S/N is not ours and we don't need support for it :)
  • The replacement is somewhat underway; the switch has been upgraded but has not been fully cabled(?)

This isn't a particularly urgent task, but it's been a few months now, and keeping it in this state seems to be causing more work for various people than actually going through with the replacement would, so perhaps we should prioritize it and complete it soon? Being bold and raising the priority, but amenable to lowering it again if DC-Ops folks disagree.

RobH updated the task description. (Show Details) Oct 17 2019, 6:49 PM
RobH added a subscriber: Jclark-ctr.

RobH reassigned this task from Cmjohnson to Jclark-ctr. Oct 17 2019, 6:52 PM

John,

Please locate the new msw1-eqiad that I describe below and update the netbox asset tag entry. This will clear up our reporting errors for this device. Then either yourself or @Cmjohnson need to coordinate with @ayounsi on when this can be replaced.

Please note this states it was racked, but it was never added into netbox, so I'm not sure where it is racked.

I've gone ahead and put in the netbox entry, and need either @Cmjohnson or @Jclark-ctr to locate the new msw1-eqiad that is slated to replace the old msw1-eqiad and update the new msw1-eqiad netbox entry to show its asset tag and racked location.

wiki_willy renamed this task from (Need By: Sept 30) upgrade msw1-eqiad from EX4200 to EX4300 to (Need by: 2019-09-30) upgrade msw1-eqiad from EX4200 to EX4300. Feb 24 2020, 8:26 PM
RobH removed a subscriber: RobH. Mar 3 2020, 6:00 PM

Are there any updates to this task and any particular reasons it's been held up? While this was never super urgent, we're now at the ~one year mark since this was ordered and delivered to the data center. Plus I think because at the time the upgrade was imminent, we only bought support for the new switch and not the old, so we're operating with unsupported HW right now. It'd be great if this were to be completed soon. Thanks!

Hi @faidon - one of the goals we have this quarter is to resolve all backlogged install tasks from q3 and earlier by end of June. With the limited number of onsite hours and reduced frequency of visits the past couple months, Chris and John have been focused more on other priority items lately. However, @Cmjohnson and I chatted a bit earlier today, and we can get this completed in the next 2-3 weeks.

Thanks,
Willy

This is being worked on. I had to put the OS image back on the USB stick: when I reset the switch to factory default, the USB stick was wiped as well.

Cmjohnson updated the task description. (Show Details) Jun 30 2020, 6:12 PM

new-msw1-eqiad has the correct JUNOS 18.1.3 and the configuration has been copied. Currently connected to port 2 on the a8-scs and can be moved to the correct port once we 100% migrate to the new switch.

I cleaned up some default config leftovers as well as refactored the interfaces the same way we have them in codfw (with storm control).

It's ready to be replaced. Scheduled with Chris over IRC for Monday 6th at 1pm UTC.

The switch has been replaced successfully. Next steps:

  • Update Netbox
  • Wipe/decom old switch
Cmjohnson closed this task as Resolved. Jul 8 2020, 4:42 PM
Cmjohnson updated the task description. (Show Details)

Netbox updated, old switch removed, cable IDs updated. Resolving.

ayounsi reopened this task as Open. Jul 8 2020, 5:02 PM

I think there is something wrong:
https://netbox.wikimedia.org/dcim/devices/50/ is the old one but still has all the cables

https://netbox.wikimedia.org/dcim/devices/2269/ is the new one with no cables

I moved all the cables from the old switch to the new one (using csv exports and mass copy/paste).
Please double check them, and fill the few missing ones:
console0
ge-0/0/2
ge-0/0/4
ge-0/0/5 (marked as planned)
me0
And optionally the power cables
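
For reference, the bulk move described above (export to CSV, change the device, re-import) could be approximated along these lines. This is only a sketch of the idea, not the steps actually taken: the column names `device_a` and `device_b` and the device names in the usage note are assumptions for illustration and would need to match the real export header.

```python
import csv
import io


def retarget_cables(csv_text, old_device, new_device,
                    device_columns=("device_a", "device_b")):
    """Return csv_text with old_device replaced by new_device in device_columns.

    The column names are hypothetical; adjust them to the real export header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames or [],
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in device_columns:
            # Only rewrite exact matches so other devices are left untouched.
            if row.get(col) == old_device:
                row[col] = new_device
        writer.writerow(row)
    return out.getvalue()
```

For example, `retarget_cables(exported_text, "old-switch", "new-switch")` returns CSV text in which every cable endpoint on `old-switch` points at `new-switch` instead. Interfaces that exist only on one of the two models (such as the ones listed above) would still need to be filled in by hand.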

Ping? Besides the issues identified by @ayounsi just above, I see that in another comment above @ayounsi mentioned "wipe the switch" but then I saw the switch was removed. @Cmjohnson, can you confirm the switch was wiped before (or after) its removal? (Any reason we didn't go the decom task route here like we normally do?)

Cmjohnson closed this task as Resolved. Sep 2 2020, 5:19 PM

A decom task has been created to track the old-msw1-eqiad. All ports have been updated. Resolving this task.

ayounsi reopened this task as Open. Sep 3 2020, 7:59 AM

console0 and me0 (mgmt) still show as not connected.

Cmjohnson closed this task as Resolved. Sep 28 2020, 4:33 PM

Both have been updated