
Replace eqiad mgmt switches with EX4200s
Closed, Declined · Public

Description

The current mgmt switches are Netgear switches. The access switches refresh project freed up a bunch of EX4200s that can be used for mgmt instead.
This would bring PoE for the cameras (T207965), as well as the possibility of managing them the same way as the prod infra (automation, etc.).

In addition to the manual work required to swap them, a challenge is to provision them (wipe, upgrade, assign IP, configure). At about 1 switch per rack, doing it manually would be very time consuming.

The mgmt network doesn't have DHCP, and ZTP (Juniper auto-provisioning) would probably be a side project of its own.
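Without DHCP or ZTP, one low-tech option is to generate the per-switch config offline and paste it over console. A minimal sketch of that idea follows. The row/rack layout, the hostname pattern, the subnet (a documentation range) and the use of `me0` are all assumptions for illustration, not values from this task.

```python
import ipaddress

# Hypothetical inputs: rack layout, hostname pattern and subnet are
# illustrative assumptions, not values taken from this task.
ROWS = "abcd"
RACKS_PER_ROW = 8
MGMT_SUBNET = ipaddress.ip_network("192.0.2.0/24")  # documentation range


def render_configs(rows=ROWS, racks_per_row=RACKS_PER_ROW, subnet=MGMT_SUBNET):
    """Return {hostname: [Junos set commands]}, one static IP per switch."""
    hosts = iter(subnet.hosts())  # usable addresses, in order
    configs = {}
    for row in rows:
        for rack in range(1, racks_per_row + 1):
            hostname = f"msw-{row}{rack}-eqiad"  # hypothetical naming
            address = next(hosts)
            configs[hostname] = [
                f"set system host-name {hostname}",
                # Assumes the switch is reached via its me0 interface.
                f"set interfaces me0 unit 0 family inet address "
                f"{address}/{subnet.prefixlen}",
            ]
    return configs


if __name__ == "__main__":
    for name, lines in render_configs().items():
        print(f"# {name}")
        print("\n".join(lines))
```

This only covers the "assign IP, configure" part. The wipe and upgrade steps would still need console or USB access.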

To start and see what the default configuration and our options look like, could you please:

  • Connect an EX4200 to console (and update Netbox)
  • Make sure nothing else is connected to it (VC links, etc)
  • Plug in a USB drive containing install1002:/srv/junos/jinstall-ex-4200-15.1R7.9-domestic-signed.tgz
  • Power on the device

Event Timeline

ayounsi triaged this task as Normal priority. · Jan 8 2019, 12:01 AM
ayounsi created this task.
Restricted Application added a project: Operations. · Jan 8 2019, 12:01 AM
Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. · Jan 16 2019, 2:53 PM

These will not be able to have permanent console connections. I do not have enough available ports on the serial switches.

This is fine; I only need 1 for tests. Once in prod they can do without.

Cmjohnson moved this task from Up next to Not urgent on the ops-eqiad board. · Mar 25 2019, 5:00 PM
faidon reassigned this task from Cmjohnson to ayounsi. · Apr 20 2019, 7:27 PM
faidon added a subscriber: faidon.

I've surfaced the idea myself in the past, but the more I think about it the more I think it's not such a great idea at this point...

  • EX4200s were announced as EOL a couple of months ago; they can still be supported until 2024, as long as we pay for that, which makes no financial sense for mgmt switches.
  • Most of the switches that we have are ~8 years old, with the oldest being 10+ and the newest ~6.
  • In eqiad, we have 36 EX4200-24/48T, including a) a broken one b) the existing active msw1-eqiad. If we were to use them as mgmt for all 32 racks, that means that we'll only have 2 spares, which isn't enough for old and failing hardware.
  • codfw has 3: msw1-codfw + 2 spares. Even if we were to move those to eqiad, is it really the best use of our time to be moving around 10-year-old switches and keep replacing them as they start to fail?
  • Finally, software-wise, the EX4200 are obviously aging and won't get newer releases of JunOS, and they do not support ELS, etc., so all benefits from using JunOS start to deteriorate...
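For reference, the eqiad spare count above works out as follows. This is just the arithmetic from the bullet spelled out; the function and parameter names are made up for illustration.

```python
def spares_after_deployment(total, broken, already_in_use, racks):
    """Switches left over once every rack has one mgmt switch."""
    usable = total - broken - already_in_use
    return usable - racks


# eqiad figures from the list above: 36 EX4200s, 1 broken,
# 1 already active as msw1-eqiad, 32 racks to cover.
eqiad_spares = spares_after_deployment(
    total=36, broken=1, already_in_use=1, racks=32
)
print(eqiad_spares)  # -> 2
```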

Getting a little off-topic: our msw* are also aging; in eqiad, msw-a* and msw-b* are from 2011-02, msw-c* from 2012-04 and msw-d* from 2013-10. My proposal would be:

  • Buy an EX4300 for each site to replace msw1-eqiad/msw2-codfw
  • Replace all msw* in eqiad (and some of them with PoE :), ASAP.
  • Replace the oldest msw* in codfw (the ones from 2009), ASAP.

This would be a good time for any decisions that involve purchasing new gear, given the FY19-20 timelines. @ayounsi let me know your thoughts and/or file one or more procurement tasks :)

Filed T221675 for the aggregation switches.

I agree, it doesn't make sense to re-purpose such old gear into "production".

I guess we're down to:
1/ Buy new EX2300, more expensive (from the prices I saw online I'd guess they're at least twice as expensive)
Managing them is a good thing, and would bring us better security, visibility, etc... But I don't think we are staffed for this (at least not in Netops, maybe that's something DCops could help with).

2/ Buy manageable Netgear but use them as unmanaged switches (what we're doing now)
E.g. GS752TPv2, which would come to a total of about 16,000 USD for 32 racks (eqiad).
I don't think it's worth considering managing them. From what I found online, the WebUI is the only option and there is no API.
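A rough back-of-the-envelope for the two options, as a sketch only. The 16,000 USD total and the 32 racks come from the estimate above. Reading "at least twice" as 2x the Netgear unit price is an assumption, so the EX2300 figure is a guessed floor, not a quote.

```python
NETGEAR_TOTAL_USD = 16_000  # option 2 estimate for eqiad
RACKS = 32                  # one mgmt switch per rack


def per_switch_cost(total_usd, switches):
    """Average unit price implied by a total estimate."""
    return total_usd / switches


netgear_unit_usd = per_switch_cost(NETGEAR_TOTAL_USD, RACKS)

# Assumption: "at least twice" is taken relative to the Netgear unit price.
EX2300_MIN_FACTOR = 2
ex2300_total_floor_usd = EX2300_MIN_FACTOR * netgear_unit_usd * RACKS

print(f"Netgear: ~{netgear_unit_usd:.0f} USD per switch")
print(f"EX2300: at least ~{ex2300_total_floor_usd:.0f} USD for {RACKS} racks")
```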

TL;DR: if we have the budget and the resources, I'd go with 1; if not, stay with the status quo of 2.

ayounsi closed this task as Declined. · May 2 2019, 10:09 PM

Going with option 2. Will open tasks in the next FY when it's time to order them.