For historical context:
When the hardware was last refreshed, the virtual chassis (VC) was cabled in a similar way to the previous-generation ones (a mix of daisy-chain and spine/leaf) in order to save ports on the spines.
Unfortunately, this turned out to be unsupported and caused various instabilities (fixed, for example, in T256112: eqiad row D switch fabric recabling).
Row C, most likely because it is one rack smaller (C1 is for Fundraising only), never showed those issues. As the recabling is a risky operation, it was decided to leave it as is.
Today's T313382: asw2-c5-eqiad crash had an impact on the other racks in row C because of how traffic flows from some racks to the routers (LACP grouping + VRRP primary), whereas it should only have impacted servers in C5.
Another downside is that Juniper support might not want to move forward with any investigation as long as the cabling is not up to standards.
Ideally, we would avoid any intrusive action until the next hardware refresh, but that is at least 1 or 2 years away.
To improve the situation:
- In this task, re-cable the VC so each "leaf" has a link to the "spines"
- T308339: eqiad: move non WMCS servers out of rack C8 will reduce the size of the VC
- T304712: eqiad: Move links to new MPC7E linecard will replace the current LACP setup (e.g. ae1 has 2 members to fpc2 and 2 to fpc7, same with ae2) with 1x40G per router, which will make traffic flows more straightforward
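
As a rough illustration of the re-cabling goal in the first item, the sketch below lists which leaf-to-spine links are missing from a cabling plan. It assumes the target is a direct link from every leaf to every spine; the member names and the link list are hypothetical placeholders, not the actual row C inventory.

```python
from itertools import product


def missing_spine_links(spines, leaves, links):
    """Return the (leaf, spine) pairs that have no direct link.

    `links` is an iterable of 2-tuples of member names; direction is ignored.
    """
    cabled = {frozenset(link) for link in links}
    return [
        (leaf, spine)
        for leaf, spine in product(leaves, spines)
        if frozenset((leaf, spine)) not in cabled
    ]


# Hypothetical VC members and cabling, for illustration only.
spines = ["spine-a", "spine-b"]
leaves = ["leaf-1", "leaf-2", "leaf-3"]
links = [
    ("leaf-1", "spine-a"),
    ("leaf-1", "spine-b"),
    ("leaf-2", "spine-a"),
    ("leaf-2", "leaf-3"),  # daisy-chain link between two leaves
    ("leaf-3", "spine-b"),
]

for leaf, spine in missing_spine_links(spines, leaves, links):
    print(f"{leaf} has no direct link to {spine}")
```

Each pair reported this way corresponds to one link that would have to be added under the stated assumption.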
See the diagram below for a "current/final" view.
The next step for this task is to buy the necessary DACs.

