This task will track the racking, setup, and OS installation of (2) new 10G switches, purchased via T271338.
Hostname / Racking / Installation Details
Rob isn't aware of the installation details for these switches, escalated to Willy.
RobH, Mar 12 2021, 8:16 PM
F34489185: WMCS network-L1(1).png (Jun 10 2021, 8:05 AM)
Status | Subtype | Assigned | Task
---|---|---|---
Unknown Object (Task) | | |
Resolved | | Jclark-ctr | T277340 (Need By: TBD) rack/setup/install (2) new 10G switches
Unknown Object (Task) | | |
Resolved | | Cmjohnson | T280977 Rack/power audit in eqiad c8/d5
Hi @ayounsi - when we budgeted these last year, I think it was for general expansion of 10g switches. Do you have any specific racks you want to put this in? (like WMCS or something) If not, maybe we can add them to the new racks we'll be purchasing in FY21-22? Thanks, Willy
The original plan was to retrofit two existing 1G racks to 10G as a quick fix for the existing contention. The downside is that the 1G-to-10G migration means rack downtime while we replace the ToR switch.
Regardless of the expansion, we should increase the 10G capacity of the current rows (so services can still have row diversity).
The next step is to figure out which two racks are the best match, probably depending on usage. Is that something you could advise on?
Hi @ayounsi - let me check with Chris and John to confirm, but I'm thinking we should target the racks with the most available rack space (so we can phase out any leftover 1G servers). I'll get you more specific racks by the end of the week. Thanks, Willy
Hi @ayounsi - just to follow up on this, we should probably wait a bit longer on determining which racks to convert to 10G (until John and Chris can wrap up all the current hardware installs in ~1 month). Ideally, we'd want to install these switches in racks where there aren't too many servers, and we'll have a better idea which racks those are after the Q3 installs are out of the way. Timing-wise, it might actually line up with when the switches arrive. Thanks, Willy
After chatting with @wiki_willy, the best use for these switches is to go into racks C8 and D5 as cloudsw2 switches, adding capacity to the existing cloudsw switches.
Exact cabling is TBD, but most likely redundant 40G links to the cloudsw1 in the same rack.
See the two cloudsw2 switches on the right of the diagram.
If you have any spare 40G DACs feel free to use them; length at DCops' discretion.
Otherwise, let me know if I should open a procurement task for 5 of those DACs (4 + 1 spare).
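For clarity, here is the planned uplink layout as a quick sketch, so the DAC count (4 links + 1 spare) is easy to verify. Switch names come from this comment; everything else (and the code itself) is purely illustrative, not a cabling record:

```python
# Sketch of the planned cloudsw2 -> cloudsw1 uplinks described above.
# Two 40G links per new switch, to the cloudsw1 in the same rack.
planned_uplinks = [
    ("cloudsw2-c8-eqiad", "cloudsw1-c8-eqiad"),  # 2x40G for redundancy
    ("cloudsw2-c8-eqiad", "cloudsw1-c8-eqiad"),
    ("cloudsw2-d5-eqiad", "cloudsw1-d5-eqiad"),
    ("cloudsw2-d5-eqiad", "cloudsw1-d5-eqiad"),
]
spares = 1
print(f"40G DACs to procure: {len(planned_uplinks)} + {spares} spare "
      f"= {len(planned_uplinks) + spares}")
```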
@Cmjohnson @wiki_willy would it be possible to prioritize this (or at least 1 of the 2) within the next 2 weeks?
We would like to test a fix for T284592 before rolling it out to the eqiad rows during the DC switchover, starting on June 28th (for a few weeks).
@ayounsi I should be able to rack them on Friday but not cable them. We are out the week of 5 July, and as of now I am on vacation the following week. Maybe @Jclark-ctr can do the cabling if you need it before 13 July.
Thanks @Cmjohnson, it'd be great if we can get the ball rolling.
It'd help a lot to get them online early in the week starting July 13th; if @Jclark-ctr can help there it would be appreciated :)
We want to test the impact of changing the buffer partition on these switches when they first go in, before doing it on any production switches. The haste is because we need to make that change on the eqiad production rows before the DC switchover is rolled back.
It looks like Chris is going to be out for a while. @Jclark-ctr - can you prioritize this one, when you're back next week? Rob has ordered the DAC cables, so they should be arriving soon. Thanks, Willy
Basic plan for bringing this online should be:
Switches racked and partially cabled. Waiting on 40G DAC cables; the vendor did not ship the cables, as they were waiting on confirmation of tax status.
https://phabricator.wikimedia.org/T286575 updated
Thanks @Jclark-ctr
I've set up port 39 on scs-c1-eqiad now, standard port config same as the other Juniper gear. But I get nothing back on the console when I try to connect:
```
# pmshell
 1: ps1-c1-eqiad      2: ps1-c2-eqiad      3: ps1-c3-eqiad      4: ps1-c4-eqiad
 5: ps1-c5-eqiad      6: ps1-c6-eqiad      7: ps1-c7-eqiad      8: ps1-c8-eqiad
 9: asw-c1-eqiad     10: asw2-c2-eqiad    11: asw2-c3-eqiad    12: asw2-c4-eqiad
13: asw2-c5-eqiad    14: asw2-c6-eqiad    15: asw2-c7-eqiad    16: asw2-c8-eqiad
17: ps1-d1-eqiad     18: ps1-d2-eqiad     19: ps1-d3-eqiad     20: ps1-d4-eqiad
21: ps1-d5-eqiad     22: ps1-d6-eqiad     23: ps1-d7-eqiad     24: ps1-d8-eqiad
25: pfw3a-frack      26: pfw3b-frack      27: fasw-c1a         28: fasw-1b
29: asw2-d1-eqiad    30: asw2-d2-eqiad    31: asw2-d3-eqiad    32: asw2-d4-eqiad
33: asw2-d5-eqiad    34: asw2-d6-eqiad    35: asw2-d7-eqiad    36: asw2-d8-eqiad
39: cloudsw2-c8-eqiad   41: atlas-eqiad   48: cr2-eqsin

Connect to port > 39
<-- nothing -->
```
Is the switch powered up? If not, we might need to double-check the wiring from it to the OpenGear. Thanks.
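For what it's worth, this kind of console check can be scripted. Below is a minimal sketch, assuming SSH access to the console server and that invoking pmshell lands at the same "Connect to port >" prompt as in the paste above; the exact ssh invocation and prompts are assumptions, not a verified OpenGear workflow:

```python
# Minimal sketch of the console check above, using pexpect.
# Assumptions: SSH key access to the console server, and that "ssh -tt ... pmshell"
# drops us at the "Connect to port >" prompt shown in the paste.
import pexpect

CONSOLE_SERVER = "scs-c1-eqiad"  # console server from the paste
PORT = "39"                      # cloudsw2-c8-eqiad per the pmshell menu

child = pexpect.spawn(f"ssh -tt {CONSOLE_SERVER} pmshell", encoding="utf-8", timeout=20)
child.expect("Connect to port >")
child.sendline(PORT)
child.sendline("")  # send a carriage return to wake the attached console

# A powered, correctly wired switch should print *something* (typically a
# Junos login prompt); a timeout reproduces the "nothing back" symptom.
idx = child.expect(["login:", pexpect.TIMEOUT], timeout=10)
print("console responded" if idx == 0 else "no output (check power and console cabling)")
child.close()
```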
Mentioned in SAL (#wikimedia-operations) [2021-08-05T11:47:17Z] <XioNoX> prepare cloudsw1-c8-eqiad for cloudsw2-c8 - T277340
Thanks, I got the initial configuration done.
Left to do for C8:
Change 710506 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/homer/public@master] Allow mgmt to reach apt.wo
Change 710506 merged by jenkins-bot:
[operations/homer/public@master] Allow mgmt to reach apt.wo
Change 710534 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/homer/public@master] Add cloudsw2-c8-eqiad to Homer
Change 710534 merged by jenkins-bot:
[operations/homer/public@master] Add cloudsw2-c8-eqiad to Homer
Change 710575 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] Add cloudsw2-c8-eqiad to monitoring
Change 710575 merged by Ayounsi:
[operations/puppet@production] Add cloudsw2-c8-eqiad to monitoring
Last thing to do is enable the interfaces on the cloudsw1-c8 side and it will be ready to receive servers.
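In practice that change would go out via Homer (as with the d5 trunk later in this task), but purely as an illustration of what "enable the interfaces" means on the Junos side, here is a hedged PyEZ sketch; the interface names and mgmt hostname are placeholders, not the actual ports:

```python
# Illustration only -- the real change is pushed with the normal tooling.
# This PyEZ sketch shows the equivalent manual step: removing "disable" from
# the cloudsw1-c8 uplink ports that face cloudsw2-c8. Port names are placeholders.
from jnpr.junos import Device
from jnpr.junos.utils.config import Config

UPLINKS = ["et-0/0/48", "et-0/0/52"]  # hypothetical 40G ports toward cloudsw2-c8

with Device(host="cloudsw1-c8-eqiad.mgmt.eqiad.wmnet") as dev:
    with Config(dev, mode="exclusive") as cu:
        for port in UPLINKS:
            cu.load(f"delete interfaces {port} disable", format="set")
        cu.pdiff()  # review the candidate diff before committing
        cu.commit(comment="Enable uplinks toward cloudsw2-c8 - T277340")
```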
Mentioned in SAL (#wikimedia-operations) [2021-08-09T05:56:16Z] <XioNoX> enable cloudsw1-c8 interfaces toward cloudsw2-c8 - T277340
cloudsw2-c8 is ready to receive servers.
@Jclark-ctr please let us know when cloudsw2-d5 is ready for Netops, and @cmooney will take care of configuring it.
Change 712365 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):
[operations/homer/public@master] Add cloudsw2-d5-eqiad to Homer
Change 712365 merged by jenkins-bot:
[operations/homer/public@master] Add cloudsw2-d5-eqiad to Homer
Change 712930 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):
[operations/puppet@production] Add cloudsw2-d5-eqiad to monitoring
Change 712930 merged by Cathal Mooney:
[operations/puppet@production] Add cloudsw2-d5-eqiad to monitoring
Mentioned in SAL (#wikimedia-operations) [2021-08-13T11:36:10Z] <topranks> cloudsw1-d5-eqiad - configuring new 2x40G trunk to cloudsw2-d5-eqiad with homer (T277340)
cloudsw2-d5-eqiad is now configured and ready for server connections.
I believe this task can now be closed?
There's one alert firing regarding that switch; is that expected? https://alerts.wikimedia.org/?q=instance%3Dcloudsw2-d5-eqiad.mgmt.eqiad.wmnet
At the time of writing:
```
alertname: Storage over 90%
scope: global
summary: Storage over 90%
title: Alert for device cloudsw2-d5-eqiad.mgmt.eqiad.wmnet - Storage over 90%
3 hours ago
instance: cloudsw2-d5-eqiad.mgmt.eqiad.wmnet
source: librenms
team: noc
@cluster: wikimedia.org
```
@dcaro thanks. It's nothing to worry about; the other one (cloudsw2-c8-eqiad) is showing the same. I'll touch base with @ayounsi next week and see what the best way to deal with this is. I had a quick look in LibreNMS, but the alert thresholds seem to be globally defined, so I don't want to make any change just yet.
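For reference, a minimal sketch of how one might confirm what the alert is actually seeing before adjusting the global thresholds. The LibreNMS base URL, endpoint path, and response field names below are assumptions to verify against the LibreNMS API docs, not a known-good call:

```python
# Hedged sketch: list the device's storage readings from LibreNMS and flag
# anything over the 90% threshold the alert uses. Base URL, endpoint path and
# field names are assumptions -- verify against the LibreNMS API documentation.
import requests

LIBRENMS = "https://librenms.wikimedia.org"         # assumed base URL
TOKEN = "read-only-api-token-here"                  # LibreNMS API token
DEVICE = "cloudsw2-d5-eqiad.mgmt.eqiad.wmnet"

resp = requests.get(
    f"{LIBRENMS}/api/v0/devices/{DEVICE}/storage",  # endpoint path is an assumption
    headers={"X-Auth-Token": TOKEN},
    timeout=10,
)
resp.raise_for_status()

for fs in resp.json().get("storage", []):           # response field names assumed
    pct = fs.get("storage_perc")
    flag = "  <-- over the 90% global threshold" if pct is not None and pct > 90 else ""
    print(f"{fs.get('storage_descr')}: {pct}%{flag}")
```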