
cmooney (Cathal Mooney)
SRE (netops)

User Details

User Since
May 10 2021, 3:25 PM (240 w, 1 d)
Availability
Available
IRC Nick
topranks
LDAP User
Cathal Mooney
MediaWiki User
CMooney (WMF) [ Global Accounts ]

Recent Activity

Yesterday

cmooney edited P86679 (An Untitled Masterwork).
Tue, Dec 16, 8:08 PM
cmooney created P86679 (An Untitled Masterwork).
Tue, Dec 16, 8:05 PM
cmooney added a comment to T412807: Dell R740xd reimage fails in debian-installer, configures IP on incorrect interface.

I see these lines in /var/log/syslog in the busybox shell:

Dec 16 17:31:55 netcfg[1167]: INFO: Activating interface eno1np0
Dec 16 17:31:55 debconf: --> INPUT low netcfg/link_wait_timeout
Dec 16 17:31:55 debconf: --> GET netcfg/link_wait_timeout
Dec 16 17:31:55 netcfg[1167]: INFO: Waiting time set to 3
Dec 16 17:31:55 debconf: --> SUBST netcfg/link_detect_progress interface eno1np0
Dec 16 17:31:55 debconf: --> PROGRESS START 0 12 netcfg/link_detect_progress
Dec 16 17:31:56 netcfg[1167]: INFO: ethtool-lite: eno1np0: carrier down
Dec 16 17:31:57 netcfg[1167]: INFO: ethtool-lite: eno1np0: carrier down
Dec 16 17:31:58 netcfg[1167]: INFO: ethtool-lite: eno1np0: carrier down
Dec 16 17:31:58 netcfg[1167]: INFO: Reached timeout for link detection on eno1np0
Dec 16 17:31:58 netcfg[1167]: INFO: found no link on interface eno1np0.
Dec 16 17:31:58 netcfg[1167]: INFO: eno1np0 is not a wireless interface. Continuing.
Dec 16 17:31:58 netcfg[1167]: INFO: Taking down interface eno1np0
Dec 16 17:31:58 netcfg[1167]: INFO: Activating interface eno2np1
Dec 16 17:31:58 debconf: --> INPUT low netcfg/link_wait_timeout
Dec 16 17:31:58 debconf: --> GET netcfg/link_wait_timeout
Dec 16 17:31:58 netcfg[1167]: INFO: Waiting time set to 3
Dec 16 17:31:58 debconf: --> SUBST netcfg/link_detect_progress interface eno2np1
Dec 16 17:31:58 debconf: --> PROGRESS START 0 12 netcfg/link_detect_progress
Dec 16 17:31:59 netcfg[1167]: INFO: ethtool-lite: eno2np1: carrier down
Dec 16 17:32:00 netcfg[1167]: INFO: ethtool-lite: eno2np1: carrier down
Dec 16 17:32:01 netcfg[1167]: INFO: ethtool-lite: eno2np1: carrier down
Dec 16 17:32:01 netcfg[1167]: INFO: Reached timeout for link detection on eno2np1
Dec 16 17:32:01 netcfg[1167]: INFO: found no link on interface eno2np1.
Dec 16 17:32:01 netcfg[1167]: INFO: eno2np1 is not a wireless interface. Continuing.
Dec 16 17:32:01 netcfg[1167]: INFO: Taking down interface eno2np1
Dec 16 17:32:02 netcfg[1167]: INFO: Taking down interface eno2np1
Dec 16 17:32:02 netcfg[1167]: INFO: Activating interface eno3
Dec 16 17:32:02 debconf: --> INPUT low netcfg/link_wait_timeout
Dec 16 17:32:02 debconf: --> GET netcfg/link_wait_timeout
Dec 16 17:32:02 netcfg[1167]: INFO: Waiting time set to 3
Dec 16 17:32:02 debconf: --> SUBST netcfg/link_detect_progress interface eno3
Dec 16 17:32:02 debconf: --> PROGRESS START 0 12 netcfg/link_detect_progress
Dec 16 17:32:02 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:03 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:04 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:05 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:05 netcfg[1167]: INFO: Reached timeout for link detection on eno3
Dec 16 17:32:05 netcfg[1167]: INFO: found no link on interface eno3.
Dec 16 17:32:05 netcfg[1167]: INFO: eno3 is not a wireless interface. Continuing.
Dec 16 17:32:05 netcfg[1167]: INFO: Taking down interface eno3
Dec 16 17:32:05 netcfg[1167]: INFO: Activating interface eno4
Dec 16 17:32:05 debconf: --> INPUT low netcfg/link_wait_timeout
Dec 16 17:32:05 debconf: --> GET netcfg/link_wait_timeout
Dec 16 17:32:05 netcfg[1167]: INFO: Waiting time set to 3
Dec 16 17:32:05 debconf: --> SUBST netcfg/link_detect_progress interface eno4
Dec 16 17:32:05 debconf: --> PROGRESS START 0 12 netcfg/link_detect_progress
Dec 16 17:32:05 netcfg[1167]: INFO: ethtool-lite: eno4: carrier down
Dec 16 17:32:06 netcfg[1167]: INFO: ethtool-lite: eno4: carrier down
Dec 16 17:32:07 netcfg[1167]: INFO: ethtool-lite: eno4: carrier down
Dec 16 17:32:08 netcfg[1167]: INFO: ethtool-lite: eno4: carrier down
Dec 16 17:32:08 netcfg[1167]: INFO: Reached timeout for link detection on eno4
Dec 16 17:32:08 netcfg[1167]: INFO: found no link on interface eno4.
Dec 16 17:32:08 netcfg[1167]: INFO: eno4 is not a wireless interface. Continuing.
Dec 16 17:32:08 netcfg[1167]: INFO: Taking down interface eno4
Dec 16 17:32:08 debconf: --> GET netcfg/choose_interface
Dec 16 17:32:08 netcfg[1167]: INFO: Could not find valid BOOTIF= entry in /proc/cmdline
Dec 16 17:32:08 netcfg[1167]: INFO: Taking down interface eno1np0
Dec 16 17:32:08 debconf: --> METAGET netcfg/internal-wireless description
Dec 16 17:32:08 debconf: --> SET netcfg/choose_interface eno1np0: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller
Dec 16 17:32:08 netcfg[1167]: INFO: Taking down interface eno2np1
Dec 16 17:32:08 debconf: --> METAGET netcfg/internal-wireless description
Dec 16 17:32:08 netcfg[1167]: INFO: Taking down interface eno3
Dec 16 17:32:08 debconf: --> METAGET netcfg/internal-wireless description
Dec 16 17:32:08 netcfg[1167]: INFO: Taking down interface eno4
Dec 16 17:32:08 debconf: --> METAGET netcfg/internal-wireless description
Dec 16 17:32:08 debconf: --> SUBST netcfg/choose_interface ifchoices eno1np0: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller
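
A log like the one above is easier to scan as a per-interface summary. The following is a minimal sketch and not part of the original comment. The function name and regular expressions are hypothetical and match only the message formats visible in this excerpt. The "carrier up" case is an assumption, since the excerpt only shows "carrier down".

```python
import re

# Message formats are taken from the excerpt above; the "carrier up" case is an
# assumption, since only "carrier down" appears in this log.
ACTIVATE = re.compile(r"INFO: Activating interface (\S+)")
CARRIER = re.compile(r"ethtool-lite: (\S+): carrier (up|down)")
NO_LINK = re.compile(r"found no link on interface (\S+?)\.$")


def summarize_link_detection(lines):
    """Map each probed interface to its poll count and link result."""
    summary = {}

    def entry(name):
        # Create the per-interface record on first sight, then reuse it.
        return summary.setdefault(
            name, {"polls": 0, "carrier_up": False, "no_link": False}
        )

    for raw in lines:
        line = raw.strip()
        if m := ACTIVATE.search(line):
            entry(m.group(1))
        elif m := CARRIER.search(line):
            e = entry(m.group(1))
            e["polls"] += 1
            e["carrier_up"] = e["carrier_up"] or m.group(2) == "up"
        elif m := NO_LINK.search(line):
            entry(m.group(1))["no_link"] = True
    return summary


# A few lines copied from the excerpt above, for the eno3 interface only.
SAMPLE = """\
Dec 16 17:32:02 netcfg[1167]: INFO: Activating interface eno3
Dec 16 17:32:02 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:03 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:04 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:05 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:05 netcfg[1167]: INFO: found no link on interface eno3.
""".splitlines()

for name, result in summarize_link_detection(SAMPLE).items():
    print(name, result)
```

Run against the full excerpt, a summary of this kind would list all four interfaces with no carrier detected inside the 3-second wait.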
Tue, Dec 16, 5:53 PM · Infrastructure-Foundations, SRE
cmooney added a comment to T412807: Dell R740xd reimage fails in debian-installer, configures IP on incorrect interface.

Anyway that could also be the culprit, I'll kick off another reimage and see if it does anything different.

Tue, Dec 16, 5:35 PM · Infrastructure-Foundations, SRE
cmooney added a comment to T412807: Dell R740xd reimage fails in debian-installer, configures IP on incorrect interface.

@cmooney There is nothing plugged into any of the ports on this server except the expected: idrac and the first 1G port.
I went into the BIOS and the boot port was set to the 10G port. I manually set it to the 1G.

Tue, Dec 16, 3:42 PM · Infrastructure-Foundations, SRE
cmooney updated subscribers of T412807: Dell R740xd reimage fails in debian-installer, configures IP on incorrect interface.

It seems the interface can be set through the preseed file we pass to the debian installer. Our current setting is:

d-i netcfg/choose_interface select auto
Tue, Dec 16, 2:44 PM · Infrastructure-Foundations, SRE
cmooney added a subtask for T407472: Install a testing db with Debian Trixie: T412807: Dell R740xd reimage fails in debian-installer, configures IP on incorrect interface.
Tue, Dec 16, 1:03 PM · DBA
cmooney added a parent task for T412807: Dell R740xd reimage fails in debian-installer, configures IP on incorrect interface: T407472: Install a testing db with Debian Trixie.
Tue, Dec 16, 1:03 PM · Infrastructure-Foundations, SRE
cmooney created T412807: Dell R740xd reimage fails in debian-installer, configures IP on incorrect interface.
Tue, Dec 16, 1:03 PM · Infrastructure-Foundations, SRE
cmooney updated the task description for T411570: Migrate database hosts from BIOS to UEFI.
Tue, Dec 16, 11:56 AM · Epic, DBA
cmooney updated the task description for T411570: Migrate database hosts from BIOS to UEFI.
Tue, Dec 16, 11:55 AM · Epic, DBA
cmooney added a comment to T410717: mr1-codfw: add second uplink to lsw1-a2-codfw.

If a copper run is fine, then it's an SFP-T (that you probably have in stock) on the switch side, and a regular cat5 or 6.

Tue, Dec 16, 10:32 AM · DC-Ops, ops-codfw, netops, Infrastructure-Foundations, SRE

Mon, Dec 15

cmooney added a comment to T384052: Improve port-utilisation alerting to take QoS into account.

This has come up again in terms of the pages we have been getting of late, and we may take some action to change our QoS profiling across the WAN as a result.

Mon, Dec 15, 11:49 PM · Observability-Alerting, Infrastructure-Foundations, netops, SRE
cmooney added a subtask for T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1: T412156: Inbound errors on interface lsw1-e5-codfw:mgmt0 ().
Mon, Dec 15, 4:24 PM · netops, Infrastructure-Foundations, SRE
cmooney added a parent task for T412156: Inbound errors on interface lsw1-e5-codfw:mgmt0 (): T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1.
Mon, Dec 15, 4:24 PM · SRE, DC-Ops, ops-codfw
cmooney added a subtask for T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1: T412154: Inbound errors on interface lsw1-f2-codfw:mgmt0 ().
Mon, Dec 15, 4:23 PM · netops, Infrastructure-Foundations, SRE
cmooney added a parent task for T412154: Inbound errors on interface lsw1-f2-codfw:mgmt0 (): T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1.
Mon, Dec 15, 4:23 PM · SRE, DC-Ops, ops-codfw
cmooney added a subtask for T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1: T412155: Inbound errors on interface lsw1-e2-codfw:mgmt0 ().
Mon, Dec 15, 4:23 PM · netops, Infrastructure-Foundations, SRE
cmooney added a parent task for T412155: Inbound errors on interface lsw1-e2-codfw:mgmt0 (): T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1.
Mon, Dec 15, 4:23 PM · SRE, DC-Ops, ops-codfw
cmooney added a subtask for T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1: T412152: Inbound errors on interface lsw1-e4-codfw:mgmt0 ().
Mon, Dec 15, 4:23 PM · netops, Infrastructure-Foundations, SRE
cmooney added a parent task for T412152: Inbound errors on interface lsw1-e4-codfw:mgmt0 (): T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1.
Mon, Dec 15, 4:23 PM · SRE, DC-Ops, ops-codfw
cmooney added a subtask for T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1: T412153: Inbound errors on interface lsw1-f4-codfw:mgmt0 ().
Mon, Dec 15, 4:21 PM · netops, Infrastructure-Foundations, SRE
cmooney added a parent task for T412153: Inbound errors on interface lsw1-f4-codfw:mgmt0 (): T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1.
Mon, Dec 15, 4:21 PM · SRE, DC-Ops, ops-codfw
cmooney created T412733: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1.
Mon, Dec 15, 4:21 PM · netops, Infrastructure-Foundations, SRE
cmooney added a comment to T412443: Handle `network_flows_internal` data growth.

I'll keep the conf as is (30 days retention), and will remove the daily loading job, as the data is big enough to not require compaction.

Mon, Dec 15, 1:48 PM · Infrastructure-Foundations, netops, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
cmooney added a comment to T412443: Handle `network_flows_internal` data growth.

Either sampling more, or keeping one week of data for now.

Mon, Dec 15, 9:50 AM · Infrastructure-Foundations, netops, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Sat, Dec 13

cmooney added a comment to T412443: Handle `network_flows_internal` data growth.

That's right, the failure is due to an OOM in the Druid ingestion job. But it's better for us not to ingest while we don't have a decision on how to handle the data size.

Sat, Dec 13, 3:53 PM · Infrastructure-Foundations, netops, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Fri, Dec 12

cmooney closed T412513: No byte counters for interfaces on cr2-codfw PIC 0/0 (MPC10E QSFP28 card) as Resolved.

So it seems this is a known problem; we actually hit it before on another card. To avoid it entirely we will need to upgrade JunOS on these routers.

Fri, Dec 12, 12:58 PM · netops, Infrastructure-Foundations, SRE
cmooney added a comment to T412513: No byte counters for interfaces on cr2-codfw PIC 0/0 (MPC10E QSFP28 card).

Hmm so this problem is worse than I thought at first. It is not just affecting the gnmic stats, but also the SNMP counters (LibreNMS shows the same problem) and even the packet counters shown on the CLI:

cmooney@re0.cr2-codfw> show interfaces xe-0/0/1:3 | match "Output rate" 
Dec 12 12:37:46
  Output rate    : 882741856 bps (82920 pps)
Fri, Dec 12, 12:40 PM · netops, Infrastructure-Foundations, SRE
cmooney created T412513: No byte counters for interfaces on cr2-codfw PIC 0/0 (MPC10E QSFP28 card).
Fri, Dec 12, 12:35 PM · netops, Infrastructure-Foundations, SRE

Thu, Dec 11

cmooney added a comment to T412443: Handle `network_flows_internal` data growth.

@JAllemandou thanks for the task. And apologies this has just hit you out of the blue - we should have reached out to warn you it was likely to increase when we fixed the ACLs to allow this traffic through to the pipeline earlier this week. We probably didn't anticipate it would be so much, but thinking it through it makes sense: most of our two core sites suddenly started sending data.

Thu, Dec 11, 7:39 PM · Infrastructure-Foundations, netops, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
cmooney added a comment to T409924: High pod latency affecting several dse-k8s-worker nodes in eqiad C/D rows.

If we need to look into it more, I'd suggest running a packet capture (while filtering out as much as possible, so we don't end up with a multi-gigabyte file) during that 20s pod creation time to look at where the extra latency is.

Thu, Dec 11, 1:17 PM · Essential-Work, Infrastructure-Foundations, Data-Platform-SRE (2025.11.07 - 2025.11.28)

Wed, Dec 10

cmooney added a subtask for T396063: Eqiad: row C/D switch refresh: T412271: Decom Juniper EX/QFX switches in eqiad rows C/D.
Wed, Dec 10, 8:42 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, SRE
cmooney added a parent task for T412271: Decom Juniper EX/QFX switches in eqiad rows C/D: T396063: Eqiad: row C/D switch refresh.
Wed, Dec 10, 8:42 PM · ops-eqiad, netops, Infrastructure-Foundations, SRE, DC-Ops
cmooney renamed T412271: Decom Juniper EX/QFX switches in eqiad rows C/D from Decom Nokia EX/QFX switches in eqiad rows C/D to Decom Juniper EX/QFX switches in eqiad rows C/D.
Wed, Dec 10, 8:40 PM · ops-eqiad, netops, Infrastructure-Foundations, SRE, DC-Ops
cmooney created T412271: Decom Juniper EX/QFX switches in eqiad rows C/D.
Wed, Dec 10, 8:40 PM · ops-eqiad, netops, Infrastructure-Foundations, SRE, DC-Ops
cmooney closed T405640: Netbox: Create script to allow multiple host migrations from old -> new switch, a subtask of T404146: Netbox: General updates for Nokia switch support, as Resolved.
Wed, Dec 10, 8:36 PM · netops, Infrastructure-Foundations, SRE
cmooney closed T405640: Netbox: Create script to allow multiple host migrations from old -> new switch as Resolved.
Wed, Dec 10, 8:36 PM · netops, Infrastructure-Foundations, SRE
cmooney closed T404146: Netbox: General updates for Nokia switch support, a subtask of T396063: Eqiad: row C/D switch refresh, as Resolved.
Wed, Dec 10, 8:35 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, SRE
cmooney closed T404146: Netbox: General updates for Nokia switch support as Resolved.
Wed, Dec 10, 8:35 PM · netops, Infrastructure-Foundations, SRE
cmooney closed T402577: Homer: Add Python modules to configure Nokia SR Linux switches as Resolved.

There will be more work to refine the configuration and add elements over time, but closing this for now as we have a working setup and live hosts connected.

Wed, Dec 10, 8:34 PM · netops, Infrastructure-Foundations, SRE
cmooney closed T402588: Eqiad: row C/D switch refresh configuration task, a subtask of T396063: Eqiad: row C/D switch refresh, as Resolved.
Wed, Dec 10, 8:34 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, SRE
cmooney closed T402588: Eqiad: row C/D switch refresh configuration task as Resolved.
Wed, Dec 10, 8:34 PM · netops, Infrastructure-Foundations, SRE
cmooney closed T405558: Nokia: add new switches in eqiad/codfw to monitoring and make 'active', a subtask of T396063: Eqiad: row C/D switch refresh, as Resolved.
Wed, Dec 10, 8:33 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, SRE
cmooney closed T405558: Nokia: add new switches in eqiad/codfw to monitoring and make 'active' as Resolved.

This is done, or at least we have all the major coverage we need.

Wed, Dec 10, 8:33 PM · netops, Infrastructure-Foundations, SRE
cmooney added a comment to T411781: lvs1018: remove cross-rack links to rows A, C and D.

All the ports are now decom'ed on the switches / servers.

Wed, Dec 10, 8:32 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, netops, SRE
cmooney closed T409800: Row C traffic outage Nov 11 2025, a subtask of T404609: eqiad: rows C/D Upgrade Tracking, as Resolved.
Wed, Dec 10, 8:26 PM · SRE, Infrastructure-Foundations, netops, DC-Ops, ops-eqiad
cmooney closed T409800: Row C traffic outage Nov 11 2025 as Resolved.

Folks I am going to close this one for now.

Wed, Dec 10, 8:26 PM · netops, Infrastructure-Foundations, SRE
cmooney created P86503 (An Untitled Masterwork).
Wed, Dec 10, 5:28 PM
cmooney added a comment to T412157: Nokia: how to approach schema differences in SR-Linux versions.

I think the ideal would be to store all the OS versions (Debian, Juniper, Nokia) in Netbox, to for example not have to set the --os parameter in the reimage cookbook, or drive the ZTP process. But we're far from it, so +1 for using a YAML variable.

Wed, Dec 10, 2:36 PM · netops, Infrastructure-Foundations, SRE

Tue, Dec 9

cmooney created T412157: Nokia: how to approach schema differences in SR-Linux versions.
Tue, Dec 9, 8:56 PM · netops, Infrastructure-Foundations, SRE

Mon, Dec 8

cmooney added a comment to T411783: Move cloudweb hosts to cloud racks?.

Thanks for the task @taavi. I think the idea makes sense, we probably need to take a close look at what they expose and how the networking would work if we move them to the cloud racks.

Mon, Dec 8, 3:55 PM · Infrastructure-Foundations, netops, Striker, Horizon, cloud-services-team
cmooney added a comment to T409178: Nokia SR-Linux ARP resolution bug on v24.10.x+.

Nokia have come back to say they were able to reproduce the issue, and confirm the cause as well as the fact it is not a problem in the latest SR-Linux release:

Mon, Dec 8, 1:36 PM · Infrastructure-Foundations, netops, SRE

Fri, Dec 5

cmooney reopened T410606: rancid: message has lines too long for transport as "Open".

Thanks for the work on this @MoritzMuehlenhoff!

Fri, Dec 5, 12:53 PM · netops, Infrastructure-Foundations, SRE

Thu, Dec 4

cmooney closed T405628: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad, a subtask of T405602: Eqiad row C/D switch refresh: LVS changes to support migration, as Resolved.
Thu, Dec 4, 6:18 PM · Traffic, netops, Infrastructure-Foundations, SRE
cmooney closed T405628: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad as Resolved.
Thu, Dec 4, 6:18 PM · DC-Ops, Traffic, ops-eqiad, netops, Infrastructure-Foundations, SRE
cmooney updated the task description for T405609: lvs1020: move primary uplink from asw2-d7-eqiad to lsw1-d7-eqiad and remove link to asw2-c2-eqiad.
Thu, Dec 4, 6:17 PM · Traffic, ops-eqiad, netops, Infrastructure-Foundations, SRE, DC-Ops
cmooney updated the task description for T405628: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad.
Thu, Dec 4, 6:16 PM · DC-Ops, Traffic, ops-eqiad, netops, Infrastructure-Foundations, SRE
cmooney updated the task description for T405628: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad.
Thu, Dec 4, 6:16 PM · DC-Ops, Traffic, ops-eqiad, netops, Infrastructure-Foundations, SRE
cmooney updated the task description for T405628: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad.
Thu, Dec 4, 5:22 PM · DC-Ops, Traffic, ops-eqiad, netops, Infrastructure-Foundations, SRE
cmooney added a comment to T408892: ULSFO: New switch configuration.

Additionally for the rebuild we should aim to:

  1. Convert the existing ganeti hosts to routed ganeti
  2. Delete the unused range 198.35.26.240/28 and the sandbox1-ulsfo vlan
  3. Allocate 198.35.26.224/27 for LVS service IPs
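
As a side note on the address arithmetic in items 2 and 3, the two ranges overlap: 198.35.26.240/28 sits inside 198.35.26.224/27. The comment does not say whether that is why the two items are listed together, so the following is only a sketch that checks the overlap with Python's standard `ipaddress` module. The variable names are hypothetical.

```python
import ipaddress

# Prefixes copied from the list above; variable names are illustrative only.
unused_range = ipaddress.ip_network("198.35.26.240/28")
lvs_range = ipaddress.ip_network("198.35.26.224/27")

# True when every address of the /28 is also inside the /27.
overlaps = unused_range.subnet_of(lvs_range)

# The part of the /27 that lies outside the /28.
remainder = list(lvs_range.address_exclude(unused_range))

print(overlaps)
print(lvs_range[0], lvs_range[-1])
print(remainder)
```

The /27 spans 198.35.26.224 to 198.35.26.255, so the /28 is its upper half and the remainder is the single prefix 198.35.26.224/28.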
Thu, Dec 4, 3:16 PM · Patch-For-Review, SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
cmooney updated the task description for T411781: lvs1018: remove cross-rack links to rows A, C and D.
Thu, Dec 4, 3:08 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, netops, SRE
cmooney added a subtask for T405602: Eqiad row C/D switch refresh: LVS changes to support migration: T411781: lvs1018: remove cross-rack links to rows A, C and D.
Thu, Dec 4, 3:02 PM · Traffic, netops, Infrastructure-Foundations, SRE
cmooney added a parent task for T411781: lvs1018: remove cross-rack links to rows A, C and D: T405602: Eqiad row C/D switch refresh: LVS changes to support migration.
Thu, Dec 4, 3:02 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, netops, SRE
cmooney renamed T411781: lvs1018: remove cross-rack links to rows A, C and D from lvs1018: remove cross-rack link to asw2-c2-eqiad xe-2/0/13 to lvs1018: remove cross-rack links to rows A, C and D.
Thu, Dec 4, 3:02 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, netops, SRE
cmooney changed the status of T410661: lvs1018: decom links to asw2-c2-eqiad and asw2-d7-eqiad, a subtask of T405602: Eqiad row C/D switch refresh: LVS changes to support migration, from Resolved to Declined.
Thu, Dec 4, 2:09 PM · Traffic, netops, Infrastructure-Foundations, SRE
cmooney changed the status of T410661: lvs1018: decom links to asw2-c2-eqiad and asw2-d7-eqiad from Resolved to Declined.

Duplicate task made in error, will use T411781

Thu, Dec 4, 2:09 PM · ops-eqiad, Traffic, DC-Ops, SRE
cmooney updated subscribers of T411781: lvs1018: remove cross-rack links to rows A, C and D.
Thu, Dec 4, 2:04 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, netops, SRE
cmooney created T411781: lvs1018: remove cross-rack links to rows A, C and D.
Thu, Dec 4, 1:55 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, netops, SRE
cmooney added a comment to T399180: Cloudcephosd: migrate to single network uplink.

I think the easiest would be to:

  • Remove the spurious enp13s0f1np1 config, run puppet to verify no other changes will be applied
  • Make sure the ifupdown config matches e.g. cloudcephosd1050, modulo addresses
  • Reboot the host and verify addresses/interfaces come up as expected
Thu, Dec 4, 11:58 AM · netops, SRE, Infrastructure-Foundations
cmooney reopened T410989: Remove second network connection for cloudcephosd hosts with single uplink, a subtask of T399180: Cloudcephosd: migrate to single network uplink, as Open.
Thu, Dec 4, 11:57 AM · netops, SRE, Infrastructure-Foundations
cmooney reopened T410989: Remove second network connection for cloudcephosd hosts with single uplink as "Open".

Thanks @VRiley-WMF. I'm gonna re-open this as we still have to deal with cloudcephosd1052.

Thu, Dec 4, 11:57 AM · DC-Ops, SRE

Wed, Dec 3

cmooney updated the task description for T405609: lvs1020: move primary uplink from asw2-d7-eqiad to lsw1-d7-eqiad and remove link to asw2-c2-eqiad.
Wed, Dec 3, 8:14 PM · Traffic, ops-eqiad, netops, Infrastructure-Foundations, SRE, DC-Ops
cmooney updated the task description for T405609: lvs1020: move primary uplink from asw2-d7-eqiad to lsw1-d7-eqiad and remove link to asw2-c2-eqiad.
Wed, Dec 3, 8:00 PM · Traffic, ops-eqiad, netops, Infrastructure-Foundations, SRE, DC-Ops
cmooney created P86385 (An Untitled Masterwork).
Wed, Dec 3, 6:47 PM
cmooney removed a project from T410989: Remove second network connection for cloudcephosd hosts with single uplink: ops-codfw.
Wed, Dec 3, 5:17 PM · DC-Ops, SRE
cmooney added a comment to T410989: Remove second network connection for cloudcephosd hosts with single uplink.

The four servers in codfw have had cables physically removed and deleted in Netbox.

Wed, Dec 3, 5:16 PM · DC-Ops, SRE
cmooney added a comment to T410989: Remove second network connection for cloudcephosd hosts with single uplink.

DC-Ops folks, we can now remove these superfluous cables from the racks, and once removed delete the cables in Netbox too.

Wed, Dec 3, 4:21 PM · DC-Ops, SRE
cmooney placed T410989: Remove second network connection for cloudcephosd hosts with single uplink up for grabs.

Ok I've disabled all the unused ports on the cloud switches now. The one exception is for cloudcephosd1052, not sure what is up with this one but it seems that it has the vlan interface added, but still has the physical link configured and is using it? I didn't want to touch it:

cmooney@cloudcephosd1052:~$ ip -4 -br addr show | grep -v DOWN
lo               UNKNOWN        127.0.0.1/8 
ens1f0np0        UP             10.64.148.31/24 
ens1f1np1        UP             192.168.5.14/24 
vlan1121@ens1f0np0 UP             192.168.5.14/24
cmooney@cloudcephosd1052:~$ ip route get fibmatch 192.168.5.1 
192.168.5.0/24 dev ens1f1np1 proto kernel scope link src 192.168.5.14
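
The output above shows 192.168.5.14/24 configured on both the physical interface and the vlan interface. A check for that condition could look like the following minimal sketch, which is not from the original comment. The function name is hypothetical, and the parsing assumes the three-column layout shown above (name, state, then addresses).

```python
from collections import defaultdict

# Output copied from the comment above (`ip -4 -br addr show | grep -v DOWN`).
SAMPLE = """\
lo               UNKNOWN        127.0.0.1/8
ens1f0np0        UP             10.64.148.31/24
ens1f1np1        UP             192.168.5.14/24
vlan1121@ens1f0np0 UP             192.168.5.14/24
"""


def duplicate_addresses(brief_output):
    """Return {address: [interfaces]} for addresses seen on more than one interface."""
    owners = defaultdict(list)
    for line in brief_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and interfaces without addresses
        ifname = fields[0]
        for addr in fields[2:]:
            owners[addr].append(ifname)
    return {addr: names for addr, names in owners.items() if len(names) > 1}


print(duplicate_addresses(SAMPLE))
```

This only flags the duplicate assignment. Which interface the kernel actually uses is what the `ip route get fibmatch` output above answers.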
Wed, Dec 3, 4:20 PM · DC-Ops, SRE
cmooney created P86382 (An Untitled Masterwork).
Wed, Dec 3, 4:00 PM
cmooney created P86381 (An Untitled Masterwork).
Wed, Dec 3, 3:44 PM

Tue, Dec 2

cmooney added a comment to T409579: Upgrade cloud-vps hosts to Debian Trixie.

Just now I ran into this error during reimage:

RuntimeError: Host is in BIOS mode but needs to be UEFI as it is connected to a Nokia switch

Is that right? Do we need to convert preseed to uefi recipes before reimaging for everything plugged into a nokia?

The answer... is yes! T410910

Tue, Dec 2, 1:47 PM · Cloud-VPS, cloud-services-team
cmooney closed T410910: Eqiad row C/D servers need to boot/reimage in UEFI mode as Resolved.

Thanks to the awesome work of @jhathaway this is no longer a requirement. We can use --no82 with a host in BIOS boot mode, and that flag will be set automatically when reimaging on a Nokia switch where it's needed.

Tue, Dec 2, 1:45 PM · netops, Infrastructure-Foundations, SRE
cmooney added a comment to T408892: ULSFO: New switch configuration.

@Papaul as @ayounsi mentions, you also need to change it in puppet, principally to change which IPs the hosts doing BGP are going to peer with at that site.

Tue, Dec 2, 12:51 PM · Patch-For-Review, SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
cmooney created T411480: ssw1-d8-eqiad cross-rack links incorrect in Netbox.
Tue, Dec 2, 12:05 PM · ops-eqiad, netops, Infrastructure-Foundations, DC-Ops, SRE
cmooney added a comment to T405499: Remove lvs1018 L2 link to ssw1-e1-eqiad.

Hey @cmooney It has been reused for that purpose, however it's still being worked on to update the connection in netbox

Tue, Dec 2, 12:02 PM · DC-Ops, ops-eqiad, Infrastructure-Foundations, netops, SRE

Thu, Nov 27

cmooney updated the task description for T411203: Users reporting issues connecting to Gerrit with HTTPS from Orange, FR mobile network (AS 3215).
Thu, Nov 27, 4:27 PM · Traffic, netops, Infrastructure-Foundations, SRE
cmooney created T411203: Users reporting issues connecting to Gerrit with HTTPS from Orange, FR mobile network (AS 3215).
Thu, Nov 27, 4:23 PM · Traffic, netops, Infrastructure-Foundations, SRE
cmooney closed T410751: Reimage cookbook: Warn/set defaults for hosts connected to Nokia switches, a subtask of T410910: Eqiad row C/D servers need to boot/reimage in UEFI mode, as Resolved.
Thu, Nov 27, 12:00 PM · netops, Infrastructure-Foundations, SRE
cmooney closed T410751: Reimage cookbook: Warn/set defaults for hosts connected to Nokia switches as Resolved.

I'm going to close this task as the original ask is now complete. The wider question of displaying the changelog, or how best to communicate new options, is something we can work on longer term.

Thu, Nov 27, 12:00 PM · Infrastructure-Foundations
cmooney closed T411098: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad, a subtask of T409286: Nokia L3 bugs [Oct 2025], as Resolved.
Thu, Nov 27, 11:30 AM · Infrastructure-Foundations, SRE
cmooney closed T411098: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad as Resolved.
Thu, Nov 27, 11:30 AM · DC-Ops, ops-eqiad, netops, Infrastructure-Foundations, SRE

Wed, Nov 26

cmooney added a subtask for T409286: Nokia L3 bugs [Oct 2025]: T411098: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad.
Wed, Nov 26, 2:25 PM · Infrastructure-Foundations, SRE
cmooney added a parent task for T411098: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad: T409286: Nokia L3 bugs [Oct 2025].
Wed, Nov 26, 2:25 PM · DC-Ops, ops-eqiad, netops, Infrastructure-Foundations, SRE
cmooney created T411098: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad.
Wed, Nov 26, 2:24 PM · DC-Ops, ops-eqiad, netops, Infrastructure-Foundations, SRE
cmooney added a comment to T411081: Improve how virt networks are configured in cloudgw.

@taavi broadly this looks good to me, nicely done.

Wed, Nov 26, 1:11 PM · tools-infrastructure-team, Cloud-VPS

Tue, Nov 25

cmooney updated the task description for T410910: Eqiad row C/D servers need to boot/reimage in UEFI mode.
Tue, Nov 25, 10:09 PM · netops, Infrastructure-Foundations, SRE
cmooney updated the task description for T409286: Nokia L3 bugs [Oct 2025].
Tue, Nov 25, 10:07 PM · Infrastructure-Foundations, SRE
cmooney updated the task description for T409286: Nokia L3 bugs [Oct 2025].
Tue, Nov 25, 10:06 PM · Infrastructure-Foundations, SRE
cmooney updated the task description for T411054: Nokia SR-Linux DHCP Relay Bug.
Tue, Nov 25, 10:03 PM · netops, Infrastructure-Foundations, SRE