User Details
- User Since: May 10 2021, 3:25 PM (158 w, 1 d)
- Availability: Available
- IRC Nick: topranks
- LDAP User: Cathal Mooney
- MediaWiki User: CMooney (WMF)
Today
@Jhancock.wm @Papaul I'd been using the server in b7 for testing already, but I should be able to move over to the one in a8 instead (I assume we have the same problem with public1-a-codfw as we had with public1-b-codfw)
Yesterday
So some interesting findings when testing today.
Fri, May 17
From what I can tell the 'authoritative' statement only controls NAK generation. I think we're hitting this part of the code, and the different source address (of another switch) on the duplicate REQUESTS is why it is sending the NAKs:
Re-reading the man page for dhcpd.conf, it seems that potentially changing the 'authoritative' statement at the top of our config to 'not authoritative' would prevent it sending the NAKs. Might be worth a shot? Better to not create them than to filter them elsewhere. I don't believe there is any use-case in our environment where we need NAKs.
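As a concrete sketch, the change discussed would look something like the below in dhcpd.conf (a hypothetical minimal excerpt with a placeholder subnet, not our actual config):

```
# Replace the global "authoritative;" statement with "not authoritative;"
# so the server stays silent rather than sending DHCPNAKs for REQUESTs
# it considers invalid. Subnet below is an illustrative placeholder.
not authoritative;

subnet 10.64.32.0 netmask 255.255.252.0 {
  # ... existing ranges/options unchanged ...
}
```

Per the dhcpd.conf man page, 'authoritative'/'not authoritative' can also be scoped per-subnet, so it could be limited to the affected networks rather than set globally.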
Just a note on this, I only discovered this document after the task:
Also I didn't see in the dhcpd docs any way to constrain the generation of NAKs in response to invalid REQUEST messages.
One observation is that the NAKs are unique insofar as they are sent from 208.80.153.33 (switch IRB int IP) to 255.255.255.255 (and matching L2 MACs).
Pcap of DHCP request from contint2002 here:
+1 sounds like a good idea. Nice we have some limited scope to experiment with the DoH ranges before pulling the plug on ns2.
Thu, May 16
This has been implemented and the new vlan setup is recorded here. Closing task
Thanks. It is very much something we wish to do but unfortunately other priorities have always trumped it for multiple past quarters.
And FWIW the announcement looks good: all 3 of our transits are learning it ok, and I see it on other carriers from those sources as well. We also see live requests on the doh servers.
Wed, May 15
Gonna close this one. As a last datapoint, if you 'stack' the Hadoop graph in Grafana you can clearly see the cumulative reads at ~15:55 on May 14th were a good deal higher than any of the other spikes of usage over the past few days (peaking at almost 200Gbit/sec). So it makes sense that it paged and the others didn't.
Seems this is not possible, as the cloudsws still on JunOS 18 don't support exporting the data within the mgmt routing-instance.
Tue, May 14
Thanks for the task and analysis.
Patch to Homer wmf plugin merged now, so BGP to VMs at POPs / on L3 switches now under automation too.
Thanks for the task @taavi. Looks well put together. Let me know the exact time you're starting, and feel free to ping me if there is anything you need checked from the physical network side of things (where MAC addresses are in the forwarding tables etc.)
Mon, May 13
Happy to discuss. I think if we are doing this it makes sense to do the cloudgw <-> cloudsw BGP at the same time (we will need to create the Bird config for the cloudgw to talk to cloudnet, so while we are doing so let's do the other side too).
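For illustration, the cloudgw side of such a session might be sketched in Bird2 syntax roughly as below (the ASNs, neighbor address, and prefix are placeholders I've made up, not the real cloud setup):

```
# Hypothetical Bird2 BGP stanza for a cloudgw <-> cloudsw peering.
# All numbers here are illustrative placeholders.
protocol bgp cloudsw_peer {
    local as 64710;                  # placeholder private ASN for cloudgw
    neighbor 185.15.56.1 as 64605;   # placeholder cloudsw address/ASN
    ipv4 {
        import none;                              # accept nothing inbound in this sketch
        export where net ~ [ 185.15.56.0/25+ ];   # placeholder cloud prefix
    };
}
```

The same template would then be reusable for the cloudgw <-> cloudnet side, which is part of the appeal of doing both at once.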
Fri, May 10
@Papaul I've added all the links for the new switches in Netbox now:
Thu, May 9
Hey Andrew,
Wed, May 8
Sorry for the delay, the capirca script times out a lot for some reason; I'll need to look at that.
Fri, May 3
@Papaul I think this one is ready to be moved to rack D1 now.
Device has been removed from LibreNMS now. I also downtimed it for 2 weeks just in case I mess up the order of anything.
Thu, May 2
Not sure if it might be worth taking a step back and weighing up what's happening here?
Wed, May 1
These are direct peerings to Equinix themselves over their own exchange. We are waiting on them to complete the configuration of their side (see peering@wikimedia.org). I emailed last week to chase them for an update.
Tue, Apr 30
Mon, Apr 29
Looks like this was a brief blip of inbound errors (unlike last time when they began and kept increasing until eventually the link failed).
Actually it may just be easier to check the route for each pooled IP and make sure the check doesn't return saying it's using the default, as per the task description.
cmooney@lvs1019:~$ ip --json route get fibmatch 1.1.1.1
[{"dst":"default","gateway":"10.64.32.1","dev":"eno1np0","flags":["onlink"]}]
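The check described above could be sketched in Python along these lines (a hypothetical helper, not an existing script; it assumes iproute2 with JSON output support, and the function/variable names are my own):

```python
import json
import subprocess

def uses_default_route(route_json: str) -> bool:
    """Return True if 'ip --json route get fibmatch <ip>' matched the default route."""
    routes = json.loads(route_json)
    return any(r.get("dst") == "default" for r in routes)

def pooled_ips_on_default(ips):
    """Return the pooled IPs whose best FIB match is the default route.

    Hypothetical helper: shells out to iproute2 on the local host.
    """
    flagged = []
    for ip in ips:
        out = subprocess.run(
            ["ip", "--json", "route", "get", "fibmatch", ip],
            capture_output=True, text=True, check=True,
        ).stdout
        if uses_default_route(out):
            flagged.append(ip)
    return flagged

# Using the output captured on lvs1019 above:
sample = '[{"dst":"default","gateway":"10.64.32.1","dev":"eno1np0","flags":["onlink"]}]'
print(uses_default_route(sample))  # True: 1.1.1.1 only matches the default route
```

Failing the check whenever `uses_default_route()` returns True for a pooled IP would match the behaviour asked for in the task description.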
Thanks Brian.