cmooney (Cathal Mooney)
SRE (netops)

User Details

User Since
May 10 2021, 3:25 PM (58 w, 6 d)
Availability
Available
IRC Nick
topranks
LDAP User
Cathal Mooney
MediaWiki User
Cathalmooney100 [ Global Accounts ]

Recent Activity

Tue, Jun 21

cmooney added a comment to P29940 getting interface twice in NB.

>>> for interface in (Interface.objects.filter(device__device_role__slug="server")
...         .annotate(Count('ip_addresses'))
...         .filter(ip_addresses__count__gte=1)
...         .exclude(cable__isnull=True)):
...     print(interface.id)
...
Traceback (most recent call last):
  File "<console>", line 2, in <module>
NameError: name 'Count' is not defined

Tue, Jun 21, 4:41 PM
cmooney created P29940 getting interface twice in NB.
Tue, Jun 21, 4:19 PM

Fri, Jun 17

cmooney triaged T310901: Complete testing of SONiC NOS / Dell network gear and write up as Low priority.
Fri, Jun 17, 4:13 PM · SRE, Infrastructure-Foundations, netops

Thu, Jun 16

Oisxela awarded P18238 AS 31500 AMS-IX Issue Affected Networks / Prefixes a Like token.
Thu, Jun 16, 6:22 PM

Wed, Jun 15

cmooney triaged T310715: Move interface VRF assignment to Netbox as Low priority.
Wed, Jun 15, 3:17 PM · SRE, Infrastructure-Foundations, netops

Fri, Jun 10

cmooney added a comment to T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.

No problem @nskaggs I'm off today but I can put some more verbose instructions together next week and link them on the page.

Fri, Jun 10, 10:27 AM · Patch-For-Review, SRE, Infrastructure-Foundations, netops

Thu, Jun 9

cmooney created P29606 (An Untitled Masterwork).
Thu, Jun 9, 6:20 PM
cmooney added a comment to T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.

@nskaggs anyone with access to Netbox and ability to run homer (which I believe should be most of SRE) should be able to do it.

Thu, Jun 9, 10:50 AM · Patch-For-Review, SRE, Infrastructure-Foundations, netops

Wed, Jun 8

cmooney created P29557 (An Untitled Masterwork).
Wed, Jun 8, 5:29 PM

Thu, Jun 2

cmooney added a comment to T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.

The work here is largely complete; merging that last patch to add the new switches to monitoring should be the final step.

Thu, Jun 2, 1:08 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
cmooney added a comment to T299574: Q3:(Need By: TBD) rack/setup/install cloudvirt10[48-50].eqiad.wmnet.

One thing to note as it's not been mentioned in the task description is that the '--enable-virtualization' flag should be used when running the sre.hosts.provision cookbook against them (as they are OpenStack hypervisors).

Thu, Jun 2, 1:06 PM · cloud-services-team (Hardware), SRE, ops-eqiad, DC-Ops
cmooney updated Other Assignee for T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34], added: Cmjohnson.

@Cmjohnson hey are you able to take care of the BIOS / RAID setup for these hosts? All should be ready for normal deploy anyway, John said you were the one who normally did those steps. Thanks.

Thu, Jun 2, 9:58 AM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops

Wed, Jun 1

cmooney created P29321 cloudgw1002 routing via 185.15.56.243.
Wed, Jun 1, 2:31 PM

Tue, May 31

cmooney closed T305840: Cannot verify NTP status asw1-b12-drmrs as Resolved.

After a bit of back-and-forth with Juniper they eventually suggested just killing the ntpd process from a root shell.

Tue, May 31, 12:20 PM · SRE, Infrastructure-Foundations, netops
cmooney updated subscribers of T304888: Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts.

@nskaggs / @dcaro, just an observation I'd missed before on this task:

cloudnet1005 C8 U37 Cableid 20220119, 20220120 Port 1, 2 (cloudsw2-c8-eqiad)
cloudnet1006 D5 U38 Cableid 20220116, 20220117 Port 8, 9 (cloudsw2-d5-eqiad)

These servers are 100% in the right racks, C8 and D5, given their role in the overall cloud network. It would be better if they were connected directly to cloudsw1 in each of the racks though, not cloudsw2. It would work, but it means every outgoing packet in the cloud network will have to go from cloudsw1->cloudsw2->cloudsw1 as it routes via cloudnet.

Tue, May 31, 11:19 AM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
cmooney added a comment to T309524: DHCPd: update config to log more info.

I agree @jbond it would be useful to have more granular detail.

Tue, May 31, 11:05 AM · SRE, Infrastructure-Foundations, netops

Mon, May 30

cmooney created P28919 May 30 2022 Homer changes Eqiad row B and C.
Mon, May 30, 10:14 AM

May 27 2022

cmooney closed T304936: Configure cloudsw1-e4-eqiad and cloudsw1-f4-eqiad as Resolved.

Work for this is now completed, will update design task once confirmed there are no niggles with reimaging.

May 27 2022, 12:59 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
cmooney closed T304936: Configure cloudsw1-e4-eqiad and cloudsw1-f4-eqiad, a subtask of T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F, as Resolved.
May 27 2022, 12:58 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops

May 26 2022

cmooney added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

@Jclark-ctr I'm not really able to progress this. I was gonna try one reimage but given the disk / RAID config needs to be done, and I'm unsure of the other BIOS/firmware stuff I backed out in case I messed any of that up.

May 26 2022, 10:10 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
cmooney added a comment to T307399: Q4: rack/setup/install stat1010.

These should be ok for rows E/F if that suits the team.

May 26 2022, 9:17 PM · SRE, Data-Engineering, ops-eqiad, DC-Ops
cmooney added a comment to T306835: Q4:(Need By: TBD) rack/setup/install an-presto10[06-15].eqiad.wmnet.

Should be good for rows E and F if that works for the team.

May 26 2022, 9:17 PM · Data-Engineering, SRE, ops-eqiad, DC-Ops
cmooney added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

@Jclark-ctr ok thanks for the clarification. I've only put the port details for 1025 and 1026 into Netbox so far, ports 21 and 22, so that's not changed which is good.

May 26 2022, 9:06 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
cmooney reassigned T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34] from Cmjohnson to Jclark-ctr.

@Cmjohnson apologies I assigned this to you in error (blind as a bat), I see @Jclark-ctr actually did the previous work on these so re-assigning.

May 26 2022, 8:48 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
cmooney created P28600 (An Untitled Masterwork).
May 26 2022, 3:47 PM
cmooney added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Quick update - I've been trying to image cloudcephosd1025 to make sure all is ok, and completed some operations.

May 26 2022, 2:42 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
cmooney updated the task description for T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].
May 26 2022, 2:37 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
cmooney added a comment to T303529: Agree how to handle port-block speeds for QFX5120-48Y.

@ayounsi I think based on the above we should proceed with https://gerrit.wikimedia.org/r/c/operations/software/homer/deploy/+/769729

May 26 2022, 11:02 AM · SRE, Patch-For-Review, Infrastructure-Foundations, netops

May 25 2022

cmooney committed rOSNEc0b51a7a7d06: Change order that Netbox server provision script gets old/new vlan name (authored by cmooney).
Change order that Netbox server provision script gets old/new vlan name
May 25 2022, 11:38 PM
cmooney created P28564 Netbox-dev interface-automation update failure..
May 25 2022, 11:37 PM
cmooney reassigned T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34] from cmooney to Cmjohnson.

@nskaggs I believe that to be the case yes. I've not been able to successfully reimage any of these though. I might be missing a step at this stage however.

May 25 2022, 10:37 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
cmooney reassigned T299574: Q3:(Need By: TBD) rack/setup/install cloudvirt10[48-50].eqiad.wmnet from cmooney to Cmjohnson.

@nskaggs I believe that to be the case yes. I've not been able to successfully reimage any of the cloudcephosd hosts that are also in a similar state though.

May 25 2022, 10:36 PM · cloud-services-team (Hardware), SRE, ops-eqiad, DC-Ops

May 24 2022

cmooney created P28461 Homer Juniper config deploy timeout to SRX firewalls.
May 24 2022, 7:55 PM
cmooney created P28460 DNS Cookbook output - missing include for netbox/b.0.e.f.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa.
May 24 2022, 6:50 PM
cmooney created P28433 Combined homer diffs with 'diff' not 'commit'.
May 24 2022, 10:06 AM
cmooney created P28429 CF drmrs commit.
May 24 2022, 9:25 AM
cmooney created P28426 Diff yesterday..
May 24 2022, 9:06 AM

May 23 2022

cmooney created P28333 WMCS domains forward / back.
May 23 2022, 1:56 PM

May 20 2022

cmooney added a comment to T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.

Just a brief update here.

May 20 2022, 5:58 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops

May 19 2022

cmooney created P28146 Pings to durum1001 from cr2-eqiad.
May 19 2022, 2:53 PM
cmooney added a comment to T305126: Make more extensive use of Netbox custom fields.

Probably a good idea to add an "ASN" custom field on devices so we can document the AS numbers used for eBGP based L3 switches like in drmrs.

May 19 2022, 12:27 PM · Infrastructure-Foundations, netbox

May 18 2022

cmooney created P27958 VRF Patch additions cloudsw1-c8 / cloudsw1-d5.
May 18 2022, 3:17 PM

May 17 2022

cmooney edited P27866 Example diff - add VRF ints.
May 17 2022, 2:46 PM
cmooney created P27866 Example diff - add VRF ints.
May 17 2022, 2:44 PM

May 13 2022

cmooney created P27826 (An Untitled Masterwork).
May 13 2022, 1:25 PM
cmooney created P27825 cmooney: rake spdx:convert:new_files errors.
May 13 2022, 12:47 PM

May 12 2022

cmooney created P27820 Var inheritance with inlcuded file - Jinja2.
May 12 2022, 4:01 PM
cmooney added a comment to T305126: Make more extensive use of Netbox custom fields.

Two other things which we might want to add custom fields on interfaces:

May 12 2022, 9:05 AM · Infrastructure-Foundations, netbox
cmooney added a comment to T306649: Agree strategy for Kubernetes BGP peering to top-of-rack switches.

@elukey yes I think that makes sense, no need to hold off on testing. Your suggested label naming makes sense so let's go with that.

May 12 2022, 8:08 AM · Prod-Kubernetes, SRE, Infrastructure-Foundations, netops

May 11 2022

cmooney added a comment to T306649: Agree strategy for Kubernetes BGP peering to top-of-rack switches.

Even in the legacy setup (pre row e/f) adding new nodes requires manual error-prone gerrit changes like this one 35b0c9a4832068d08

May 11 2022, 3:04 PM · Prod-Kubernetes, SRE, Infrastructure-Foundations, netops
cmooney created P27787 JunOS Dynamic Neighbors With Different Peer-AS.
May 11 2022, 2:01 PM

May 10 2022

cmooney added a comment to T304888: Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts.

@nskaggs yes the ones that require the Public Vlan are probably actually placed not in those dedicated WMCS racks, to leave as much room as possible for servers there that require the dedicated Vlans only available there. Thanks.

May 10 2022, 5:19 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
cmooney added a comment to T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.

@dcaro thanks!

May 10 2022, 9:51 AM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
cmooney added a comment to T306649: Agree strategy for Kubernetes BGP peering to top-of-rack switches.

If there is any kind of anycast with the k8s prefixes (same prefix advertised from multiple locations), we should also prepend the AS once on the core routers to keep path lengths consistent across the infra.

May 10 2022, 9:15 AM · Prod-Kubernetes, SRE, Infrastructure-Foundations, netops
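
The prepending idea in the comment above can be illustrated with a minimal sketch. It assumes one reading of the comment: prefixes learned via a top-of-rack switch carry one extra AS hop compared with prefixes learned directly on the core routers, so prepending once on the direct sessions evens out the path lengths. The function name and the AS numbers are hypothetical, chosen from the private range purely for illustration.

```python
# Hypothetical private ASNs, for illustration only.
K8S_ASN = 64601
SWITCH_ASN = 64810


def as_path_seen_upstream(origin_asn, via_switch_asn=None, prepend=0):
    """Build the AS path for a prefix as seen beyond the core routers.

    origin_asn:     AS of the node originating the prefix.
    via_switch_asn: AS of a top-of-rack switch in the path, if any.
    prepend:        extra copies of the origin AS added on the core router.
    """
    path = [origin_asn] * (1 + prepend)
    if via_switch_asn is not None:
        path = [via_switch_asn] + path
    return path


# Prefix learned through a top-of-rack switch: two AS hops.
via_switch = as_path_seen_upstream(K8S_ASN, via_switch_asn=SWITCH_ASN)

# Same prefix learned directly on a core router: one hop, or two with one prepend.
direct = as_path_seen_upstream(K8S_ASN)
direct_prepended = as_path_seen_upstream(K8S_ASN, prepend=1)

print(len(via_switch), len(direct), len(direct_prepended))
```

With a single prepend, both copies of an anycast prefix present the same path length, so path length alone does not bias the choice between locations.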

May 6 2022

cmooney added a comment to T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.

@dcaro (sorry to pick on you!)

May 6 2022, 5:31 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops

May 5 2022

cmooney added a comment to T305570: Q4:(Need By: TBD) rack/setup/install aqs1016-aqs1021.

@Jclark-ctr Could you do me a favour and ping me before you kick off the re-image process for aqs1020/aqs1021?

May 5 2022, 2:44 AM · Cassandra, SRE, ops-eqiad, DC-Ops

May 4 2022

cmooney created P27508 (An Untitled Masterwork).
May 4 2022, 6:39 PM
cmooney added a comment to T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.

@dcaro not just yet. I believe the one change we will need to test here is adding a route on the cloud-storage interfaces.

May 4 2022, 2:45 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
cmooney added a comment to T288750: LVS in Analytics VLANs.

One small downside is about traffic flows, if I understand correctly, most clients are in the analytics vlan, so traffic will do something like:

May 4 2022, 11:37 AM · Data-Engineering
cmooney added a comment to T288750: LVS in Analytics VLANs.

Yeah if we don't expect much traffic it might be hard to justify dedicated hardware / option 2.

May 4 2022, 11:30 AM · Data-Engineering

Apr 29 2022

cmooney added a comment to T307026: decommission atlas-esams.

From the experience with the one in codfw I think the process is to delete and then re-add.

Apr 29 2022, 2:17 PM · netops, DC-Ops, SRE, ops-esams, Infrastructure-Foundations, decommission-hardware

Apr 28 2022

cmooney claimed T299574: Q3:(Need By: TBD) rack/setup/install cloudvirt10[48-50].eqiad.wmnet.

This requires the updated WMCS network design to be agreed / validated (T304989) after which we can quickly complete the actual device configuration (T304936). Once that is ready we can proceed with the server provisioning as normal.

Apr 28 2022, 6:46 PM · cloud-services-team (Hardware), SRE, ops-eqiad, DC-Ops
cmooney claimed T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

This requires the updated WMCS network design to be agreed / validated (T304989) after which we can quickly complete the actual device configuration (T304936). Once that is ready we can proceed with the server provisioning as normal.

Apr 28 2022, 6:46 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
cmooney added a comment to T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.

Just a quick update here, will provide a fuller update (incl. updated diagrams etc.) next week.

Apr 28 2022, 6:41 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
cmooney added a parent task for T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34]: T304936: Configure cloudsw1-e4-eqiad and cloudsw1-f4-eqiad.
Apr 28 2022, 6:10 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
cmooney added a subtask for T304936: Configure cloudsw1-e4-eqiad and cloudsw1-f4-eqiad: T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].
Apr 28 2022, 6:10 PM · Patch-For-Review, SRE, netops, Infrastructure-Foundations
cmooney added a subtask for T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F: T304936: Configure cloudsw1-e4-eqiad and cloudsw1-f4-eqiad.
Apr 28 2022, 6:09 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
cmooney added a parent task for T304936: Configure cloudsw1-e4-eqiad and cloudsw1-f4-eqiad: T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.
Apr 28 2022, 6:09 PM · Patch-For-Review, SRE, netops, Infrastructure-Foundations
cmooney removed a parent task for T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F: T304936: Configure cloudsw1-e4-eqiad and cloudsw1-f4-eqiad.
Apr 28 2022, 6:09 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
cmooney removed a subtask for T304936: Configure cloudsw1-e4-eqiad and cloudsw1-f4-eqiad: T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.
Apr 28 2022, 6:09 PM · Patch-For-Review, SRE, netops, Infrastructure-Foundations

Apr 27 2022

cmooney added a comment to T306649: Agree strategy for Kubernetes BGP peering to top-of-rack switches.

On reflection the above won't work if we're going to add the 'node-location' for all existing hosts, which I assume is the intent. So the selector should probably be as follows for the CR filter:

nodeSelector: !(wikimedia.org/node-location starts with 'lsw')
Apr 27 2022, 1:11 PM · Prod-Kubernetes, SRE, Infrastructure-Foundations, netops
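
A minimal sketch of the selector logic proposed in the comment above, written as plain Python rather than actual Calico configuration. The label key comes from the comment; the function name, the example label values, and the treatment of nodes with no label at all are assumptions made for illustration.

```python
LOCATION_LABEL = "wikimedia.org/node-location"


def matches_cr_filter(labels):
    """Return True if a node should be selected by the CR filter.

    Mirrors: !(wikimedia.org/node-location starts with 'lsw')
    A node with no location label is treated as matching (assumption).
    """
    location = labels.get(LOCATION_LABEL, "")
    return not location.startswith("lsw")


# Hypothetical label values for two kinds of host.
legacy_node = {LOCATION_LABEL: "row-a"}
tor_node = {LOCATION_LABEL: "lsw1-e1-eqiad"}

print(matches_cr_filter(legacy_node))  # selected by the CR filter
print(matches_cr_filter(tor_node))     # excluded: peers with its top-of-rack switch
```

Because the filter is a negation, it keeps working once every existing host carries the label: only hosts whose location starts with 'lsw' are excluded.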
cmooney created P26713 K8s BGP virtual machines.
Apr 27 2022, 12:59 PM
cmooney created P26711 K8s host locations.
Apr 27 2022, 12:57 PM
cmooney added a comment to T306649: Agree strategy for Kubernetes BGP peering to top-of-rack switches.

Thanks for the updates. Sounds like a good plan!

Apr 27 2022, 12:07 PM · Prod-Kubernetes, SRE, Infrastructure-Foundations, netops
cmooney created P26707 Exit code for bridge show commands.
Apr 27 2022, 9:41 AM

Apr 26 2022

cmooney created P26600 Stupid blowfish.
Apr 26 2022, 6:53 PM
cmooney committed rOSHP5843c4cea17e: Release v0.4.1 (authored by cmooney).
Release v0.4.1
Apr 26 2022, 6:48 PM
cmooney committed rOSHP669f0a13bf48: Correct wmf-netbox plugin failure with patch panel front ports (authored by cmooney).
Correct wmf-netbox plugin failure with patch panel front ports
Apr 26 2022, 6:35 PM
cmooney added a comment to T306649: Agree strategy for Kubernetes BGP peering to top-of-rack switches.

The above patch is working, however I'm not 100% sure the resulting config is what we need. Looking, for instance, at ml-serve1005, it has established BGP peering to the top-of-rack switch, but it is still trying (and failing) to connect to the CR routers:

Apr 26 2022, 5:57 PM · Prod-Kubernetes, SRE, Infrastructure-Foundations, netops
cmooney created P26565 Without 'remove the need for fetch_device_circuits'.
Apr 26 2022, 4:12 PM
cmooney committed rOSHO1fedf7c83862: CHANGELOG: add changelogs for release v0.4.1 (authored by cmooney).
CHANGELOG: add changelogs for release v0.4.1
Apr 26 2022, 3:38 PM
cmooney committed rOSHPc032ed46209e: Release v0.4.1 (authored by cmooney).
Release v0.4.1
Apr 26 2022, 3:34 PM
cmooney created P26555 ml-serve1005 BGP Peers.
Apr 26 2022, 2:21 PM
cmooney created P26554 (An Untitled Masterwork).
Apr 26 2022, 1:35 PM
cmooney created P26552 (An Untitled Masterwork).
Apr 26 2022, 1:32 PM
cmooney created P26549 (An Untitled Masterwork).
Apr 26 2022, 12:51 PM
cmooney added a comment to T306649: Agree strategy for Kubernetes BGP peering to top-of-rack switches.

@elukey thanks for the patch, certainly looks ok to me, if indeed it works in terms of the Calico config :)

Apr 26 2022, 12:18 PM · Prod-Kubernetes, SRE, Infrastructure-Foundations, netops
cmooney created P26499 cr2-eqord looks up.
Apr 26 2022, 5:50 AM

Apr 21 2022

cmooney added projects to T306649: Agree strategy for Kubernetes BGP peering to top-of-rack switches: netops, Infrastructure-Foundations.
Apr 21 2022, 4:46 PM · Prod-Kubernetes, SRE, Infrastructure-Foundations, netops
cmooney triaged T306649: Agree strategy for Kubernetes BGP peering to top-of-rack switches as Medium priority.
Apr 21 2022, 4:44 PM · Prod-Kubernetes, SRE, Infrastructure-Foundations, netops
cmooney added a comment to T293922: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet.

Note the IP addresses assigned to the servers need to be updated to match those vlans.

Apr 21 2022, 1:11 PM · Data-Engineering, SRE, ops-eqiad, DC-Ops

Apr 19 2022

cmooney added a comment to T306220: 2M 25G DAC testing.

@Jclark-ctr that's great.

Apr 19 2022, 5:19 PM · SRE, netops, ops-eqiad, Infrastructure-Foundations
cmooney added a comment to T303529: Agree how to handle port-block speeds for QFX5120-48Y.

So to confirm, the configuration detailed above does not work:

Apr 19 2022, 5:18 PM · SRE, Patch-For-Review, Infrastructure-Foundations, netops
cmooney committed rOSNE9c908aa57290: Update Netbox Move Server Script to Copy original Tagged Vlans (authored by cmooney).
Update Netbox Move Server Script to Copy original Tagged Vlans
Apr 19 2022, 2:33 PM
cmooney added a comment to P25289 MAC Address table for cloudvirt2001-dev 19-03-2022.

The learnt MACs correspond to the following internal TAP interfaces (connected to VMs) on the host:

Apr 19 2022, 8:58 AM
cmooney created P25289 MAC Address table for cloudvirt2001-dev 19-03-2022.
Apr 19 2022, 8:56 AM
cmooney added a comment to T294949: Q2:(Need By: TBD) rack/setup/install ml-serve100[5-8].

These hosts hit the ARP issue described in T306421, and have been offline following re-image until this morning:

Apr 19 2022, 8:56 AM · Patch-For-Review, SRE, Machine-Learning-Team, ops-eqiad, DC-Ops
cmooney edited P25274 Icinga alerts for ml-server1005 and elastic1089-1005.
Apr 19 2022, 8:18 AM
cmooney created P25274 Icinga alerts for ml-server1005 and elastic1089-1005.
Apr 19 2022, 8:06 AM

Apr 15 2022

cmooney added a comment to T306007: Avoid ghost hosts on the network.

Actually one really ugly thing you could do is to make the Jinja templates add "disabled" config for every _possible_ interface name.

Apr 15 2022, 10:23 AM · SRE, Infrastructure-Foundations, netbox, netops, DC-Ops