
cmooney (Cathal Mooney)
SRE (netops)

User Details

User Since
May 10 2021, 3:25 PM (153 w, 2 d)
Availability
Available
IRC Nick
topranks
LDAP User
Cathal Mooney
MediaWiki User
CMooney (WMF) [ Global Accounts ]

Recent Activity

Today

cmooney added a comment to T362772: ASW single-point of failure for LVS VIPs at POPs.

Perhaps one option would be to ignore the Puppet patch changing drmrs and esams for now, but merge the Homer one and configure the magru LVS in Puppet to peer with both switches?

Wed, Apr 17, 7:02 PM · Patch-For-Review, Traffic, SRE, netops, Infrastructure-Foundations
cmooney added a comment to T362772: ASW single-point of failure for LVS VIPs at POPs.

I believe the two patches above, once merged, will add the required redundancy. They follow option 1 above, creating backup peerings from the LVS hosts to the switch in the other rack.

Wed, Apr 17, 6:58 PM · Patch-For-Review, Traffic, SRE, netops, Infrastructure-Foundations
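The redundancy argument can be checked mechanically. This is a minimal sketch, assuming each LVS host's peerings are known as a set of switch names. The host and switch names are hypothetical and not taken from the task. A switch is treated as a single point of failure for a host if that host peers with that switch and no other.

```python
def single_points_of_failure(peerings: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each switch to the LVS hosts that would lose all peerings if it failed."""
    result: dict[str, list[str]] = {}
    for host, switches in peerings.items():
        if len(switches) == 1:
            (switch,) = switches  # the one switch this host depends on
            result.setdefault(switch, []).append(host)
    return result


# Hypothetical POP: each LVS host peers only with the switch in its own rack.
before = {"lvs-a": {"asw-rack1"}, "lvs-b": {"asw-rack2"}}
# After adding backup peerings to the switch in the other rack.
after = {"lvs-a": {"asw-rack1", "asw-rack2"}, "lvs-b": {"asw-rack1", "asw-rack2"}}

print(single_points_of_failure(before))  # {'asw-rack1': ['lvs-a'], 'asw-rack2': ['lvs-b']}
print(single_points_of_failure(after))   # {}
```

An empty result means every host keeps at least one BGP session when any one switch fails. That is the property the backup peerings are intended to provide.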
cmooney created P60806 (An Untitled Masterwork).
Wed, Apr 17, 5:14 PM
cmooney triaged T362772: ASW single-point of failure for LVS VIPs at POPs as Medium priority.
Wed, Apr 17, 11:55 AM · Patch-For-Review, Traffic, SRE, netops, Infrastructure-Foundations
cmooney added a comment to T362522: mr1-eqsin performance issue.

FWIW I changed the key-exchange algo configured on mr1-eqsin to see if it would make any difference

Wed, Apr 17, 9:32 AM · Infrastructure-Foundations, netops

Yesterday

cmooney added a comment to T362392: Routed Ganeti: Add support for VM BGP.

In terms of live migration and session re-establishment, we should bear in mind that BIRD BGP sessions will use default BGP timers of 180/60 (hold time/keepalive, in seconds) when talking to each other, instead of the Juniper defaults of 90/30.

Tue, Apr 16, 3:20 PM · Ganeti
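Under RFC 4271, each BGP speaker uses the smaller of its own configured hold time and the one its peer proposes in the OPEN message. Keepalives are conventionally sent at one third of the hold time. The short sketch below plugs in the timer values quoted in the comment (180 for BIRD, 90 for Junos); these were not verified separately. It shows how the pairing determines how long a dead session can linger after a VM migrates.

```python
def negotiated_timers(local_hold: int, peer_hold: int) -> tuple[int, int]:
    """Return (hold, keepalive) in seconds for a BGP session (RFC 4271, section 4.2)."""
    hold = min(local_hold, peer_hold)
    # Conventional keepalive interval is one third of the hold time;
    # a hold time of 0 disables keepalives altogether.
    keepalive = hold // 3
    return hold, keepalive


BIRD_HOLD = 180   # value quoted in the comment above
JUNOS_HOLD = 90   # Juniper default quoted in the comment above

print(negotiated_timers(BIRD_HOLD, BIRD_HOLD))   # (180, 60): BIRD <-> BIRD
print(negotiated_timers(BIRD_HOLD, JUNOS_HOLD))  # (90, 30): BIRD <-> Junos
```

A peer that disappears without a clean shutdown is only declared down when the hold timer expires. On a BIRD-to-BIRD session, that can take roughly twice as long as it does against a Junos peer, unless BFD or shorter timers are configured.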
cmooney updated subscribers of T347411: Drive host network config from Netbox, and move away from ifupdown.

Some things that may be possible, if we are still trying to predict the names from Redfish data:

  1. Change the NamePolicy to 'mac', so we get names like enxc45ab1abd7f5. Ugly, but should be consistent?
Tue, Apr 16, 12:51 PM · User-aborrero, Infrastructure-Foundations, SRE
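With systemd's MAC-based naming policy, an Ethernet interface is named with the prefix "enx" followed by its MAC address in lowercase hex, with the separators removed. That is where names like enxc45ab1abd7f5 come from. Below is a minimal sketch of that derivation. It assumes Redfish reports the NIC's MAC address. The function name is hypothetical, and the sketch ignores non-Ethernet interfaces and naming-scheme version differences.

```python
def mac_policy_name(mac: str) -> str:
    """Predict the NamePolicy=mac interface name for an Ethernet NIC."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # "en" = Ethernet, "x" = name derived from the MAC address.
    return "enx" + digits


print(mac_policy_name("C4:5A:B1:AB:D7:F5"))  # enxc45ab1abd7f5
```

The name is ugly, but it is a pure function of the MAC, so it stays stable across PCI or firmware changes for as long as the NIC itself is not replaced.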
cmooney created P60614 (An Untitled Masterwork).
Tue, Apr 16, 12:04 PM
cmooney added a comment to T362522: mr1-eqsin performance issue.

FWIW I changed the key-exchange algorithm configured on mr1-eqsin to see if it would make any difference; from some brief searching, the curve25519 one seems to use less CPU than the DH group-exchange one we have now. So far it hasn't made much impact, but I'll recheck the graphs in a day or two.

cmooney@mr1-eqsin# show | compare 
[edit system services ssh]
-    key-exchange group-exchange-sha2;
+    key-exchange curve25519-sha256;
Tue, Apr 16, 11:31 AM · Infrastructure-Foundations, netops
cmooney edited P60606 kubemaster1002 port 6443 connections.
Tue, Apr 16, 10:41 AM
cmooney created P60606 kubemaster1002 port 6443 connections.
Tue, Apr 16, 10:39 AM

Mon, Apr 15

cmooney added a comment to T362522: mr1-eqsin performance issue.

It looks like the slowness and the reboots we are seeing on the device are the product of a brute-force SSH attack on the SRX.

Mon, Apr 15, 10:20 PM · Infrastructure-Foundations, netops
cmooney edited P60541 Ganeti-side config.
Mon, Apr 15, 10:15 PM
cmooney updated the title for P60542 vm side bird conf from untitled to vm side bird conf.
Mon, Apr 15, 10:07 PM
cmooney edited P60542 vm side bird conf.
Mon, Apr 15, 9:54 PM
cmooney created P60542 vm side bird conf.
Mon, Apr 15, 9:43 PM
cmooney created P60541 Ganeti-side config.
Mon, Apr 15, 9:41 PM
cmooney added a comment to T362392: Routed Ganeti: Add support for VM BGP.

The ideal path would require upstream changes in BIRD (especially for v6): http://trubka.network.cz/pipermail/bird-users/2024-April/017580.html

Mon, Apr 15, 8:52 PM · Ganeti
cmooney edited P60540 (An Untitled Masterwork).
Mon, Apr 15, 8:51 PM
cmooney created P60540 (An Untitled Masterwork).
Mon, Apr 15, 8:50 PM
cmooney added a comment to T362392: Routed Ganeti: Add support for VM BGP.

In general I'm a fan of dynamic neighbors so happy to use it here on the Ganeti side.

Mon, Apr 15, 8:24 PM · Ganeti
cmooney added a comment to T360772: Move public-vlan host BGP peerings from CRs to top-of-rack switches in codfw.

We can define per-host Hiera keys, including empty lists, so it still needs to be tested, but I don't think we need to implement a new feature.

Mon, Apr 15, 6:18 PM · SRE, Infrastructure-Foundations, netops
cmooney claimed T362366: Inbound interface errors.
Mon, Apr 15, 4:38 PM · SRE, ops-eqiad
cmooney added a parent task for T362366: Inbound interface errors: Unknown Object (Task).
Mon, Apr 15, 4:01 PM · SRE, ops-eqiad
cmooney triaged T361871: codfw: use old asw switches from row A and B as msw switches in row C and D as Low priority.

@Papaul yeah I think if we want to go this route we can just set them up the same as we do the netgear or fs.com msw's.

Mon, Apr 15, 2:40 PM · SRE, netops, Infrastructure-Foundations, ops-codfw

Sun, Apr 14

cmooney added a parent task for T362482: x2 codfw master (db2144) TCP errors: Unknown Object (Task).
Sun, Apr 14, 11:54 AM · Patch-For-Review, netops, Infrastructure-Foundations, DBA
cmooney added a comment to T362482: x2 codfw master (db2144) TCP errors.

Apologies, I'd missed it at first, but there are ingress errors on our circuit from codfw to eqiad, on the eqiad side:

Sun, Apr 14, 11:45 AM · Patch-For-Review, netops, Infrastructure-Foundations, DBA
cmooney created P60480 (An Untitled Masterwork).
Sun, Apr 14, 11:36 AM

Fri, Apr 5

cmooney added a comment to T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting.

All cp hosts in eqiad are in rows A, B, C, and D, so that does look worth trying out, I guess! Can you remind me whether, and when, this transition was made in the recent past?

Fri, Apr 5, 2:38 PM · SRE, Traffic, SRE-swift-storage, ops-codfw, DC-Ops, ops-eqiad
cmooney added a comment to T361871: codfw: use old asw switches from row A and B as msw switches in row C and D.

We first need to discuss whether we want to start using managed switches as management switches (except for the aggregation ones).
On the plus side, it's convenient to have the extra visibility, but it adds a lot of management overhead to our automation, and I'm not sure we have the resources for that.

Fri, Apr 5, 11:11 AM · SRE, netops, Infrastructure-Foundations, ops-codfw
cmooney added a comment to T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting.

Lastly, maybe we could explore relying less on PXE; for example, is it possible to pass the host and TFTP server IP through Redfish?

Fri, Apr 5, 9:45 AM · SRE, Traffic, SRE-swift-storage, ops-codfw, DC-Ops, ops-eqiad
cmooney added a comment to T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting.

Any other opinions/thoughts on how we can try to fix this, and where? I am very happy to do the legwork, but I'm a bit lost on what to check next.

Fri, Apr 5, 9:36 AM · SRE, Traffic, SRE-swift-storage, ops-codfw, DC-Ops, ops-eqiad
cmooney added a comment to T361899: Let codesearch-frontend reques to local Hound instances directly.

These rules in the filter table's INPUT chain allow traffic to Hound, but only from two specific IPs:

6    15097  906K ACCEPT     tcp  --  *      *       172.16.5.238         0.0.0.0/0            tcp dpt:3002
7        0     0 ACCEPT     tcp  --  *      *       172.16.5.200         0.0.0.0/0            tcp dpt:3002
Fri, Apr 5, 12:48 AM · Patch-For-Review, VPS-project-Codesearch
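Rules 6 and 7 behave as a source allowlist for TCP port 3002. Traffic from any other source does not match either rule and falls through to later rules, which are not shown. The minimal model below checks a packet against just these two rules. The non-matching address 172.16.5.10 is a hypothetical example.

```python
import ipaddress

# Sources accepted for TCP/3002 by rules 6 and 7 above (destination is 0.0.0.0/0, i.e. any).
HOUND_ALLOWED_SOURCES = [
    ipaddress.ip_network("172.16.5.238/32"),
    ipaddress.ip_network("172.16.5.200/32"),
]
HOUND_PORT = 3002


def matches_hound_accept(src: str, proto: str, dport: int) -> bool:
    """True if a packet would match one of the two ACCEPT rules."""
    if proto != "tcp" or dport != HOUND_PORT:
        return False
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in HOUND_ALLOWED_SOURCES)


print(matches_hound_accept("172.16.5.238", "tcp", 3002))  # True
print(matches_hound_accept("172.16.5.10", "tcp", 3002))   # False: not in the allowlist
```

Letting codesearch-frontend reach Hound directly would therefore require its source address to be added to this list, or covered by a broader rule.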

Thu, Apr 4

cmooney added a comment to T360297: Take advantage of 10Gb NICs in the new network stack.

@ayounsi thanks for the patch! LGTM.

Thu, Apr 4, 2:42 PM · Infrastructure-Foundations, DC-Ops, netops
cmooney updated subscribers of T358096: Automation to add extra IPs to servers.

I added the above script so we can move this out of the current provision script, and simplify the work towards replacing that script with one that picks attributes based on the host name/profile.

Thu, Apr 4, 2:20 PM · Patch-For-Review, Infrastructure-Foundations

Fri, Mar 29

cmooney added a comment to T360029: Integrate dbctl IP changes as part of VLAN changes. .

It might sound revolutionary, but I think MediaWiki should not re-implement DNS. All of these IPs should be removed from dbctl and resolved via a fast lookup in MW.

Fri, Mar 29, 10:53 AM · conftool, Data-Persistence, SRE, Infrastructure-Foundations

Thu, Mar 28

cmooney edited P59003 (An Untitled Masterwork).
Thu, Mar 28, 6:31 PM
cmooney created P59003 (An Untitled Masterwork).
Thu, Mar 28, 6:29 PM

Tue, Mar 26

cmooney added a comment to P58918 (An Untitled Masterwork).
cmooney@cumin1002:~$ for i in {100..199}; do dig A "db2$i.codfw.wmnet" @10.3.0.1 | grep \Query\ time; done 
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 8 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 8 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 8 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 8 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
cmooney@cumin1002:~$
cmooney@cumin1002:~$ for i in {100..199}; do dig A "db2$i.codfw.wmnet" @10.3.0.1 | grep \Query\ time; done | sort | uniq 
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 8 msec
cmooney@cumin1002:~$
Tue, Mar 26, 11:37 AM
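The sort | uniq pipeline only shows which distinct values occurred. Adding uniq -c would also count them. The Python sketch below does the same tally over captured dig output, which is handy when comparing resolvers. It assumes the ";; Query time: N msec" line format shown above, and the sample lines are illustrative.

```python
import re
from collections import Counter

QUERY_TIME = re.compile(r"^;; Query time: (\d+) msec$")


def tally_query_times(lines):
    """Count how many queries took each observed number of milliseconds."""
    counts = Counter()
    for line in lines:
        m = QUERY_TIME.match(line.strip())
        if m:
            counts[int(m.group(1))] += 1
    return counts


sample = [
    ";; Query time: 0 msec",
    ";; Query time: 4 msec",
    ";; Query time: 4 msec",
    ";; Query time: 8 msec",
]
print(sorted(tally_query_times(sample).items()))  # [(0, 1), (4, 2), (8, 1)]
```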
cmooney created P58918 (An Untitled Masterwork).
Tue, Mar 26, 11:34 AM

Mon, Mar 25

cmooney edited P58915 (An Untitled Masterwork).
Mon, Mar 25, 7:51 PM
cmooney created P58915 (An Untitled Masterwork).
Mon, Mar 25, 7:50 PM

Fri, Mar 22

cmooney removed a parent task for T345803: Connect two hosts in codfw row A/B for switch migration testing: T327938: Codfw row A/B top-of-rack switch refresh.
Fri, Mar 22, 4:43 PM · Infrastructure-Foundations, netops, ops-codfw, SRE
cmooney removed a subtask for T327938: Codfw row A/B top-of-rack switch refresh: T345803: Connect two hosts in codfw row A/B for switch migration testing.
Fri, Mar 22, 4:43 PM · netops, Infrastructure-Foundations, SRE
cmooney closed T327938: Codfw row A/B top-of-rack switch refresh as Resolved.

Closing this one, I've made some notes on wikitech below about how to approach these for future rows.

Fri, Mar 22, 4:41 PM · netops, Infrastructure-Foundations, SRE
cmooney added a comment to T345803: Connect two hosts in codfw row A/B for switch migration testing.

@cmooney can we get those 2 hosts back in decom? Thanks

Fri, Mar 22, 4:40 PM · Infrastructure-Foundations, netops, ops-codfw, SRE
cmooney removed a parent task for T336485: Setup zero touch provisioning (ZTP) for network devices: T341670: Upgrade new codfw switches to Juniper recommended.
Fri, Mar 22, 4:28 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops, SRE-tools
cmooney removed a subtask for T341670: Upgrade new codfw switches to Juniper recommended: T336485: Setup zero touch provisioning (ZTP) for network devices.
Fri, Mar 22, 4:28 PM · SRE, netops, Infrastructure-Foundations, ops-codfw
cmooney closed T347191: Bring codfw row A-B EVPN switches live and make them gateway for existing Vlans as Resolved.

Closing this task, everything now completed. For future rows we can base the plan on the steps outlined here:

Fri, Mar 22, 4:28 PM · ops-codfw, Infrastructure-Foundations, SRE, netops
cmooney closed T347191: Bring codfw row A-B EVPN switches live and make them gateway for existing Vlans, a subtask of T327938: Codfw row A/B top-of-rack switch refresh, as Resolved.
Fri, Mar 22, 4:28 PM · netops, Infrastructure-Foundations, SRE
cmooney closed T351534: Migrate IP gateway for private1-b-codfw to spine switches as Resolved.
Fri, Mar 22, 4:27 PM · netops, Infrastructure-Foundations, SRE
cmooney closed T351532: Migrate IP gateway for public1-a-codfw to spine switches as Resolved.
Fri, Mar 22, 4:26 PM · netops, Infrastructure-Foundations, SRE
cmooney updated the task description for T360776: Decom asw-b-codfw switch stack.
Fri, Mar 22, 4:02 PM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE
cmooney added a comment to T360776: Decom asw-b-codfw switch stack.

I've now decommissioned the link from asw-b-codfw to the two spines and removed it from Netbox. DC-Ops, you can proceed to remove the cabling at any time; for reference:

Fri, Mar 22, 2:19 PM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE
cmooney updated the task description for T360776: Decom asw-b-codfw switch stack.
Fri, Mar 22, 2:18 PM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE
cmooney triaged T360776: Decom asw-b-codfw switch stack as Medium priority.
Fri, Mar 22, 1:33 PM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE
cmooney updated the task description for T360772: Move public-vlan host BGP peerings from CRs to top-of-rack switches in codfw.
Fri, Mar 22, 1:02 PM · SRE, Infrastructure-Foundations, netops
cmooney added a parent task for T360772: Move public-vlan host BGP peerings from CRs to top-of-rack switches in codfw: T354869: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets.
Fri, Mar 22, 12:58 PM · SRE, Infrastructure-Foundations, netops
cmooney added a subtask for T354869: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets: T360772: Move public-vlan host BGP peerings from CRs to top-of-rack switches in codfw.
Fri, Mar 22, 12:58 PM · netops, SRE, Infrastructure-Foundations
cmooney triaged T360772: Move public-vlan host BGP peerings from CRs to top-of-rack switches in codfw as Low priority.
Fri, Mar 22, 12:58 PM · SRE, Infrastructure-Foundations, netops

Thu, Mar 21

cmooney updated the task description for T358260: Disable acceptance of IPv6 router-advertisement on non-default LVS interface.
Thu, Mar 21, 5:27 PM · Patch-For-Review, Traffic, SRE
cmooney updated the task description for T358260: Disable acceptance of IPv6 router-advertisement on non-default LVS interface.
Thu, Mar 21, 5:25 PM · Patch-For-Review, Traffic, SRE
cmooney added a comment to T326322: Add per-output queue monitoring for Juniper network devices.
Thu, Mar 21, 4:06 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
cmooney created P58870 (An Untitled Masterwork).
Thu, Mar 21, 1:58 PM
cmooney closed T359629: Netbox: bug preventing removing a parent bridge in custom script automation as Resolved.

This should be ok from here on. Thanks for the task and troubleshooting!

Thu, Mar 21, 11:13 AM · Infrastructure-Foundations, netops, netbox
cmooney created P58858 (An Untitled Masterwork).
Thu, Mar 21, 11:08 AM
cmooney added a comment to T358244: Decom asw-a-codfw switch stack.

FYI it's alerting for one of its PSUs being down, but we don't really care anymore:

asw-a-codfw> show system alarms
1 alarms currently active
Alarm time Class Description
2024-03-16 09:20:23 UTC Major FPC 6 PEM 1 is not powered

I downtimed the stack in Icinga/LibreNMS for 1 month

Thu, Mar 21, 10:53 AM · netops, Infrastructure-Foundations, SRE, ops-codfw

Tue, Mar 19

cmooney committed rOSNE1f35e48f6e47: Fix error when removing an interface's bridge membership.
Fix error when removing an interface's bridge membership
Tue, Mar 19, 1:30 PM

Mar 12 2024

cmooney added a comment to T354878: Re-IP db servers in codfw row A/B moving to per-rack subnets.

To confirm: all the new networks in codfw are within 10.192.0.0/16, which I believe should be OK in terms of grants.

Mar 12 2024, 4:40 PM · Data-Persistence, SRE, Infrastructure-Foundations
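Python's ipaddress module can check whether each new per-rack subnet falls inside the 10.192.0.0/16 aggregate. The minimal sketch below does that check. Apart from the /16 itself, the prefixes are hypothetical examples and not the actual codfw allocations.

```python
import ipaddress

CODFW_AGGREGATE = ipaddress.ip_network("10.192.0.0/16")


def outside_aggregate(prefixes):
    """Return the prefixes that are NOT contained in the codfw aggregate."""
    return [p for p in prefixes if not ipaddress.ip_network(p).subnet_of(CODFW_AGGREGATE)]


# Hypothetical per-rack subnets for illustration only.
candidates = ["10.192.0.96/27", "10.192.7.0/24", "10.193.0.0/24"]
print(outside_aggregate(candidates))  # ['10.193.0.0/24']
```

An empty result means that grants written against the /16 already cover every new subnet.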

Mar 11 2024

cmooney added a comment to T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.

Thanks for confirming @KCVelaga_WMF. I’ll get that done over the next day or so; we have our annual SRE meet up this week but I should get a few minutes to make the changes.

Mar 11 2024, 8:01 AM · SRE, SRE-Access-Requests

Mar 8 2024

cmooney added a comment to T359629: Netbox: bug preventing removing a parent bridge in custom script automation.

Thanks for looking at this @ayounsi. It was an oversight on my part, as I'd not fully tested it, but yes, everything was mostly OK apart from the log message trying to access .name. The patch should fix it.

Mar 8 2024, 7:43 PM · Infrastructure-Foundations, netops, netbox
cmooney added a comment to T326322: Add per-output queue monitoring for Juniper network devices.

Yeah, having some ballpark numbers will be a great help @cmooney. Unless we're talking hundreds of thousands more metrics than we have now, I think we're good to go; tens of thousands we can do without much effort or resources.

Mar 8 2024, 7:36 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
cmooney added a comment to T326322: Add per-output queue monitoring for Juniper network devices.

@fgiunchedi wondering if you'd any thoughts on the above suggestion to allow more series through from the gnmic pipeline?

Mar 8 2024, 1:56 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
cmooney claimed T359629: Netbox: bug preventing removing a parent bridge in custom script automation.

This is likely what also happened with T358099, I'll have a look and see if I can work out where it's going wrong.

Mar 8 2024, 1:21 PM · Infrastructure-Foundations, netops, netbox
cmooney added a comment to P58690 (An Untitled Masterwork).

Error due to define virt_floating = ["185.15.56.0/25"]:

Mar 08 13:10:57 cloudgw1002 nft[42895]: In file included from /etc/nftables/main.nft:9:1-36:
Mar 08 13:10:57 cloudgw1002 nft[42895]: /etc/nftables/110_cloudgw_puppet.nft:184:43-55: Error: unknown identifier 'virt_floating'
Mar 08 13:10:57 cloudgw1002 nft[42895]:         ip saddr { $virtual_subnet_cidr, $virt_floating } accept
Mar 08 13:10:57 cloudgw1002 nft[42895]:                                           ^^^^^^^^^^^^^
Mar 8 2024, 1:11 PM
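The bracketed, quoted list in that define is not nftables set syntax. A set of prefixes is written in braces, for example define virt_floating = { 185.15.56.0/25 }. If the value comes from a list in a template, the minimal sketch below shows how to produce that form. The helper name is hypothetical, and in this setup the real rendering happens in Puppet rather than Python.

```python
import ipaddress


def render_nft_define(name: str, prefixes: list[str]) -> str:
    """Render a list of prefixes as an nftables define using set (brace) syntax."""
    if not prefixes:
        raise ValueError("no prefixes to render")  # refuse to emit an empty set
    # Validate and normalise each prefix; ip_network raises ValueError on bad input.
    elements = [str(ipaddress.ip_network(p)) for p in prefixes]
    return f"define {name} = {{ {', '.join(elements)} }}"


print(render_nft_define("virt_floating", ["185.15.56.0/25"]))
# define virt_floating = { 185.15.56.0/25 }
```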
cmooney created P58690 (An Untitled Masterwork).
Mar 8 2024, 1:08 PM
cmooney added a comment to T326322: Add per-output queue monitoring for Juniper network devices.

@ayounsi finally got back to this for a closer look. Really great work, I tried to make a device-centric dashboard here also.

Mar 8 2024, 12:50 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops

Mar 6 2024

cmooney lowered the priority of T354839: Support PyBal routes announced with lower priority than "backup" from Medium to Low.
Mar 6 2024, 6:32 PM · Traffic, netops, Infrastructure-Foundations, SRE
cmooney updated the task description for T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.
Mar 6 2024, 11:54 AM · SRE, SRE-Access-Requests
cmooney added a comment to T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.

@cmooney I have moved over the files from stat1005:kcv-wikimf to stat1008:kcvelaga, and everything is working fine.

Mar 6 2024, 11:51 AM · SRE, SRE-Access-Requests
cmooney added a comment to T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.

@KCVelaga_WMF great news!

Mar 6 2024, 11:21 AM · SRE, SRE-Access-Requests

Mar 5 2024

cmooney closed T355544: Migrate hosts from codfw row A/B ASW to new LSW devices as Resolved.

Closing task. Big thanks to all the SRE teams for the help and co-operation getting this one over the line :)

Mar 5 2024, 11:35 PM · ops-codfw, Infrastructure-Foundations, netops, SRE
cmooney removed a subtask for T355544: Migrate hosts from codfw row A/B ASW to new LSW devices: T358244: Decom asw-a-codfw switch stack.
Mar 5 2024, 11:35 PM · ops-codfw, Infrastructure-Foundations, netops, SRE
cmooney removed a parent task for T358244: Decom asw-a-codfw switch stack: T355544: Migrate hosts from codfw row A/B ASW to new LSW devices.
Mar 5 2024, 11:35 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
cmooney closed T355873: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw, a subtask of T355544: Migrate hosts from codfw row A/B ASW to new LSW devices, as Resolved.
Mar 5 2024, 11:34 PM · ops-codfw, Infrastructure-Foundations, netops, SRE
cmooney closed T355873: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw as Resolved.
Mar 5 2024, 11:34 PM · DBA, SRE, netops, Infrastructure-Foundations, ops-codfw
cmooney closed T352920: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan as Resolved.
Mar 5 2024, 7:06 PM · Traffic, netops, Infrastructure-Foundations, SRE
cmooney closed T352920: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan, a subtask of T354869: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets, as Resolved.
Mar 5 2024, 7:06 PM · netops, SRE, Infrastructure-Foundations
cmooney added a comment to T352920: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan.

Reimage looks good, BGP up and lvs2011 handling traffic again:

cmooney@cumin1002:~$ sudo traceroute -I 208.80.153.224 
traceroute to 208.80.153.224 (208.80.153.224), 30 hops max, 60 byte packets
 1  ae4-1020.cr2-eqiad.wikimedia.org (10.64.48.3)  0.475 ms  0.444 ms  0.535 ms
 2  ae0.cr1-eqiad.wikimedia.org (208.80.154.193)  0.428 ms  0.532 ms  0.526 ms
 3  et-1-0-2.cr1-codfw.wikimedia.org (208.80.153.221)  30.613 ms  30.607 ms  30.601 ms
 4  irb-100.ssw1-a1-codfw.codfw.wmnet (10.192.254.5)  35.888 ms  35.882 ms  35.876 ms
 5  irb-2017.lsw1-a2-codfw.codfw.wmnet (10.192.0.106)  31.317 ms  31.309 ms  31.303 ms
 6  text-lb.codfw.wikimedia.org (208.80.153.224)  30.481 ms  30.302 ms  30.258 ms
Mar 5 2024, 6:31 PM · Traffic, netops, Infrastructure-Foundations, SRE
cmooney added a comment to T359198: Icinga BFD check failing.

Looks like it's:

man 1 download-mibs
download-mibs --help

and the config is at /etc/snmp-mibs-downloader/snmp-mibs-downloader.conf which has some kind of "AUTOLOAD" config line, fwiw.

Mar 5 2024, 5:55 PM · SRE Observability (FY2023/2024-Q3), Patch-For-Review, netops, SRE
cmooney added a comment to T359198: Icinga BFD check failing.

I guess the snmp-mibs-downloader just has to be automated to download stuff?

Mar 5 2024, 5:32 PM · SRE Observability (FY2023/2024-Q3), Patch-For-Review, netops, SRE
cmooney triaged T359198: Icinga BFD check failing as Medium priority.
Mar 5 2024, 5:09 PM · SRE Observability (FY2023/2024-Q3), Patch-For-Review, netops, SRE
cmooney added a comment to T355873: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw.

All links moved without problem, servers back online and responding to ping now.

Mar 5 2024, 4:13 PM · DBA, SRE, netops, Infrastructure-Foundations, ops-codfw
cmooney added a comment to T358727: Reclaim recently-decommed CP host for WDQS (see T352253).

@VRiley-WMF any pointers on how to connect to this node via iDRAC/iLO and set it up with a hostname of wdqs1025.eqiad.wmnet? I'm wondering if maybe there's a direct IP or IPs, given that there don't seem to be DNS records for cp1086.eqiad.wmnet or cp1086.mgmt.eqiad.wmnet?

Mar 5 2024, 2:43 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.03.04 - 2024.03.24), Wikidata, wmde-wikidata-tech, SRE, ops-eqiad
cmooney added a comment to T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.

Taavi advised on IRC about the gerrit issue:

Mar 5 2024, 1:29 PM · SRE, SRE-Access-Requests
cmooney added a comment to T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.

Yes, approved

Mar 5 2024, 11:34 AM · SRE, SRE-Access-Requests

Mar 4 2024

cmooney updated the task description for T355544: Migrate hosts from codfw row A/B ASW to new LSW devices.
Mar 4 2024, 4:20 PM · ops-codfw, Infrastructure-Foundations, netops, SRE
cmooney created P58364 (An Untitled Masterwork).
Mar 4 2024, 4:04 PM
cmooney updated subscribers of T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.

@KCVelaga_WMF : That is expected, your kcvelaga account isn't yet part of the cn=wmf LDAP group, it will be when https://gerrit.wikimedia.org/r/1008450 is merged

Mar 4 2024, 2:53 PM · SRE, SRE-Access-Requests
cmooney added a comment to T359054: Slowly ramping up traffic to the Brazil data center (magru) and related geo-maps.

Thanks for the task!

Mar 4 2024, 2:32 PM · Infrastructure-Foundations, SRE, Traffic