User Details
- User Since
- May 10 2021, 3:25 PM (154 w, 1 d)
- Availability
- Available
- IRC Nick
- topranks
- LDAP User
- Cathal Mooney
- MediaWiki User
- CMooney (WMF) [ Global Accounts ]
Today
Fri, Apr 19
@ssingh I've reserved the following addresses in Netbox for the LVS now, let me know if you need any more info or if I can help.
lvs7001: pri_int: vlan 711 - public1-b3-magru - 195.200.68.2/27
Thu, Apr 18
This one in particular has been down for almost a year, and its IPs are not responding to ARP/ND on the LAN. PeeringDB still lists them, but I've removed the session for now; if they get in touch we can set it up again.
I'll take a look and clear up what I can.
Wed, Apr 17
Perhaps one option would be to set aside the Puppet patch changing drmrs and esams for now, but merge the Homer one and configure the magru LVS in Puppet to peer with both switches?
I believe the two patches above, once merged, will add the required redundancy. They follow option 1 above, creating backup peerings from the LVS hosts to the switch in the other rack.
Tue, Apr 16
In terms of live migration and session re-establishment, we should bear in mind that BIRD BGP sessions will use default hold/keepalive timers of 180/60 seconds when talking to each other (instead of the 90/30 Juniper defaults).
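To make the timer point a bit more concrete, here is a rough Python model of how RFC 4271 settles the hold time (each speaker uses the smaller of its configured value and the one in the peer's OPEN) plus the one-third keepalive interval the RFC suggests. It's a sketch, not BIRD's or Junos' actual code: explicitly configured keepalives and other per-daemon overrides aren't modelled. The 180 and 90 second figures are simply the ones quoted above.

```python
"""Rough model of BGP hold/keepalive timer negotiation (RFC 4271, section 4.2)."""


def negotiated_hold_time(local_hold: int, peer_hold: int) -> int:
    """Each speaker uses the smaller of its configured hold time and the one
    received in the peer's OPEN message; 0 disables the hold timer."""
    for value in (local_hold, peer_hold):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(local_hold, peer_hold)


def keepalive_interval(hold_time: int) -> int:
    """One third of the hold time, the interval RFC 4271 suggests."""
    return hold_time // 3


if __name__ == "__main__":
    # Timer values quoted in the comment above.
    bird_hold, junos_hold = 180, 90
    for label, peer_hold in (("BIRD <-> BIRD", bird_hold), ("BIRD <-> Junos", junos_hold)):
        hold = negotiated_hold_time(bird_hold, peer_hold)
        # A silent peer is only declared down once the hold timer expires,
        # so failure detection can take up to `hold` seconds.
        print(f"{label}: hold {hold}s, keepalive {keepalive_interval(hold)}s")
```

Under this model a BIRD-to-BIRD session settles on 180s/60s while a Junos peer offering 90s pulls it down to 90s/30s, so a BIRD-to-BIRD session could take up to twice as long to notice a peer that has gone away.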
FWIW I changed the SSH key-exchange algorithm configured on mr1-eqsin to see if it would make any difference. From some brief searching, the curve25519 one seems to use less CPU than the DH group-exchange one we had. So far it hasn't made much impact, but I'll recheck the graphs in a day or two.
cmooney@mr1-eqsin# show | compare
[edit system services ssh]
-    key-exchange group-exchange-sha2;
+    key-exchange curve25519-sha256;
Mon, Apr 15
The ideal path would require upstream changes in BIRD (especially for IPv6): http://trubka.network.cz/pipermail/bird-users/2024-April/017580.html
In general I'm a fan of dynamic neighbors so happy to use it here on the Ganeti side.
@Papaul yeah, I think if we want to go this route we can just set them up the same way we do the Netgear or FS.com msws.
Sun, Apr 14
Apologies, I'd missed it at first, but there are ingress errors on our codfw to eqiad circuit, on the eqiad side:
Fri, Apr 5
This rule in the filter table / INPUT chain is allowing the traffic to Hound, but only from two specific IPs:
6    15097  906K ACCEPT     tcp  --  *      *       172.16.5.238         0.0.0.0/0            tcp dpt:3002
7        0     0 ACCEPT     tcp  --  *      *       172.16.5.200         0.0.0.0/0            tcp dpt:3002
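To illustrate why only those two hosts get through, here is a small Python sketch of first-match evaluation over just rules 6 and 7. It is a simplified model, not how iptables is implemented. The rules elsewhere in the chain and the chain policy aren't shown above, so a `None` result only means "not accepted by these two rules". The address 172.16.5.10 in the usage is hypothetical.

```python
"""Sketch of how the two INPUT rules above match traffic to Hound (tcp/3002)."""
from ipaddress import ip_address, ip_network
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    source: str  # "source" column of the listing
    dport: int   # from "tcp dpt:3002"
    target: str


# Rules 6 and 7 from the listing; protocol is tcp and destination 0.0.0.0/0 for both.
HOUND_RULES = [
    Rule("172.16.5.238/32", 3002, "ACCEPT"),
    Rule("172.16.5.200/32", 3002, "ACCEPT"),
]


def first_match(rules: list[Rule], src: str, dport: int) -> Optional[str]:
    """Return the target of the first rule matching a TCP packet, or None if
    none of these rules match (later rules or the chain policy then decide)."""
    addr = ip_address(src)
    for rule in rules:
        if addr in ip_network(rule.source) and dport == rule.dport:
            return rule.target
    return None


if __name__ == "__main__":
    # 172.16.5.10 is a hypothetical third client for comparison.
    for src in ("172.16.5.238", "172.16.5.200", "172.16.5.10"):
        print(src, first_match(HOUND_RULES, src, 3002))
```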
Thu, Apr 4
@ayounsi thanks for the patch! LGTM.
I added the above script so we can move this out of the current provision script, which simplifies the work towards replacing that script with one that picks attributes based on the hostname/profile.
Fri, Mar 29
Thu, Mar 28
Tue, Mar 26
cmooney@cumin1002:~$ for i in {100..199}; do dig A "db2$i.codfw.wmnet" @10.3.0.1 | grep \Query\ time; done
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 8 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 8 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 8 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 8 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 4 msec
cmooney@cumin1002:~$
cmooney@cumin1002:~$ for i in {100..199}; do dig A "db2$i.codfw.wmnet" @10.3.0.1 | grep \Query\ time; done | sort | uniq
;; Query time: 0 msec
;; Query time: 4 msec
;; Query time: 8 msec
cmooney@cumin1002:~$
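`sort | uniq` gives the distinct values. `sort | uniq -c` would add a count per value. For a slightly fuller summary of the loop's output, a small Python sketch like the one below could be used. It only assumes dig's usual `;; Query time: N msec` line format. The four-line sample in the usage is illustrative rather than a copy of the run above.

```python
"""Summarise the 'Query time' lines printed by the dig loop above."""
import re
from collections import Counter
from statistics import mean

QUERY_TIME_RE = re.compile(r";; Query time: (\d+) msec")


def query_times(dig_output: str) -> list[int]:
    """Extract every query time (ms) from concatenated dig output."""
    return [int(ms) for ms in QUERY_TIME_RE.findall(dig_output)]


def summarise(times: list[int]) -> dict:
    """Count per distinct value (like `sort | uniq -c`) plus mean and max."""
    return {
        "queries": len(times),
        "histogram": dict(sorted(Counter(times).items())),
        "mean_ms": round(mean(times), 2) if times else 0.0,
        "max_ms": max(times, default=0),
    }


if __name__ == "__main__":
    # Illustrative sample in the same format as the loop's output.
    sample = "\n".join(f";; Query time: {t} msec" for t in (0, 4, 4, 8))
    print(summarise(query_times(sample)))
```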
Mon, Mar 25
Mar 22 2024
Closing this one; I've made some notes on Wikitech below about how to approach these for future rows.
Closing this task, everything now completed. For future rows we can base the plan on the steps outlined here:
I've now decommissioned the links from asw-b-codfw to the two spines and removed them from Netbox. DC-Ops, you can go ahead and remove the cabling any time. For reference:
Mar 21 2024
This should be ok from here on. Thanks for the task and troubleshooting!
Mar 19 2024
Mar 12 2024
To confirm, all the new networks in codfw are within 10.192.0.0/16, which I believe should be OK in terms of grants.
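If it helps to double-check this kind of statement, here is a minimal Python sketch using the standard `ipaddress` module. It lists any prefixes that are not fully inside 10.192.0.0/16. It assumes the grants in question match on that /16. The candidate prefixes in the usage are hypothetical, not the actual new codfw networks.

```python
"""Check that a set of prefixes all sit inside the 10.192.0.0/16 range mentioned above."""
from ipaddress import ip_network

GRANT_RANGE = ip_network("10.192.0.0/16")


def outside_range(prefixes, supernet=GRANT_RANGE):
    """Return the prefixes that are not fully contained in `supernet`.
    Prefixes of a different IP version are reported as outside too."""
    outside = []
    for prefix in map(ip_network, prefixes):
        # Compare versions first: subnet_of() raises TypeError on a v4/v6 mix.
        if prefix.version != supernet.version or not prefix.subnet_of(supernet):
            outside.append(prefix)
    return outside


if __name__ == "__main__":
    # Hypothetical prefixes for illustration only.
    candidates = ["10.192.21.0/24", "10.192.22.0/24", "10.193.0.0/24"]
    print(outside_range(candidates))  # only 10.193.0.0/24 falls outside
```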
Mar 11 2024
Thanks for confirming @KCVelaga_WMF. I'll get that done over the next day or so; we have our annual SRE meetup this week, but I should get a few minutes to make the changes.
Mar 8 2024
Thanks for looking at this @ayounsi. It was an oversight on my part that I hadn't fully tested it, but yes, everything was mostly OK apart from the log message trying to access .name. The patch should fix that.
@fgiunchedi wondering if you'd any thoughts on the above suggestion to allow more series through from the gnmic pipeline?
This is likely what also happened with T358099, I'll have a look and see if I can work out where it's going wrong.
Error due to define virt_floating = ["185.15.56.0/25"]:
Mar 08 13:10:57 cloudgw1002 nft[42895]: In file included from /etc/nftables/main.nft:9:1-36:
Mar 08 13:10:57 cloudgw1002 nft[42895]: /etc/nftables/110_cloudgw_puppet.nft:184:43-55: Error: unknown identifier 'virt_floating'
Mar 08 13:10:57 cloudgw1002 nft[42895]: ip saddr { $virtual_subnet_cidr, $virt_floating } accept
Mar 08 13:10:57 cloudgw1002 nft[42895]:                                   ^^^^^^^^^^^^^
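The log shows nft rejecting the file because `$virt_floating` is referenced where no matching `define` is in scope. As a rough illustration of that failure mode, here is a hypothetical Python pre-check. It is not part of nftables or the Puppet code. It flags `$name` uses that appear before a `define name = ...` line. It assumes the files have been joined into one text in include order (e.g. main.nft with 110_cloudgw_puppet.nft inlined) and uses simplified name patterns. It does not understand the full nft grammar or other causes of this error.

```python
"""Rough pre-check for nftables rulesets: flag $variables used before a `define`."""
import re

# Simplified patterns; assumes variable names use only letters, digits and underscores.
DEFINE_RE = re.compile(r"^\s*define\s+([A-Za-z_]\w*)\s*=")
USE_RE = re.compile(r"\$([A-Za-z_]\w*)")


def undefined_variables(ruleset: str) -> list[tuple[int, str]]:
    """Return (line number, name) for each $name referenced before it is defined."""
    defined: set[str] = set()
    problems: list[tuple[int, str]] = []
    for lineno, line in enumerate(ruleset.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore comments
        # Check uses first so `define a = $a` is still reported.
        for name in USE_RE.findall(code):
            if name not in defined:
                problems.append((lineno, name))
        match = DEFINE_RE.match(code)
        if match:
            defined.add(match.group(1))
    return problems


if __name__ == "__main__":
    # Hypothetical ruleset mirroring the failing line; the subnet value is a placeholder.
    ruleset = """\
define virtual_subnet_cidr = 192.0.2.0/24
table inet example {
    chain forward {
        ip saddr { $virtual_subnet_cidr, $virt_floating } accept
    }
}
"""
    print(undefined_variables(ruleset))  # [(4, 'virt_floating')]
```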
Mar 6 2024
@KCVelaga_WMF great news!
Mar 5 2024
Closing task. Big thanks to all the SRE teams for the help and co-operation getting this one over the line :)