
ayounsi (Arzhel Younsi)
Staff Network SRE

Projects (10)

User Details

User Since
Apr 3 2017, 6:23 PM (387 w, 5 d)
Availability
Busy until Sep 16.
IRC Nick
xionox
LDAP User
Ayounsi
MediaWiki User
AYounsi (WMF) [ Global Accounts ]

Recent Activity

Fri, Aug 30

ayounsi added a comment to T310583: Netbox: use Journaling feature.

Documented on how to use it in scripts and pynetbox: https://wikitech.wikimedia.org/wiki/Netbox#Journaling

Fri, Aug 30, 3:12 PM · Infrastructure-Foundations, netbox

Thu, Aug 29

ayounsi added a comment to T237492: Create a second text-lb IP address for test purposes.

@ssingh yep, you can clean it up anytime, thanks !

Thu, Aug 29, 3:27 PM · Traffic-Icebox, SRE
ayounsi created T373587: Remove Additional IP records from procurement request template.
Thu, Aug 29, 10:37 AM · DC-Ops

Wed, Aug 28

ayounsi added a comment to T372161: Publish, and maintain ASPA records for valid AS14907 upstreams.

Ta da: https://wikitech.wikimedia.org/w/index.php?title=Adding_and_removing_transit_providers&diff=2218856&oldid=2042295. Can you verify this is correct? There are lots of references to private Phabricator tasks, and of course I have never dealt with WMF's transit providers before.

That's great and sounds much more pro :)
I made some tiny changes.

Wed, Aug 28, 7:18 AM · Infrastructure-Foundations, netops
ayounsi added a comment to T372158: Apply egress Source Address Validation on the Wikimedia core routers.

I'm wondering if we can have Homer populate a prefix list instead.

That's always an option. It of course comes down to how much more complex that makes it, and whether the tradeoff is worth it.

Wed, Aug 28, 6:33 AM · netops, Infrastructure-Foundations

Tue, Aug 27

ayounsi lowered the priority of T259182: micro-CI for homer-private from Medium to Low.
Tue, Aug 27, 2:15 PM · Infrastructure-Foundations, homer
ayounsi closed T348036: sre.hardware.upgrade-firmware cookbook: product slug parsing as Resolved.

Deployed! Let me know if there are any issues.

Tue, Aug 27, 1:57 PM · Patch-For-Review, netbox, DC-Ops, SRE, Infrastructure-Foundations
ayounsi added a comment to T372909: Create prod VMs on routed ganeti cluster.

rpki2003 created, I used this opportunity to create it as a Bookworm VM.

Tue, Aug 27, 12:36 PM · Patch-For-Review, Infrastructure-Foundations
ayounsi closed T299560: Enable drbd collector on ganeti nodes as Resolved.

All done!

Tue, Aug 27, 12:24 PM · Patch-For-Review, Ganeti, observability, Infrastructure-Foundations, SRE
ayounsi added a comment to T299560: Enable drbd collector on ganeti nodes.

Draft dashboard: https://grafana.wikimedia.org/d/f_tZtVlMz/drbd

Tue, Aug 27, 9:07 AM · Patch-For-Review, Ganeti, observability, Infrastructure-Foundations, SRE
ayounsi added a comment to T299560: Enable drbd collector on ganeti nodes.

I manually added --collector.drbd to /etc/default/prometheus-node-exporter on one of the Routed Ganeti exporters.

Tue, Aug 27, 6:58 AM · Patch-For-Review, Ganeti, observability, Infrastructure-Foundations, SRE

Mon, Aug 26

ayounsi claimed T299560: Enable drbd collector on ganeti nodes.
Mon, Aug 26, 3:14 PM · Patch-For-Review, Ganeti, observability, Infrastructure-Foundations, SRE
ayounsi closed T371890: pynetbox incompatibility with Netbox >= 4.0.6 as Resolved.

It's all good now, I guess we just had to wait a little bit.

Mon, Aug 26, 2:46 PM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi merged T373166: Alert in need of triage: Juniper alarms (instance cr1-eqiad) into T372781: cr1-eqiad: disk failure.
Mon, Aug 26, 2:27 PM · Infrastructure-Foundations, ops-eqiad, DC-Ops, netops
ayounsi merged task T373166: Alert in need of triage: Juniper alarms (instance cr1-eqiad) into T372781: cr1-eqiad: disk failure.
Mon, Aug 26, 2:27 PM · Infrastructure-Foundations, sre-alert-triage
ayounsi closed T339121: netbox: decided how to deal with blank mgmt dns_names as Resolved.

Validator deployed.

Mon, Aug 26, 12:15 PM · Patch-For-Review, SRE-tools, netbox, Infrastructure-Foundations
ayounsi added a comment to T348036: sre.hardware.upgrade-firmware cookbook: product slug parsing.

Deployed on netbox-next and tests seem all good.

Mon, Aug 26, 11:52 AM · Patch-For-Review, netbox, DC-Ops, SRE, Infrastructure-Foundations
ayounsi created T373323: Cleanup Netbox device-types.
Mon, Aug 26, 11:43 AM · DC-Ops
ayounsi added a comment to T368257: generate_vrts_aliases failing on mx-in1001.

It would be useful to know why it failed (maybe from the server's logs?), but +1 to adding retry logic regardless.

Mon, Aug 26, 11:31 AM · Patch-For-Review, Infrastructure-Foundations, Mail, vrts, collaboration-services
ayounsi added a comment to T310589: Netbox: basic change rollback.

I had a try at this. See attached screenshot for using the "offline device" script, then the "revert" script using the request ID.

Mon, Aug 26, 9:08 AM · Patch-For-Review, netbox, Infrastructure-Foundations
ayounsi awarded T372878: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets a 100 token.
Mon, Aug 26, 6:11 AM · Patch-For-Review, serviceops, SRE

Fri, Aug 23

ayounsi closed T330084: gdnsd failures when converting services from active/passive to active/active, a subtask of T234997: Make Netbox Active/Active, as Declined.
Fri, Aug 23, 11:23 AM · Infrastructure-Foundations, netbox, Traffic-Icebox, SRE
ayounsi closed T330084: gdnsd failures when converting services from active/passive to active/active as Declined.

Not going active/active (see T234997: Make Netbox Active/Active)

Fri, Aug 23, 11:23 AM · Traffic, Infrastructure-Foundations, netbox
ayounsi closed T309034: netbox: drop profile::netbox::active_server parameter as Declined.

The active server parameter now controls rq-netbox as well, so it's unlikely that we'll get rid of it (see T341843: Netbox rq.timeouts.JobTimeoutException).
As we're not going in the active/active direction anytime soon (see T234997: Make Netbox Active/Active), I'm going to close this task in favor of T330883: Improve Netbox active/passive failover process.

Fri, Aug 23, 10:25 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi renamed T330883: Improve Netbox active/passive failover process from Netbox in codfw slowness issue to Improve Netbox active/passive failover process.
Fri, Aug 23, 10:17 AM · netbox, Infrastructure-Foundations
ayounsi added a comment to T330883: Improve Netbox active/passive failover process.

We should focus our efforts on improving the active/passive failover process.

Fri, Aug 23, 10:17 AM · netbox, Infrastructure-Foundations
ayounsi closed T234997: Make Netbox Active/Active as Declined.

An active/active Netbox is not really doable for now. For both Redis and Postgres the extra cross-DC latency makes it practically unusable (see T341843: Netbox rq.timeouts.JobTimeoutException and T330883: Improve Netbox active/passive failover process). And it doesn't seem doable for now to even just split reads and writes.
We should focus our efforts on improving the active/passive failover process.

Fri, Aug 23, 10:14 AM · Infrastructure-Foundations, netbox, Traffic-Icebox, SRE
ayounsi closed T234997: Make Netbox Active/Active, a subtask of T296452: Upgrade Netbox to 3.2, as Declined.
Fri, Aug 23, 10:12 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi changed the status of T309034: netbox: drop profile::netbox::active_server parameter from Open to Stalled.
Fri, Aug 23, 10:01 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi moved T371890: pynetbox incompatibility with Netbox >= 4.0.6 from Backlog to Work in Progress / Tasks to Do on the netbox board.
Fri, Aug 23, 9:59 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi moved T354169: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible from Backlog to Discussion / Design / Consensus Making on the netbox board.
Fri, Aug 23, 9:43 AM · Infrastructure-Foundations, netbox
ayounsi moved T271577: Make DNS generation faster from Discussion / Design / Consensus Making to Backlog on the netbox board.
Fri, Aug 23, 9:43 AM · Infrastructure-Foundations, User-crusnov, netbox
ayounsi assigned T354169: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible to akosiaris.

Upgrade is done, I had a bit more time to look into that.

Fri, Aug 23, 9:42 AM · Infrastructure-Foundations, netbox
ayounsi moved T303170: puppet lookup causes spurious puppetdb entries from Backlog to Watching on the netbox board.
Fri, Aug 23, 6:34 AM · Puppet-Infrastructure, Puppet (Puppet 7.0), Infrastructure-Foundations, netbox
ayounsi added a comment to T372783: Verify that cephosd* server reimages work without adversely affecting cluster availability.

https://netbox.wikimedia.org/extras/scripts/results/82927/

cephosd1005 (WMF10631) Device is Active in Netbox but is missing from PuppetDB (should be ('decommissioning', 'inventory', 'offline', 'planned', 'staged', 'failed'))

Fri, Aug 23, 6:19 AM · Patch-For-Review, Data-Platform-SRE (2024.08.17 - 2024.09.06)

Thu, Aug 22

ayounsi closed T319301: Netbox: manage VRRP priorities as Declined.
Thu, Aug 22, 3:00 PM · Infrastructure-Foundations, netbox
ayounsi closed T278936: Netbox: import from PuppetDB script creates VIP also if exists as Declined.

The changelog links have expired. I tried to reproduce the issue with other similar hosts (gitlab, gerrit, etc) but couldn't.

Thu, Aug 22, 2:56 PM · Infrastructure-Foundations, netbox
ayounsi moved T263429: Netbox support for svc allocation from Work in Progress / Tasks to Do to Backlog on the netbox board.
Thu, Aug 22, 2:50 PM · Infrastructure-Foundations, SRE-tools, netbox
ayounsi added a comment to T339121: netbox: decided how to deal with blank mgmt dns_names.

Added a relevant check in the IP validator.

Thu, Aug 22, 2:48 PM · Patch-For-Review, SRE-tools, netbox, Infrastructure-Foundations
ayounsi moved T339121: netbox: decided how to deal with blank mgmt dns_names from Backlog to Work in Progress / Tasks to Do on the netbox board.
Thu, Aug 22, 2:47 PM · Patch-For-Review, SRE-tools, netbox, Infrastructure-Foundations
ayounsi claimed T339121: netbox: decided how to deal with blank mgmt dns_names.
Thu, Aug 22, 2:47 PM · Patch-For-Review, SRE-tools, netbox, Infrastructure-Foundations
ayounsi closed T340190: Netbox: PuppetDB import script error with VMs as Resolved.

Probably safe to close as it has been more than a year.

Thu, Aug 22, 1:36 PM · Infrastructure-Foundations, netbox
ayounsi closed T371036: Netbox logs filling up disk, netbox1002 as Resolved.

netbox1002 is gone :) Netbox 4 servers have bigger disks and getstats (which was generating them) has been replaced by a plugin.

Thu, Aug 22, 1:34 PM · Infrastructure-Foundations, netbox
ayounsi moved T360596: Figure out a plan to move forward with regarding Redis License changes from Backlog to Watching on the netbox board.
Thu, Aug 22, 1:33 PM · GitLab (Infrastructure), Patch-For-Review, User-aborrero, serviceops, MediaWiki-Platform-Team (Radar), collaboration-services, Release-Engineering-Team (Radar), Quarry, Toolforge, Software-Licensing, Infrastructure-Foundations, netbox, Core Platform Team Initiatives (API Gateway), ChangeProp, MediaWiki-File-management, SRE
ayounsi moved T348036: sre.hardware.upgrade-firmware cookbook: product slug parsing from Backlog to Work in Progress / Tasks to Do on the netbox board.
Thu, Aug 22, 1:30 PM · Patch-For-Review, netbox, DC-Ops, SRE, Infrastructure-Foundations
ayounsi claimed T348036: sre.hardware.upgrade-firmware cookbook: product slug parsing.

Taking the task to create the validator

Thu, Aug 22, 1:29 PM · Patch-For-Review, netbox, DC-Ops, SRE, Infrastructure-Foundations
ayounsi added a comment to T351950: taavi's netbox-next account is stuck.

@taavi are you still having issues here?

Thu, Aug 22, 12:26 PM · Infrastructure-Foundations, netbox
ayounsi claimed T372781: cr1-eqiad: disk failure.
Thu, Aug 22, 10:31 AM · Infrastructure-Foundations, ops-eqiad, DC-Ops, netops
ayounsi added a comment to T372781: cr1-eqiad: disk failure.

Disk is gone :

show vmhost hardware re0
re0:
[...]
    Item       Capacity             Part number               Serial number             Description
    DIMM 0     16384 MB             VL33A2G60F-N6SB-JUN       0x49689191                DDR4 2133 MHz
    DIMM 1     16384 MB             VL33A2G60F-N6SB-JUN       0x49689445                DDR4 2133 MHz
    DIMM 2     16384 MB             VL33A2G60F-N6SB-JUN       0x49689440                DDR4 2133 MHz
    DIMM 3     16384 MB             VL33A2G60F-N6SB-JUN       0x49689134                DDR4 2133 MHz
    Disk1      50.0  GB             SFSA050GV3AA2TO           000060158505A1000227      SLIM SATA SSD
Thu, Aug 22, 10:29 AM · Infrastructure-Foundations, ops-eqiad, DC-Ops, netops
ayounsi added a comment to T370018: gitlab2002: wrong network for public IPV4 and IPV6.

Jelto made me aware of that task.
I cleared the report's error by detaching the IP from the interface in Netbox, so it matches what we currently have configured for Gerrit's IP as well. That way we can start having report error notifications again.
As they're tagged as VIPs, they won't automatically be imported from Puppet.

Thu, Aug 22, 9:53 AM · collaboration-services, Infrastructure-Foundations, SRE

Wed, Aug 21

Dzahn awarded T371653: New hosts with "Netbox status: unknown" a Like token.
Wed, Aug 21, 3:28 PM · netbox, Patch-For-Review, Infrastructure-Foundations
cmooney awarded T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts a Love token.
Wed, Aug 21, 2:44 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE
ayounsi closed T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts as Resolved.

All done !

Wed, Aug 21, 2:03 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE
ayounsi added a comment to T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts.

Confirmed that cr1-eqiad stopped generating those logs for 10.64.0.82 (prometheus1005). The other one will happen anytime puppet picks up the changes.

Wed, Aug 21, 1:54 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE
ayounsi closed T371653: New hosts with "Netbox status: unknown" as Resolved.

Updated pynetbox package has been pushed to cumin hosts to unblock the situation.

Wed, Aug 21, 12:56 PM · netbox, Patch-For-Review, Infrastructure-Foundations
ayounsi lowered the priority of T371890: pynetbox incompatibility with Netbox >= 4.0.6 from High to Medium.

I built a pynetbox 7.4.0 using the new pipeline: https://gitlab.wikimedia.org/repos/sre/pynetbox
https://gitlab.wikimedia.org/repos/sre/pynetbox/-/jobs/348632/artifacts/browse/WMF_BUILD_DIR/

Wed, Aug 21, 12:54 PM · Patch-For-Review, Infrastructure-Foundations, netbox

Tue, Aug 20

ayounsi triaged T372909: Create prod VMs on routed ganeti cluster as Medium priority.
Tue, Aug 20, 3:46 PM · Patch-For-Review, Infrastructure-Foundations
ayounsi added a comment to T372418: Put the alert1002 and alert2002 hosts in production.

I have added them to our PFW config and created T372520 for the deployment.

Deployed.

Tue, Aug 20, 3:15 PM · Patch-For-Review, SRE Observability (FY2024/2025-Q1), Observability-Alerting
ayounsi added a comment to T372878: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets.

I need to check that the physical cabling changes are ok before we start

Physical cabling is on the new switches for rows A and B. Old switches are offline.

Tue, Aug 20, 12:20 PM · Patch-For-Review, serviceops, SRE
ayounsi closed T372728: puppetdb2003 os-updates-report failure as Resolved.
Tue, Aug 20, 6:39 AM · Patch-For-Review, Infrastructure-Foundations, Puppet-Infrastructure

Mon, Aug 19

ayounsi triaged T372782: asw-c7-codfw: PEM 0 is not powered as High priority.
Mon, Aug 19, 2:48 PM · DC-Ops, ops-codfw
ayounsi triaged T372781: cr1-eqiad: disk failure as High priority.
Mon, Aug 19, 2:47 PM · Infrastructure-Foundations, ops-eqiad, DC-Ops, netops
ayounsi closed T372248: Alert in need of triage: BGP status (instance cr1-esams) as Resolved.

Peer removed.

Mon, Aug 19, 2:43 PM · Infrastructure-Foundations, netops, sre-alert-triage
ayounsi triaged T372248: Alert in need of triage: BGP status (instance cr1-esams) as Low priority.
Mon, Aug 19, 2:22 PM · Infrastructure-Foundations, netops, sre-alert-triage
ayounsi closed T310590: Netbox: use Custom Model Validation as Resolved.

All tested and deployed.

Mon, Aug 19, 11:29 AM · Infrastructure-Foundations, netbox
ayounsi closed T310590: Netbox: use Custom Model Validation, a subtask of T306809: Enforce Netbox domain names without period termination, as Resolved.
Mon, Aug 19, 11:29 AM · DNS, netbox, SRE-tools, Infrastructure-Foundations
ayounsi updated subscribers of T369743: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304.

@Jclark-ctr @Clement_Goubert fixed the typo, so you should be good to go.

Mon, Aug 19, 11:26 AM · SRE, serviceops, ops-eqiad, DC-Ops
ayounsi updated subscribers of T372418: Put the alert1002 and alert2002 hosts in production.

Ditto for fundraising firewalls, they do have the alert hosts IP addresses in them, though I'm not sure how to add/remove them? cc @ayounsi for this and the homer point above

Homer is generated automatically from Netbox data using the capirca script: https://netbox.wikimedia.org/extras/scripts/1/jobs/ then a homer run is needed to actually update the network devices using that data.
For the fundraising hosts/network you need to ping @Jgreen and @Dwisehaupt

Mon, Aug 19, 8:53 AM · Patch-For-Review, SRE Observability (FY2024/2025-Q1), Observability-Alerting
ayounsi added a comment to T372728: puppetdb2003 os-updates-report failure.

I didn't get a screenshot of the "before" but here is the "after" the linked patch:

Screenshot 2024-08-19 at 09-49-47 OS deprecation report for bullseye.png (208×860 px, 41 KB)

Mon, Aug 19, 7:59 AM · Patch-For-Review, Infrastructure-Foundations, Puppet-Infrastructure
ayounsi created T372728: puppetdb2003 os-updates-report failure.
Mon, Aug 19, 7:54 AM · Patch-For-Review, Infrastructure-Foundations, Puppet-Infrastructure
ayounsi closed T341843: Netbox rq.timeouts.JobTimeoutException as Resolved.

After discussions we decided to go down the first path of the list. I couldn't replicate the issue since deploying the workaround/fix. Please reopen if you see the timeout again or would like to fix the root of the issue.

Mon, Aug 19, 7:09 AM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, netbox

Fri, Aug 16

ayounsi triaged T372654: Netbox ProvisionServer script fails vlan verification as High priority.
Fri, Aug 16, 3:49 PM · Patch-For-Review, Infrastructure-Foundations, netops
ayounsi closed T311052: Netbox: replace getstats.GetDeviceStats with netbox-more-metrics as Resolved.

All done.

Fri, Aug 16, 2:05 PM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi added a comment to T341843: Netbox rq.timeouts.JobTimeoutException.
Job dispatched to netbox1003 - takes less than 2min
Aug 16 09:24:25 netbox1003 python[1079619]: 09:24:25 default: extras.scripts.run_script(commit=True, data={}, job=<Job: 61e3e2e3-59b1-4982-81a7-5589221d7ed9>, request=<utilities.request.NetBoxFakeRequest ob>
Aug 16 09:25:47 netbox1003 python[1622900]: 09:25:47 default: Job OK (61e3e2e3-59b1-4982-81a7-5589221d7ed9)
Fri, Aug 16, 9:55 AM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, netbox

Wed, Aug 14

ayounsi triaged T372461: Remove Additional IP records from procurement request template as Low priority.
Wed, Aug 14, 10:56 AM · Patch-For-Review, DC-Ops
ayounsi added a comment to T229542: Export LibreNMS data to Prometheus.

Sounds good to me !

Wed, Aug 14, 8:19 AM · Observability-Metrics
ayounsi added a comment to P67286 nbshell script to find clusters that should be removed from the NO_V6_DEVICE_NAME_PREFIXES variable in the Netbox network report.
output
>>> pprint.pprint(results)
defaultdict(<class 'int'>,
            {'an-redacteddb': 1,
             'clouddb': 9,
             'db': 251,
             'dbprov': 6,
             'dbproxy': 18,
             'dbstore': 3,
             'dumpsdata': 1,
             'es': 42,
             'ganeti': 29,
             'maps': 12,
             'ms-be': 16,
             'pc': 14,
             'restbase': 30,
             'snapshot': 1,
             'thanos-fe': 6,
             'wdqs': 5})
>>> 
>>> print(set(NO_V6_DEVICE_NAME_PREFIXES) - set(results.keys()))
{'thumbor', 'mc-gp', 'restbase-dev', 'mc', 'wtp', 'mwlog', 'mw', 'parse', 'graphite', 'ores', 'sessionstore'}
Wed, Aug 14, 7:26 AM · netbox
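A minimal sketch of how counts like the ones in this paste output could be produced. It is not the contents of P67286: the device names, the shortened prefix tuple, and the helpers `name_prefix` and `count_prefixes` are all hypothetical, and the real script would presumably query Netbox from nbshell rather than use a hard-coded list.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the variable in the Netbox network report.
NO_V6_DEVICE_NAME_PREFIXES = ("db", "ganeti", "mc", "mw")

# Hypothetical device names; a real run would take these from Netbox.
device_names = ["db1001", "db2002", "ganeti1005", "ms-be1040"]


def name_prefix(hostname: str) -> str:
    """Strip the trailing digits from a hostname, e.g. 'ms-be1040' -> 'ms-be'."""
    return re.sub(r"\d+$", "", hostname)


def count_prefixes(names):
    """Count how many device names share each prefix."""
    results = defaultdict(int)
    for name in names:
        results[name_prefix(name)] += 1
    return results


results = count_prefixes(device_names)

# Prefixes listed in the variable that no device uses: candidates for removal.
unused = set(NO_V6_DEVICE_NAME_PREFIXES) - set(results.keys())
print(sorted(unused))
```

The set difference on the last lines mirrors the `set(NO_V6_DEVICE_NAME_PREFIXES) - set(results.keys())` expression shown in the paste output.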
ayounsi created P67286 nbshell script to find clusters that should be removed from the NO_V6_DEVICE_NAME_PREFIXES variable in the Netbox network report.
Wed, Aug 14, 7:25 AM · netbox
ayounsi renamed T372453: snapshot1010, dumpsdata1003: add AAAA DNS record from snapshot1010: add AAAA DNS record to snapshot1010, dumpsdata1003: add AAAA DNS record.
Wed, Aug 14, 6:58 AM · Patch-For-Review, Data-Platform-SRE (2024.08.17 - 2024.09.06)
ayounsi triaged T372453: snapshot1010, dumpsdata1003: add AAAA DNS record as Low priority.
Wed, Aug 14, 6:52 AM · Patch-For-Review, Data-Platform-SRE (2024.08.17 - 2024.09.06)

Tue, Aug 13

ayounsi added a comment to T371890: pynetbox incompatibility with Netbox >= 4.0.6.

With the patches above, the last urgent-ish thing needed is to package the new pynetbox for the cumin hosts (bullseye)

Tue, Aug 13, 9:30 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi reopened T363341: Q4:rack/setup/install cloudcephosd10[39-41] as "Open".

https://netbox.wikimedia.org/extras/scripts/results/78992/
cloudcephosd1039 (WMF11571) /dcim/devices/5296/ Primary IPv6 missing DNS name
I guess the skip IPv6 box got checked by mistake. Could someone add the host's FQDN to https://netbox.wikimedia.org/ipam/ip-addresses/17171/ (similar to https://netbox.wikimedia.org/ipam/ip-addresses/17159/) then run the sre.dns.netbox cookbook?

Tue, Aug 13, 8:09 AM · SRE, ops-eqiad, cloud-services-team (Hardware), DC-Ops

Mon, Aug 12

ayounsi updated the task description for T310590: Netbox: use Custom Model Validation.
Mon, Aug 12, 4:25 PM · Infrastructure-Foundations, netbox
ayounsi claimed T372248: Alert in need of triage: BGP status (instance cr1-esams).

Emailed AS54994 and cleared the errors for the others.

Mon, Aug 12, 9:51 AM · Infrastructure-Foundations, netops, sre-alert-triage
ayounsi added a comment to T372161: Publish, and maintain ASPA records for valid AS14907 upstreams.

Nevertheless, it should be possible to publish ASPA records in RPKI through the ARIN portal

I looked around ARIN's RPKI portal a bit but couldn't find it. Is there documentation about it?

Mon, Aug 12, 9:08 AM · Infrastructure-Foundations, netops
ayounsi added a comment to T372158: Apply egress Source Address Validation on the Wikimedia core routers.

However, in reality, it should be possible to reject all IP packets where the source IP is not part of the IP prefixes that the Foundation has been assigned (i.e. prefix lists production{4,6}, which are a superset of the publicly routable LVS service IPs).

We would need to at least permit traffic from the transit interface IPs, as they do BGP to their peers, v6 link local for neighbor discovery, some land GRE tunnels, etc. Not sure what the cleanest way to do that is; maybe using an apply-path, as for bgp-sessions? Ideally we wouldn't have to list them all :)

Mon, Aug 12, 8:54 AM · Infrastructure-Foundations, netops
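The rule discussed in this thread (permit a packet only if its source address falls within the assigned prefixes, plus a few exceptions such as transit interface IPs and v6 link-local) can be illustrated with Python's standard `ipaddress` module. This is only a sketch of the logic, not router configuration: the prefixes below are documentation ranges standing in for the production{4,6} lists, the transit /31 is invented, and `egress_permitted` is a hypothetical helper.

```python
import ipaddress

# Hypothetical stand-ins for the production4/production6 prefix lists
# (RFC 5737 and RFC 3849 documentation ranges, not real assignments).
ASSIGNED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

# Exceptions mentioned in the comment: transit interface IPs (invented /31
# here) and IPv6 link-local addresses used for neighbor discovery.
EXCEPTION_PREFIXES = [
    ipaddress.ip_network("198.51.100.0/31"),
    ipaddress.ip_network("fe80::/10"),
]


def egress_permitted(source: str) -> bool:
    """Return True if the source address may leave the network."""
    addr = ipaddress.ip_address(source)
    # Membership tests across IP versions simply evaluate to False.
    return any(addr in net for net in ASSIGNED_PREFIXES + EXCEPTION_PREFIXES)


for src in ("192.0.2.10", "203.0.113.5", "fe80::1"):
    print(src, egress_permitted(src))
```

Listing every exception by hand is exactly what the comment hopes to avoid; on the routers the exception list would ideally be derived from existing configuration rather than maintained separately.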

Aug 8 2024

ayounsi closed T371957: Decom Netbox 3 servers as Resolved.

Cleaned up.

Aug 8 2024, 1:17 PM · Infrastructure-Foundations, netbox
ayounsi added a comment to T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts.

This went very well until it didn't. Changes fully rolled back.

Aug 8 2024, 12:37 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE
ayounsi added a comment to T371653: New hosts with "Netbox status: unknown".

Thanks, it's fixed for those 2 hosts.

Aug 8 2024, 8:31 AM · netbox, Patch-For-Review, Infrastructure-Foundations
ayounsi added a comment to T371890: pynetbox incompatibility with Netbox >= 4.0.6.

Fix released : https://github.com/netbox-community/pynetbox/releases/tag/v7.4.0

Aug 8 2024, 7:26 AM · Patch-For-Review, Infrastructure-Foundations, netbox

Aug 7 2024

ayounsi removed projects from T268621: Move some of wikimediacloud.org 185.15.56.0/23 to Netbox: Infrastructure-Foundations, SRE, netbox.
Aug 7 2024, 2:01 PM · cloud-services-team, DNS
ayounsi claimed T310590: Netbox: use Custom Model Validation.

Sent the patches for the last few ones left in the task description.

Aug 7 2024, 1:39 PM · Infrastructure-Foundations, netbox
ayounsi triaged T371957: Decom Netbox 3 servers as Low priority.
Aug 7 2024, 9:43 AM · Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x as Resolved.
Aug 7 2024, 9:39 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi added a comment to T336275: Upgrade Netbox to 4.x.

Notes from the Debrief meeting

Aug 7 2024, 9:38 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T327643: Add network devices fingerprints to known_hosts, as Resolved.
Aug 7 2024, 9:38 AM · SRE, SRE-tools, netops, Infrastructure-Foundations
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T252747: Generate ssh_known_hosts for network devices, as Resolved.
Aug 7 2024, 9:38 AM · Infrastructure-Foundations, SRE-tools, SRE
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T341843: Netbox rq.timeouts.JobTimeoutException, as Resolved.
Aug 7 2024, 9:38 AM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts, as Resolved.
Aug 7 2024, 9:38 AM · Patch-For-Review, Infrastructure-Foundations, netops, SRE
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T340444: Markdown bug in Netbox-next, as Resolved.
Aug 7 2024, 9:38 AM · Infrastructure-Foundations, netbox