Volans (Riccardo Coccioli)
SRE

User Details

User Since
Feb 10 2016, 11:25 AM (513 w, 4 d)
Availability
Available
IRC Nick
volans
LDAP User
Volans
MediaWiki User
RCoccioli (WMF)

Recent Activity

Thu, Nov 27

Volans added a comment to T253986: update bacula-sd config so that it listens on IPv6.

If you want to bind any address, from the docs at [1] it seems that you can just omit the setting and not specify any of SDAddresses and SDAddress. SDPort seems optional as we're using the default port.

Thu, Nov 27, 1:30 PM · Data-Persistence-Backup, SRE, IPv6

Tue, Nov 25

Volans added a comment to T337422: Add default Cloud VPS project alerts for low disk space and low inode count.

The problem is not the space, that disk is out of inodes:

# df -hi /srv/
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/sda         5.0M  5.0M     0  100% /srv
Tue, Nov 25, 8:22 PM · cloud-services-team, Cloud-VPS
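
A low-inode alert like the one this task asks for could be based on a check along these lines. This is only a minimal sketch using Python's standard `os.statvfs`; the function names and the 90% threshold are hypothetical and not taken from any existing alerting code, and the counts are not guaranteed to match `df -i` on every filesystem.

```python
import os


def inode_usage(path):
    """Return (total, used, free, used_percent) inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes
    free = st.f_ffree    # number of free inodes
    used = total - free
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    percent = 100.0 * used / total if total else 0.0
    return total, used, free, percent


def is_low_on_inodes(path, threshold_percent=90.0):
    """True when inode usage is at or above threshold_percent (hypothetical default)."""
    return inode_usage(path)[3] >= threshold_percent


if __name__ == "__main__":
    total, used, free, percent = inode_usage("/")
    print(f"inodes: total={total} used={used} free={free} use%={percent:.0f}")
```

Checking inode usage separately from byte usage matters here because a disk can have free space and still refuse new files.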

Fri, Nov 21

Volans added a comment to T409857: [toolsdb] Automatically terminate long transactions.

FYI in production there are already mechanisms to automatically kill queries, so we might just be able to reuse them.

Fri, Nov 21, 6:36 PM · cloud-services-team, Toolforge
Volans added a comment to T410426: Requesting access to analytics-privatedata-users for dsmit.

@DSmit-WMF did you have a chance to go through the documentation to clarify which level of access you need to the analytics-privatedata-users group?
Perhaps just level 1 is already enough for you?

Fri, Nov 21, 5:48 PM · SRE, SRE-Access-Requests
Volans added a project to T409707: Requesting access to Analytics_Privatedata for Chandra-WMDE: Data-Engineering.

@Milimetric / @Ahoelzl by any chance one of you could review this task for approval?

Fri, Nov 21, 5:13 PM · Data-Engineering, SRE, SRE-Access-Requests
Volans added a comment to T410612: Requesting access to ops for blake.

@MoritzMuehlenhoff yes but I think Kwaku is covering for Kavitha while she's away in the next few days.

Fri, Nov 21, 10:54 AM · SRE, SRE-Access-Requests

Thu, Nov 20

Volans added a comment to T410572: Replace deprecated Phabricator Conduit API call by @ProdPasteBot with its stable equivalent.

Adding collaboration-services

Thu, Nov 20, 5:45 PM · collaboration-services, Phabricator
Volans updated the task description for T410612: Requesting access to ops for blake.
Thu, Nov 20, 11:48 AM · SRE, SRE-Access-Requests
Volans moved T410612: Requesting access to ops for blake from Untriaged to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.
Thu, Nov 20, 11:47 AM · SRE, SRE-Access-Requests

Wed, Nov 19

Volans added a comment to T410537: Add a --rack flag to sre.k8s.pool-depool-node.

Just for context referencing past ideas on the topic: T327300

Wed, Nov 19, 9:59 PM · Infrastructure-Foundations, SRE-tools, serviceops
Volans triaged T410506: New SSH keys for effie as Medium priority.
Wed, Nov 19, 2:30 PM · SRE, SRE-Access-Requests
Volans moved T410506: New SSH keys for effie from Untriaged to Patch in Review on the SRE-Access-Requests board.
Wed, Nov 19, 2:30 PM · SRE, SRE-Access-Requests
Volans closed T409854: Requesting access to run queries on superset.wikimedia.org for Nik Gkountas as Resolved.
Wed, Nov 19, 12:14 PM · SRE, SRE-Access-Requests
Volans moved T410473: Requesting access to analytics-privatedata-users for catrope from Patch in Review to Awaiting User Input on the SRE-Access-Requests board.

@Catrope patch merged, will be live within ~30 minutes. Kerberos principal created, you should have received an email about it with instructions on how to set up your password.
Once you've verified that everything works as expected please resolve this task. Thanks :)

Wed, Nov 19, 9:37 AM · SRE, SRE-Access-Requests
Volans moved T410473: Requesting access to analytics-privatedata-users for catrope from Manager/NDA Approval/Confirmation to Patch in Review on the SRE-Access-Requests board.
Wed, Nov 19, 9:24 AM · SRE, SRE-Access-Requests
Volans updated the task description for T410473: Requesting access to analytics-privatedata-users for catrope.
Wed, Nov 19, 9:24 AM · SRE, SRE-Access-Requests
Volans updated the task description for T410473: Requesting access to analytics-privatedata-users for catrope.
Wed, Nov 19, 9:24 AM · SRE, SRE-Access-Requests
Volans moved T410473: Requesting access to analytics-privatedata-users for catrope from Untriaged to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.

Adding Data-Engineering for visibility, no approval required for WMF staff.
Pending approval from @SCherukuwada

Wed, Nov 19, 8:27 AM · SRE, SRE-Access-Requests
Volans updated the task description for T410473: Requesting access to analytics-privatedata-users for catrope.
Wed, Nov 19, 8:24 AM · SRE, SRE-Access-Requests
Volans triaged T410473: Requesting access to analytics-privatedata-users for catrope as Medium priority.
Wed, Nov 19, 8:22 AM · SRE, SRE-Access-Requests
Volans updated the task description for T409707: Requesting access to Analytics_Privatedata for Chandra-WMDE.
Wed, Nov 19, 8:15 AM · Data-Engineering, SRE, SRE-Access-Requests

Tue, Nov 18

Volans updated the task description for T409707: Requesting access to Analytics_Privatedata for Chandra-WMDE.
Tue, Nov 18, 7:18 PM · Data-Engineering, SRE, SRE-Access-Requests
Volans updated subscribers of T409707: Requesting access to Analytics_Privatedata for Chandra-WMDE.

@WMDECyn, in case @Chandra-WMDE's position is a fixed term contract, could you provide us with the expiration date so that we can add it to data.yaml to track it?

Tue, Nov 18, 7:18 PM · Data-Engineering, SRE, SRE-Access-Requests
Volans triaged T409707: Requesting access to Analytics_Privatedata for Chandra-WMDE as Medium priority.
Tue, Nov 18, 6:20 PM · Data-Engineering, SRE, SRE-Access-Requests
Volans moved T410291: Grant Access to ops-limited for matthieulec from Manager/NDA Approval/Confirmation to Awaiting User Input on the SRE-Access-Requests board.

@MLechvien-WMF the patch has been merged, and then puppet-merged (an internal step required to make changes to the puppet repo live in production). Your user will be picked up and created at the next puppet run and will be on all hosts within 30 minutes from now. Once you've verified all works as expected please resolve this task, thanks.

Tue, Nov 18, 6:06 PM · SRE-Access-Requests, SRE
Volans moved T410426: Requesting access to analytics-privatedata-users for dsmit from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Tue, Nov 18, 5:15 PM · SRE, SRE-Access-Requests
Volans added a comment to T410426: Requesting access to analytics-privatedata-users for dsmit.

Please refer to https://wikitech.wikimedia.org/wiki/Data_Platform/Data_access#Access_Levels to clarify which access level you need for the analytics-privatedata-users group.

Tue, Nov 18, 5:14 PM · SRE, SRE-Access-Requests
Volans triaged T410426: Requesting access to analytics-privatedata-users for dsmit as Medium priority.
Tue, Nov 18, 5:12 PM · SRE, SRE-Access-Requests
Volans added a comment to T403153: Upgrade cloudcumin hosts to bookworm/trixie.

We've discussed this in the sre-tools-infrastructure team and there are no objections to the goal of keeping the cloudcumin hosts in sync with the cumin ones, and no objections to an in-place upgrade as suggested by Moritz offline.

Tue, Nov 18, 2:40 PM · Cloud-VPS, cloud-services-team
Volans added a comment to T409279: Update to FIDO backed production SSH key for btullis.

@BTullis can you confirm all is working fine and we can resolve this task?

Tue, Nov 18, 11:16 AM · SRE, SRE-Access-Requests
Volans moved T409854: Requesting access to run queries on superset.wikimedia.org for Nik Gkountas from Patch in Review to Awaiting User Input on the SRE-Access-Requests board.

@ngkountas patch merged, it should get live within 30 minutes from now. Once you've verified all works as expected please resolve this task.

Tue, Nov 18, 9:53 AM · SRE, SRE-Access-Requests
Volans moved T410270: Update mfischerwmf ssh key from Patch in Review to Awaiting User Input on the SRE-Access-Requests board.

@MFischer patch merged, it should get live within 30 minutes from now. Once you've verified all works as expected please resolve this task.

Tue, Nov 18, 9:52 AM · SRE, SRE-Access-Requests
Volans moved T409893: Requesting access to analytics-privatedata-users for AnkitaM from Patch in Review to Awaiting User Input on the SRE-Access-Requests board.

@MGerlach @AnkitaM: patch merged, it should get live within 30 minutes from now. Once you've verified all works as expected please resolve this task.

Tue, Nov 18, 9:52 AM · Data-Engineering, SRE, SRE-Access-Requests
Volans closed T409894: Grant Access to ldap/nda for AnkitaM as Resolved.

Patch merged, resolving. @AnkitaM you're currently part of the LDAP nda group and should be able to access all the UIs that require this group. Feel free to re-open in case you encounter any issue.

Tue, Nov 18, 9:51 AM · SRE, LDAP-Access-Requests
Volans closed T409894: Grant Access to ldap/nda for AnkitaM, a subtask of T406203: Start formal collaboration on understanding the use of maintenance templates, as Resolved.
Tue, Nov 18, 9:51 AM · Research (FY2025-26-Research-October-December)

Mon, Nov 17

Volans moved T409933: Requesting access to analytics-privatedata-users for lpintscher from Patch in Review to Awaiting User Input on the SRE-Access-Requests board.

@Lydia_Pintscher patch merged, it should get live within 30 minutes from now. Once you've verified all works as expected please resolve this task.

Mon, Nov 17, 5:45 PM · SRE, SRE-Access-Requests
Volans moved T409409: Requesting access to analytics_privatedata_users and SQL Lab for Arian Bozorg (WMDE) from Patch in Review to Awaiting User Input on the SRE-Access-Requests board.
Mon, Nov 17, 5:45 PM · SRE, SRE-Access-Requests
Volans moved T409933: Requesting access to analytics-privatedata-users for lpintscher from Manager/NDA Approval/Confirmation to Patch in Review on the SRE-Access-Requests board.

Whoops, my bad, it was not tracked in the summary and I missed it in the thread.

Mon, Nov 17, 5:40 PM · SRE, SRE-Access-Requests
Volans updated the task description for T409933: Requesting access to analytics-privatedata-users for lpintscher.
Mon, Nov 17, 5:39 PM · SRE, SRE-Access-Requests
Volans triaged T410291: Grant Access to ops-limited for matthieulec as Medium priority.

Pending approval from either @Kappakayala or @mark (from data.yaml approval list)

Mon, Nov 17, 5:31 PM · SRE-Access-Requests, SRE
Volans triaged T409409: Requesting access to analytics_privatedata_users and SQL Lab for Arian Bozorg (WMDE) as Medium priority.

@Arian_Bozorg patch merged, it should get live within 30 minutes from now. Once you've verified all works as expected please resolve this task.

Mon, Nov 17, 5:29 PM · SRE, SRE-Access-Requests
Volans updated the task description for T409409: Requesting access to analytics_privatedata_users and SQL Lab for Arian Bozorg (WMDE).
Mon, Nov 17, 5:28 PM · SRE, SRE-Access-Requests
Volans triaged T409933: Requesting access to analytics-privatedata-users for lpintscher as Medium priority.

Pending approval from @WMDE-leszek

Mon, Nov 17, 5:19 PM · SRE, SRE-Access-Requests
Volans updated the task description for T409933: Requesting access to analytics-privatedata-users for lpintscher.
Mon, Nov 17, 5:18 PM · SRE, SRE-Access-Requests
Volans triaged T409854: Requesting access to run queries on superset.wikimedia.org for Nik Gkountas as Medium priority.
Mon, Nov 17, 4:43 PM · SRE, SRE-Access-Requests
Volans added a comment to T409854: Requesting access to run queries on superset.wikimedia.org for Nik Gkountas.

@ngkountas just to be sure, when you say analytics-privatedata-users level 2 do you mean with Kerberos?

Mon, Nov 17, 4:42 PM · SRE, SRE-Access-Requests
Volans updated the task description for T409854: Requesting access to run queries on superset.wikimedia.org for Nik Gkountas.
Mon, Nov 17, 4:42 PM · SRE, SRE-Access-Requests
Volans moved T409893: Requesting access to analytics-privatedata-users for AnkitaM from Manager/NDA Approval/Confirmation to Patch in Review on the SRE-Access-Requests board.
Mon, Nov 17, 4:16 PM · Data-Engineering, SRE, SRE-Access-Requests
Volans triaged T409893: Requesting access to analytics-privatedata-users for AnkitaM as Medium priority.
Mon, Nov 17, 4:12 PM · Data-Engineering, SRE, SRE-Access-Requests
Volans triaged T410270: Update mfischerwmf ssh key as Medium priority.

Confirmed the ssh key out of band.

Mon, Nov 17, 2:32 PM · SRE, SRE-Access-Requests
Volans moved T409893: Requesting access to analytics-privatedata-users for AnkitaM from Patch in Review to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.
Mon, Nov 17, 2:23 PM · Data-Engineering, SRE, SRE-Access-Requests
Volans added a project to T409893: Requesting access to analytics-privatedata-users for AnkitaM: Data-Engineering.

Adding Data-Engineering for visibility and @Milimetric, @Ahoelzl for approval (either of them) from the data engineering side, as per docs.

Mon, Nov 17, 2:21 PM · Data-Engineering, SRE, SRE-Access-Requests
Volans updated the task description for T409893: Requesting access to analytics-privatedata-users for AnkitaM.
Mon, Nov 17, 1:53 PM · Data-Engineering, SRE, SRE-Access-Requests
Volans moved T409279: Update to FIDO backed production SSH key for btullis from Ready To Go to Awaiting User Input on the SRE-Access-Requests board.
Mon, Nov 17, 1:46 PM · SRE, SRE-Access-Requests

Nov 10 2025

Volans added a comment to T409712: Wrong url on checking silence on alertmanager.

Possibly related to T328869

Nov 10 2025, 11:28 AM · SRE, Observability-Alerting, observability

Nov 6 2025

Volans moved T409365: Grant zuul project access to `fast-iops` volume type and `4xiops` instance flavor from Discussion needed to Approved on the Cloud-VPS (Quota-requests) board.
Nov 6 2025, 11:50 AM · Cloud-VPS (Quota-requests)
Volans added a comment to T409365: Grant zuul project access to `fast-iops` volume type and `4xiops` instance flavor.

If I'm not mistaken the above patch should grant the usage of the requested instance type. I'm not sure if there is a separate setting for the fast-iops bit or if it's included in that configuration.

Nov 6 2025, 11:49 AM · Cloud-VPS (Quota-requests)
Volans triaged T409365: Grant zuul project access to `fast-iops` volume type and `4xiops` instance flavor as Medium priority.
Nov 6 2025, 10:38 AM · Cloud-VPS (Quota-requests)
Volans moved T409365: Grant zuul project access to `fast-iops` volume type and `4xiops` instance flavor from Inbox to Discussion needed on the Cloud-VPS (Quota-requests) board.
Nov 6 2025, 10:37 AM · Cloud-VPS (Quota-requests)

Nov 3 2025

Volans added a comment to T409029: Flapping wikitech-static icinga alert.

From a quick check it seems that the host is randomly hammered by some traffic (to be investigated):

Nov 3 2025, 9:19 AM · wikitech.wikimedia.org, cloud-services-team

Oct 31 2025

Volans moved T408387: CloudVPS instance for ProVe from Inbox to Feedback needed on the Cloud-VPS (Project-requests) board.
Oct 31 2025, 8:46 AM · cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS (Project-requests)

Oct 21 2025

Restricted Application added a project to T407787: Alertmanager triggers an alert on IRC and email after the alert has resolved: Infrastructure-Foundations.

For some related historical context on the lack of parity between the Icinga and AM APIs in Spicerack see also T293209 (look for optimal).

Oct 21 2025, 2:06 PM · Infrastructure-Foundations, Spicerack, SRE-tools, Traffic, Observability-Alerting

Oct 10 2025

Volans added a comment to T250415: Homer: add parallelization support.

[note for future self] If we can wait for Python 3.14 to be around in our systems then we should evaluate also the new InterpreterPoolExecutor for this, it might be a good fit.

Oct 10 2025, 9:17 AM · User-Elukey, Infrastructure-Foundations, SRE-tools, homer

Oct 6 2025

Volans added a comment to T393692: transfer.py fails when handling nftables-configured firewall.

You can use SSH_AUTH_SOCK=/run/keyholder/proxy.sock scp [OPTIONS] cumin1002.eqiad.wmnet:/path/... /path/... from cumin1003 for example.

Oct 6 2025, 2:58 PM · database-backups, Infrastructure-Foundations
Volans edited P83603 Test custom fact from Gerrit patch.
Oct 6 2025, 10:54 AM
Volans created P83603 Test custom fact from Gerrit patch.
Oct 6 2025, 10:53 AM

Sep 24 2025

Volans closed T405434: PuppetFailure Puppet has failed on cloudcumin1001:9100 as Resolved.

Transient failure of git pull for the cloud/wmcs-cookbooks repository, self-resolved at the next puppet run.

Sep 24 2025, 7:40 AM · cloud-services-team

Sep 23 2025

Volans added a comment to T393600: sre.discovery cookbooks: refactor use of resolve_with_client_ip.

@Scott_French sorry for the trouble. The patch that added the timeout to the cookbook's version of the function was added ~1.5 years after the functionality landed in Spicerack, and somehow we failed to double-check the functional equivalence when migrating the cookbook to Spicerack's module function. Sorry about that.
I think we can set a reasonable timeout default without the need to make it tunable, at least as a first fix, that could go in at anytime.
I doubt we'll have special needs for specific timeouts though, we're talking about DNS queries, not HTTP requests 😉

Sep 23 2025, 6:40 PM · serviceops

Sep 22 2025

Volans added a watcher for tools-infrastructure-team: Volans.
Sep 22 2025, 6:29 AM
Volans added a member for tools-infrastructure-team: Volans.
Sep 22 2025, 6:29 AM

Sep 11 2025

Volans added a comment to T404373: Log DNS queries from Cloud VPS clients.

Ideally sampled logs would be good enough, depending on how complex the setup to sample them is.
If there are no easy options for real sampling we could also consider alternative approaches:

  • a poor man's sampling playing with log rotation and retention (e.g. rotate often and keep only 1 block every N rotated ones)
  • a size-based retention that limits the total size of the logs to a predictable amount (will not help with issues in the past but I guess that most issues where we need the data are live/ongoing)
  • deduplicate the logs to increase the signal in the logs
Sep 11 2025, 4:21 PM · Patch-For-Review, cloud-services-team, Cloud-VPS
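
The first option in the list above (rotate often, keep 1 block every N) could be sketched roughly as follows. This is an illustration only: the function names and the `dns.log.NNN` file naming are hypothetical, and it assumes rotated file names sort in rotation order.

```python
def select_rotations_to_keep(rotated_files, keep_every):
    """Return one rotated file out of every keep_every, in sorted order."""
    if keep_every < 1:
        raise ValueError("keep_every must be >= 1")
    ordered = sorted(rotated_files)
    return [name for index, name in enumerate(ordered) if index % keep_every == 0]


def select_rotations_to_delete(rotated_files, keep_every):
    """Return the rotated files that the sampling policy would discard."""
    keep = set(select_rotations_to_keep(rotated_files, keep_every))
    return [name for name in sorted(rotated_files) if name not in keep]


if __name__ == "__main__":
    # Hypothetical rotated log names; real names depend on the rotation setup.
    files = [f"dns.log.{i:03d}" for i in range(10)]
    print("keep:  ", select_rotations_to_keep(files, 4))
    print("delete:", select_rotations_to_delete(files, 4))
```

With frequent rotation each kept file covers a short time window, so the retained data approximates a time-based sample rather than a per-query one.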
Volans added a comment to T404300: Remove KernelErrors alerts.

+1 for me

Sep 11 2025, 9:06 AM · cloud-services-team (FY2025/26-Q1-Q2)
Volans added a comment to T404282: KernelErrors Server cloudcephosd1041 logged kernel errors.

The problem is that there is no evidence in hardware logs and I doubt we'll get any replacement from Dell without them.

Sep 11 2025, 8:39 AM · cloud-services-team
Volans triaged T404282: KernelErrors Server cloudcephosd1041 logged kernel errors as Medium priority.
Sep 11 2025, 7:52 AM · cloud-services-team
Volans added a comment to T404282: KernelErrors Server cloudcephosd1041 logged kernel errors.

I've found this in kern.log/dmesg but nothing in racadm logs (both getsel and lclog):

Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786152] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786154] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786155] {1}[Hardware Error]: event severity: corrected
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786156] {1}[Hardware Error]:  Error 0, type: corrected
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786156] {1}[Hardware Error]:  fru_text: B1
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786157] {1}[Hardware Error]:   section_type: memory error
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786157] {1}[Hardware Error]:   error_status: 0x0000000000000400
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786158] {1}[Hardware Error]:   physical_address: 0x0000000f9ec05180
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786159] {1}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 2 bank: 18 device: 6 row: 53592 column: 448
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786160] {1}[Hardware Error]:   error_type: 2, single-bit ECC
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786161] {1}[Hardware Error]:   DIMM location:  B1
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786168] [Firmware Warn]: GHES: Invalid error status block length!
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786185] soft_offline: 0xf9ec05: invalidated
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786192] mce: [Hardware Error]: Machine check events logged
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786194] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 255: 940000000000009f
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.794205] mce: [Hardware Error]: TSC 0 ADDR f9ec05180
Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.799695] mce: [Hardware Error]: PROCESSOR 0:806f8 TIME 1757558012 SOCKET 0 APIC 0 microcode 2b000639
Sep 11 2025, 7:52 AM · cloud-services-team
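
For triaging entries like the ones above, the `key: value` fields of the `{N}[Hardware Error]` lines can be pulled into a dictionary. The sketch below is an assumption-laden illustration: the regular expression and function name are hypothetical, were written only against the lines quoted in this comment, and are not a general APEI/GHES parser.

```python
import re

# Matches "key: value" payloads after the "{N}[Hardware Error]:" marker.
# Lines without a simple alphabetic key before the first colon are skipped.
HW_ERROR_RE = re.compile(
    r"\{\d+\}\[Hardware Error\]:\s*(?P<key>[A-Za-z_ ]+?):\s*(?P<value>.+)$"
)

# A few lines copied from the kern.log excerpt above.
SAMPLE_LINES = [
    "Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786155] {1}[Hardware Error]: event severity: corrected",
    "Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786157] {1}[Hardware Error]:   section_type: memory error",
    "Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786160] {1}[Hardware Error]:   error_type: 2, single-bit ECC",
    "Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786161] {1}[Hardware Error]:   DIMM location:  B1",
    "Sep 11 02:33:32 cloudcephosd1041 kernel: [125071.786192] mce: [Hardware Error]: Machine check events logged",
]


def summarize_hardware_error(lines):
    """Collect key/value fields from '{N}[Hardware Error]' kernel log lines."""
    fields = {}
    for line in lines:
        match = HW_ERROR_RE.search(line)
        if match:
            fields[match.group("key").strip()] = match.group("value").strip()
    return fields


if __name__ == "__main__":
    for key, value in summarize_hardware_error(SAMPLE_LINES).items():
        print(f"{key} = {value}")
```

A summary like this makes it quicker to see that the event was a corrected single-bit ECC memory error on DIMM B1.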

Sep 10 2025

Volans added a comment to T404163: CodeSearch is unresponsive (2025-09-10).

Restart completed

Sep 10 2025, 9:13 AM · VPS-project-Codesearch
Volans added a comment to T404163: CodeSearch is unresponsive (2025-09-10).

Both ssh and the VM console hang, without giving any prompt. Forcing a VM restart.

Sep 10 2025, 8:54 AM · VPS-project-Codesearch
Volans claimed T404163: CodeSearch is unresponsive (2025-09-10).

I'm looking into it

Sep 10 2025, 8:51 AM · VPS-project-Codesearch
Volans added a watcher for cloud-services-team: Volans.
Sep 10 2025, 8:05 AM
Volans added a member for cloud-services-team: Volans.
Sep 10 2025, 8:05 AM

Sep 8 2025

Volans closed T378331: Puppet module hiera_lookup not working as Resolved.

This might have been related to the migration to puppet7 and the new puppetdb hosts probably. I can't recall. Resolving as it cannot be reproduced right now AFAICT, feel free to re-open if that's not the case.

Sep 8 2025, 3:05 PM · Infrastructure-Foundations, SRE-tools, Spicerack
Volans added a comment to T378331: Puppet module hiera_lookup not working.

It works fine for me:

>>> p.hiera_lookup('cumin1003.eqiad.wmnet', 'profile::puppet::agent::force_puppet7')
DRY-RUN: Executing commands ['puppet lookup --render-as s --compile --node cumin1003.eqiad.wmnet profile::puppet::agent::force_puppet7 2>/dev/null'] on 1 hosts: puppetserver1001.eqiad.wmnet
'true'
Sep 8 2025, 2:57 PM · Infrastructure-Foundations, SRE-tools, Spicerack

Aug 28 2025

Volans added a comment to T403153: Upgrade cloudcumin hosts to bookworm/trixie.

+1 for me to upgrade them to bookworm for simplicity and to be in sync with the cumin hosts.

Aug 28 2025, 12:26 PM · Cloud-VPS, cloud-services-team

Jul 21 2025

Volans added a comment to T388874: Update Kubernetes library version in spicerack.

Quick update, the cumin hosts are now on bookworm where python3-kubernetes is on v22.6.0, but we still have cumin1002 around on bullseye until the DBA-stuff has been all made compatible with bookworm, see T389380.

Jul 21 2025, 3:54 PM · serviceops, Datacenter-Switchover
Volans added a comment to T399449: decommission db1246.eqiad.wmnet.

@Marostegui sorry I didn't understand that the old host was already unracked or otherwise unreachable also on the management side and I thought my earlier reply was already covering the questions, my bad.

Jul 21 2025, 9:04 AM · SRE, DC-Ops, ops-eqiad, DBA, decommission-hardware

Jul 16 2025

Volans closed T341973: Spicerack: add distributed locking support as Resolved.
Jul 16 2025, 10:47 AM · Patch-For-Review, Infrastructure-Foundations, SRE-tools, Spicerack

Jul 14 2025

Volans added a comment to T397687: Increase the default batch size of puppet.run().

@JMeybohm do you have a specific use case that cannot be solved, or is hard to solve, by simply changing the batch_size of the call to puppet.run()?
https://doc.wikimedia.org/spicerack/master/api/spicerack.puppet.html#spicerack.puppet.PuppetHosts.run

Jul 14 2025, 2:52 PM · Infrastructure-Foundations, Spicerack, SRE-tools
Volans added a comment to T399449: decommission db1246.eqiad.wmnet.

If the new host has a new hostname I think the usual decom template can be used and followed. I don't see any blocker there: if the host is not up and running or the disks were removed, the only thing skipped by the decom cookbook will be the disk wipe.

Jul 14 2025, 12:29 PM · SRE, DC-Ops, ops-eqiad, DBA, decommission-hardware

Jul 9 2025

Volans added a comment to T392851: Q4:rack/setup/install cp20[43-58] codfw.

Are we sure that the network card is properly installed? I'm getting this from racadm:

Jul 9 2025, 4:33 PM · User-Elukey, SRE, Patch-For-Review, Traffic, ops-codfw, DC-Ops
Volans added a comment to T399069: Proposal: adding a kafka admin client to spicerack.

An immediate workaround was implemented in https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1167593 that also gives us an idea of what could be useful to expose from spicerack: either a plain KafkaAdminClient just pre-configured with the connection to the current cluster, or some more advanced wrapper to extract specific information.
As a starter, probably just exposing the KafkaAdminClient could be enough, but I'll leave the decision on this to the kafka experts.

Jul 9 2025, 1:40 PM · Data-Platform-SRE (2025.07.26 - 2025.08.15), Infrastructure-Foundations, SRE-tools, Spicerack
Volans added a project to T398444: More frequent Puppet runs on the alert hosts?: SRE-tools.

I wonder if the prometheus servers have a similar behavior of applying changes from puppet exported resources.

Jul 9 2025, 9:42 AM · Infrastructure-Foundations, SRE-tools, SRE Observability (FY2025/2026-Q1)

Jul 3 2025

Volans added a comment to T398464: Netbox: PupeptDB Import - ignore 'vxlan' and 'openvswitch' interfaces without IPs.

Ack, let's do both: disable it in the bios and skip it in the import

Jul 3 2025, 10:23 AM · Infrastructure-Foundations, SRE
Volans created T398605: Prometheus puppettization has a very large directory.
Jul 3 2025, 10:19 AM · Observability-Metrics, observability
Volans added a comment to T398412: Decom cookbook: delete virtual interfaces from device.

Option 2 LGTM too

Jul 3 2025, 10:16 AM · Patch-For-Review, netbox, netops, Infrastructure-Foundations, SRE
Volans added a comment to T398464: Netbox: PupeptDB Import - ignore 'vxlan' and 'openvswitch' interfaces without IPs.

Totally agree there is no point. For the idrac one the only potential use case would be to match it with our existing mgmt but probably not.

Jul 3 2025, 10:16 AM · Infrastructure-Foundations, SRE

Jul 2 2025

Volans triaged T397696: I/F hackathon June 2025: Add kubernetes support to Debmonitor as Medium priority.
Jul 2 2025, 9:54 AM · Patch-For-Review, Infrastructure-Foundations

Jun 27 2025

Volans updated subscribers of T396396: decommission cloudcontrol2004-dev.codfw.wmnet.
Jun 27 2025, 2:07 PM · SRE, DC-Ops, ops-codfw, cloud-services-team, decommission-hardware
Volans added a comment to T397868: Decommission frack hosts: frpig2001 pay-lvs2001 pay-lvs2002.

There are currently changes to remove:

frpig2001.mgmt.frack.codfw.wmnet
pay-lvs2001.mgmt.frack.codfw.wmnet
pay-lvs2002.mgmt.frack.codfw.wmnet
pay-lvs2001.frack.codfw.wmnet
pay-lvs2002.frack.codfw.wmnet
frban1001.mgmt.frack.eqiad.wmnet
frpig2001.frack.codfw.wmnet
frban1001.frack.eqiad.wmnet

and their related reverse PTR records.
Are those ok to be removed from the live DNS?

Jun 27 2025, 7:13 AM · SRE, DC-Ops, ops-codfw, decommission-hardware, fundraising-tech-ops
Volans added a comment to T397868: Decommission frack hosts: frpig2001 pay-lvs2001 pay-lvs2002.

When editing netbox DNS records please always make sure to run the sre.dns.netbox cookbook as otherwise there are pending changes that will block other users and trigger icinga alerts.

Jun 27 2025, 7:11 AM · SRE, DC-Ops, ops-codfw, decommission-hardware, fundraising-tech-ops

Jun 26 2025

Volans added a comment to T392851: Q4:rack/setup/install cp20[43-58] codfw.

But I've tested the scp_dump that was failing and it's fixed. So I think the provision should work. Feel free to try it (from cumin2002 that has the latest version, I will update the others as soon as other testing on other changes is completed)

Jun 26 2025, 9:12 AM · User-Elukey, SRE, Patch-For-Review, Traffic, ops-codfw, DC-Ops