
Fri, Jun 12

fgiunchedi updated the task description for T429003: mwopenstackclients.py reliability.
Fri, Jun 12, 9:52 AM · cloud-services-team, Cloud-VPS

Thu, Jun 11

fgiunchedi added a comment to T428919: Replace https://os-deprecation.toolforge.org/ with something that handles in-place upgraded hosts.

Following up from the meeting, one source of truth for running/reachable instances is in the node_debian_version Prometheus metric, just as an example:

Thu, Jun 11, 3:18 PM · cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation)
fgiunchedi added a comment to T428867: Openstack cinder volumes backups are broken.

The systemd unit is clearly failed:

● backup_cinder_volumes.service                                                                    loaded failed failed    backup cinder volumes
● remove_dangling_cinder_snapshots.service                                                         loaded failed failed    backup cinder volumes

As for why it didn't alert, I think it might be related to team-wmcs/general_systemd_unit_down.yaml

# deploy-tag: ops
# deploy-site: eqiad
Thu, Jun 11, 10:39 AM · cloud-services-team, Cloud-VPS
fgiunchedi created T428873: Ditch wmcs-specific alerts for systemdunitdown in favor of production alerts.
Thu, Jun 11, 9:34 AM · Patch-For-Review, tools-infrastructure-team, cloud-services-team, Cloud-VPS

Wed, Jun 10

fgiunchedi added a comment to T345983: Remove Icinga checks for Cloud VPS projects (not: infrastructure).

@taavi do you reckon there's anything in this task that's not covered by the checks in T328502: Move WMCS off of Icinga and introduce alertmanager?

Wed, Jun 10, 9:28 AM · cloud-services-team (FY2025/2026-Q3-Q4)
fgiunchedi removed a parent task for T357977: [toolforge.infra] run and monitor our own sample tools: T313030: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones.
Wed, Jun 10, 8:47 AM · Sustainability (Incident Followup), cloud-services-team (FY2025/2026-Q3-Q4), Patch-For-Review, User-aborrero, Toolforge
fgiunchedi removed a subtask for T313030: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones: T357977: [toolforge.infra] run and monitor our own sample tools.
Wed, Jun 10, 8:47 AM · Patch-For-Review, cloud-services-team (FY2025/2026-Q3-Q4), Toolforge
fgiunchedi removed a parent task for T288053: Add external meta-monitoring for metricsinfra: T313030: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones.
Wed, Jun 10, 8:46 AM · cloud-services-team (FY2025/2026-Q3-Q4), SRE-OnFire, Patch-For-Review, Sustainability (Incident Followup), Cloud-VPS
fgiunchedi removed a subtask for T313030: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones: T288053: Add external meta-monitoring for metricsinfra.
Wed, Jun 10, 8:46 AM · Patch-For-Review, cloud-services-team (FY2025/2026-Q3-Q4), Toolforge
fgiunchedi removed a subtask for T328502: Move WMCS off of Icinga and introduce alertmanager: T347148: Determine how to monitor services in cloud-private / cloudlb.
Wed, Jun 10, 8:44 AM · cloud-services-team (FY2025/2026-Q3-Q4), Epic, Toolforge, Cloud-VPS, Observability-Alerting, Goal
fgiunchedi removed a parent task for T347148: Determine how to monitor services in cloud-private / cloudlb: T328502: Move WMCS off of Icinga and introduce alertmanager.
Wed, Jun 10, 8:44 AM · observability, cloud-services-team, Cloud-VPS

Tue, Jun 9

fgiunchedi closed T345294: Move Cloud VPS control plane alerting to alertmanager, a subtask of T328502: Move WMCS off of Icinga and introduce alertmanager, as Resolved.
Tue, Jun 9, 8:09 AM · cloud-services-team (FY2025/2026-Q3-Q4), Epic, Toolforge, Cloud-VPS, Observability-Alerting, Goal
fgiunchedi closed T345294: Move Cloud VPS control plane alerting to alertmanager as Resolved.

This is done: the labs-ip-alias-dump Icinga check has been removed in Ib8d290, and the flavor property check is tracked in the parent task.

Tue, Jun 9, 8:08 AM · cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS
fgiunchedi updated the task description for T328502: Move WMCS off of Icinga and introduce alertmanager.
Tue, Jun 9, 7:58 AM · cloud-services-team (FY2025/2026-Q3-Q4), Epic, Toolforge, Cloud-VPS, Observability-Alerting, Goal
fgiunchedi updated the task description for T328502: Move WMCS off of Icinga and introduce alertmanager.
Tue, Jun 9, 7:33 AM · cloud-services-team (FY2025/2026-Q3-Q4), Epic, Toolforge, Cloud-VPS, Observability-Alerting, Goal
fgiunchedi updated subscribers of T427457: Provide a scheduled data download service from Google Cloud Storage.

cc netops (i.e. @ayounsi and @cmooney)

Tue, Jun 9, 6:42 AM · Traffic, Data-Platform-SRE, Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Mon, Jun 8

fgiunchedi added a comment to T313030: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones.

Following up from the team meeting: only the k8s etcd checks are still relevant to port to AM; the rest can be ditched.

Mon, Jun 8, 11:54 AM · Patch-For-Review, cloud-services-team (FY2025/2026-Q3-Q4), Toolforge
fgiunchedi created T428440: Report account age, number of edits, etc in Striker membership requests.
Mon, Jun 8, 11:52 AM · tools-platform-team, Striker
fgiunchedi closed T420565: Audit tools memory requests vs actual usage as Resolved.

I'll call this done and good enough™: I tackled the low-hanging fruit, namely webservice defaults have been lowered significantly, which has increased memory utilization. Another side effect is that new tools will get the lower defaults, thus overall slowing down the growth of the memory requests we have to satisfy at all times.

Mon, Jun 8, 11:33 AM · tools-platform-team, cloud-services-team, Toolforge
fgiunchedi closed T420565: Audit tools memory requests vs actual usage, a subtask of T414513: Add new alerts for Toolforge cluster high load, as Resolved.
Mon, Jun 8, 11:33 AM · cloud-services-team, Toolforge

Thu, May 28

fgiunchedi added a watcher for tools-infrastructure-team: fgiunchedi.
Thu, May 28, 3:37 PM
fgiunchedi added a member for tools-infrastructure-team: fgiunchedi.
Thu, May 28, 3:37 PM

Wed, May 27

fgiunchedi closed T427352: Remove obsolete maintain-kubeusers limitranges, a subtask of T420565: Audit tools memory requests vs actual usage, as Resolved.
Wed, May 27, 9:29 AM · tools-platform-team, cloud-services-team, Toolforge
fgiunchedi closed T427352: Remove obsolete maintain-kubeusers limitranges as Resolved.

This is done

Wed, May 27, 9:29 AM · tools-platform-team, cloud-services-team, Toolforge
fgiunchedi closed T424814: Support deploying paging alerts to non-production environments (was: make toolsbeta paging alerts less confusing) as Resolved.

This is done: toolsbeta alerts get rewritten at deploy time to both change severity and strip the page tag from annotations
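
As a rough illustration of that kind of deploy-time rewrite, here is a minimal Python sketch. It is not the actual implementation: the rule layout (a dict with `labels` and `annotations`), the `page` severity value, the `warning` replacement, and the literal `#page` tag are all assumptions made for the example.

```python
import copy

# Hypothetical values: the real severity names and page tag may differ.
PAGING_SEVERITY = "page"
PAGE_TAG = "#page"


def rewrite_rule(rule, new_severity="warning"):
    """Return a copy of an alert rule that no longer pages.

    Lowers the severity label if it is the paging one and removes the
    page tag from every annotation value. The input rule is not modified.
    """
    out = copy.deepcopy(rule)
    labels = out.setdefault("labels", {})
    if labels.get("severity") == PAGING_SEVERITY:
        labels["severity"] = new_severity
    out["annotations"] = {
        key: value.replace(PAGE_TAG, "").strip()
        for key, value in out.get("annotations", {}).items()
    }
    return out


# Made-up rule, only for demonstration.
example_rule = {
    "alert": "ExampleApiDown",
    "labels": {"severity": "page", "team": "wmcs"},
    "annotations": {"summary": "Example API is down #page"},
}
rewritten = rewrite_rule(example_rule)
print(rewritten["labels"]["severity"], "|", rewritten["annotations"]["summary"])
```

Returning a modified copy keeps the source rule files identical across environments, so only the deployed output differs.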

Wed, May 27, 9:21 AM · SRE Observability, Toolforge, cloud-services-team
fgiunchedi added a comment to T427352: Remove obsolete maintain-kubeusers limitranges.

Now roll-restart works as expected

Wed, May 27, 8:28 AM · tools-platform-team, cloud-services-team, Toolforge
fgiunchedi added a comment to T427352: Remove obsolete maintain-kubeusers limitranges.

I just did it™ in toolsbeta:

Wed, May 27, 8:26 AM · tools-platform-team, cloud-services-team, Toolforge
fgiunchedi created T427352: Remove obsolete maintain-kubeusers limitranges.
Wed, May 27, 8:22 AM · tools-platform-team, cloud-services-team, Toolforge

Tue, May 26

fgiunchedi closed T427204: Investigate toolforge failure to schedule pods due to insufficient cpu as Resolved.

Looks like you linked to the "secret" panel again 🙂 I took a shot at removing the "-rw" at the start and that worked.

Tue, May 26, 1:23 PM · cloud-services-team, Toolforge
fgiunchedi added a comment to T427204: Investigate toolforge failure to schedule pods due to insufficient cpu.

That's quite possible. I haven't really checked CPU usage. I was just hoping more CPUs means more faster 🙂 I'd guess that the CPU usage fluctuates for the different steps in the process that the tool runs.

Tue, May 26, 12:37 PM · cloud-services-team, Toolforge
fgiunchedi removed a project from T426827: gitlab workers ulimit nofiles 1073741816 slows down fakeroot: cloud-services-team.

My ulimit -n 4096 bandaid is deployed in cloud/cicd/gitlab-ci now; I don't know whether there's appetite to ship a smaller ulimit -n as a whole on gitlab workers. Either way I'm untagging wmcs, feel free to resolve/decline as you see fit

Tue, May 26, 9:32 AM · Release-Engineering-Team (Doing 😎), Patch-For-Review, GitLab (CI & Job Runners), collaboration-services
fgiunchedi updated the task description for T423596: Adjust WMCS Gitlab CI/CD repo to stop using mirrors.wikimedia.org.
Tue, May 26, 9:26 AM · Toolforge, cloud-services-team, Infrastructure-Foundations, SRE
fgiunchedi added a comment to T427202: bird bfd session with 172.20.1.1 down - Bad packet from 172.20.1.1 - unknown session id.

Thank you for the detailed explanation @cmooney, definitely TIL things about BFD I didn't know! It's hard for me to say whether this is rare enough; metric queries in the form of bird_bfd_session_up{instance=~"^cloudlb.*"} == 0 seem to confirm this is the first time we've seen it on cloudlb. There's likely something we can do in terms of alerts to detect the issue, at least temporarily until we get new switches.
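
For reference, a query like the one above can be sent to the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The sketch below only builds the request URL; the base URL is a placeholder and not the real Prometheus host. Checking whether this is the first occurrence would need a range query or a graph, which the sketch does not cover.

```python
from urllib.parse import urlencode

# Expression taken from the comment above.
QUERY = 'bird_bfd_session_up{instance=~"^cloudlb.*"} == 0'


def instant_query_url(base_url, query):
    """Build a Prometheus instant-query URL for the given PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


# "prometheus.example.org" is a hypothetical host used for illustration only.
url = instant_query_url("https://prometheus.example.org/", QUERY)
print(url)
```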

Tue, May 26, 8:32 AM · Cloud-VPS, cloud-services-team, Infrastructure-Foundations, netops
fgiunchedi created T427249: Collect toolforge k8s scheduler metrics on port 10259.
Tue, May 26, 8:10 AM · cloud-services-team, Toolforge
fgiunchedi created P92933 (An Untitled Masterwork).
Tue, May 26, 8:09 AM
fgiunchedi added a comment to T427204: Investigate toolforge failure to schedule pods due to insufficient cpu.

Thank you for the detailed report @Sebastian_Berlin-WMSE!

Tue, May 26, 7:53 AM · cloud-services-team, Toolforge

Mon, May 25

fgiunchedi created T427204: Investigate toolforge failure to schedule pods due to insufficient cpu.
Mon, May 25, 2:17 PM · cloud-services-team, Toolforge
fgiunchedi created P92886 (An Untitled Masterwork).
Mon, May 25, 2:17 PM
fgiunchedi created T427202: bird bfd session with 172.20.1.1 down - Bad packet from 172.20.1.1 - unknown session id.
Mon, May 25, 2:06 PM · Cloud-VPS, cloud-services-team, Infrastructure-Foundations, netops
fgiunchedi closed T420937: experiment with moving rabbitmq behind haproxy as Declined.

Yes, I'm happy with pointing rabbit clients at the whole cluster and letting client-side logic handle failures/retries; the latest rounds of reboots have been successful.

Mon, May 25, 1:58 PM · cloud-services-team, Cloud-VPS
fgiunchedi closed T420937: experiment with moving rabbitmq behind haproxy, a subtask of T418444: Increased openstack latency and rabbitmq rolling restarts on certificate update, as Declined.
Mon, May 25, 1:58 PM · cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS
fgiunchedi closed T422646: memcache is a SPOF for designate/tooz coordination as Resolved.

This is deployed: designate now uses zk for coordination. A rolling restart of cloudcontrol is dealt with by tooz as expected, e.g.

Mon, May 25, 10:27 AM · Cloud-VPS, cloud-services-team (FY2025/2026-Q3-Q4)
fgiunchedi closed T422646: memcache is a SPOF for designate/tooz coordination, a subtask of T417393: Carry out controlled network switch down tests in cloud, as Resolved.
Mon, May 25, 10:27 AM · Cloud-VPS, cloud-services-team (FY2025/2026-Q3-Q4)
fgiunchedi created T427189: cleanup mcrouter from cloudcontrol hosts.
Mon, May 25, 10:24 AM · cloud-services-team, Cloud-VPS

Fri, May 22

fgiunchedi renamed T426827: gitlab workers ulimit nofiles 1073741816 slows down fakeroot from webservice-cli package deb gitlab CI job went from 9 minutes to 27 minutes to gitlab workers ulimit nofiles 1073741816 slows down fakeroot.
Fri, May 22, 7:21 AM · Release-Engineering-Team (Doing 😎), Patch-For-Review, GitLab (CI & Job Runners), collaboration-services

Thu, May 21

fgiunchedi added a comment to T426827: gitlab workers ulimit nofiles 1073741816 slows down fakeroot.

To test this theory I changed webservice-cli gitlab-ci to lower the open files ulimit

Thu, May 21, 10:39 AM · Release-Engineering-Team (Doing 😎), Patch-For-Review, GitLab (CI & Job Runners), collaboration-services
fgiunchedi added a comment to T426827: gitlab workers ulimit nofiles 1073741816 slows down fakeroot.

The problem seems to be fakeroot and a huge ulimit -n, so fakeroot spends all its time closing files up to ulimit
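
To get a feel for the scale, here is a back-of-the-envelope Python sketch. It assumes a naive loop that calls close() once for every descriptor number from 3 up to the limit; that loop shape is an assumption for illustration, not a description of fakeroot's actual code. The two limits are the ones mentioned in this task.

```python
def close_attempts(nofile_limit, first_fd=3):
    """Number of close() calls made by a loop over fds first_fd..nofile_limit-1."""
    return max(0, nofile_limit - first_fd)


huge_limit = 1073741816   # ulimit -n seen on the gitlab workers
small_limit = 4096        # value used in the workaround

huge_calls = close_attempts(huge_limit)
small_calls = close_attempts(small_limit)
print(f"{huge_calls} vs {small_calls} close() calls per invocation")
print(f"roughly {huge_calls // small_calls}x more work with the huge limit")
```

Each call is cheap on its own, but under this assumption the cost is paid every time the loop runs, which would explain a build slowing down by minutes.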

Thu, May 21, 10:35 AM · Release-Engineering-Team (Doing 😎), Patch-For-Review, GitLab (CI & Job Runners), collaboration-services

Wed, May 20

fgiunchedi added a comment to T426837: anycast-healthchecker fails to start on boot.

Prior art at T314457: anycast-healthchecker fails to start after a reboot and before a puppet run

Wed, May 20, 12:49 PM · cloud-services-team, Cloud-VPS
fgiunchedi added a comment to T420565: Audit tools memory requests vs actual usage.

Status update: I tried deploying the 64mb memory request change, but toolsbeta rejected it due to the limitrange

Wed, May 20, 11:55 AM · tools-platform-team, cloud-services-team, Toolforge
fgiunchedi created T426827: gitlab workers ulimit nofiles 1073741816 slows down fakeroot.
Wed, May 20, 9:08 AM · Release-Engineering-Team (Doing 😎), Patch-For-Review, GitLab (CI & Job Runners), collaboration-services
fgiunchedi created P92663 (An Untitled Masterwork).
Wed, May 20, 9:06 AM
fgiunchedi created P92662 (An Untitled Masterwork).
Wed, May 20, 9:05 AM
fgiunchedi added a comment to T377568: wmcs codfw hardware changes proposal.

After some discussion today, I propose that we just switch off and decom cloudnet200[78]-dev.

Wed, May 20, 7:52 AM · Cloud-VPS, User-aborrero, cloud-services-team (Hardware)

Mon, May 18

fgiunchedi added a comment to T426016: heroku builder and runner 24_0.21.8 rejects harbor ip host.

Verified on lima-kilo on Linux, nuked the VM when ./start-devenv.sh asked and ran the verification commands

Mon, May 18, 9:54 AM · Patch-For-Review, Toolforge, tools-platform-team
fgiunchedi added a comment to T426378: Tools may not allow non-interactive commands via 'become' due to dotfile configuration.

Thank you @bd808 for the fix and digging up T186108, definitely not a new problem!

Mon, May 18, 7:17 AM · tools-platform-team, cloud-services-team, Toolforge

Fri, May 15

fgiunchedi added a project to T424929: Report reprepro pending updates: Infrastructure-Foundations.

+infra-foundations JFYI / for visibility and feedback, not urgent in any shape or form

Fri, May 15, 9:15 AM · Infrastructure-Foundations, cloud-services-team, tools-infrastructure-team
fgiunchedi created T426378: Tools may not allow non-interactive commands via 'become' due to dotfile configuration.
Fri, May 15, 8:31 AM · tools-platform-team, cloud-services-team, Toolforge
fgiunchedi added a comment to T425088: Q3 :rack/setup/install cloudvirt refresh.

@Jclark-ctr once T426180 is resolved and hosts can be reimaged, please rack as follows

Fri, May 15, 8:21 AM · SRE, ops-eqiad, DC-Ops

May 14 2026

fgiunchedi added a comment to T424814: Support deploying paging alerts to non-production environments (was: make toolsbeta paging alerts less confusing).

Thank you for following up @tappof! I'll give the implementation a go in the next couple of weeks and report back in case I need help; I'll reach out for reviews for sure.

May 14 2026, 12:35 PM · SRE Observability, Toolforge, cloud-services-team
fgiunchedi added a comment to T420565: Audit tools memory requests vs actual usage.

The first reduction in default memory requests has been deployed; as expected, we're now under the alerting threshold for memory requests (from ~88% to ~76%)

May 14 2026, 6:51 AM · tools-platform-team, cloud-services-team, Toolforge

May 13 2026

fgiunchedi added a comment to T425905: Quota increase request for project language.

+1

May 13 2026, 9:13 AM · Cloud-VPS (Quota-requests)
fgiunchedi created P92497 (An Untitled Masterwork).
May 13 2026, 8:45 AM

May 8 2026

fgiunchedi created P92441 (An Untitled Masterwork).
May 8 2026, 12:26 PM
fgiunchedi created P92436 (An Untitled Masterwork).
May 8 2026, 9:24 AM

May 7 2026

fgiunchedi added a project to T424814: Support deploying paging alerts to non-production environments (was: make toolsbeta paging alerts less confusing): SRE Observability.

I'm adding o11y folks for their input both on the idea as a whole and on the proposed implementation. For context, this is not urgent on the Toolforge side; it is more of a "nice to have", though the current behavior has confused folks looking at toolforge alerts. If the idea looks sane I can take on the implementation (modulo the usual work scheduling).

May 7 2026, 8:37 AM · SRE Observability, Toolforge, cloud-services-team
fgiunchedi renamed T424814: Support deploying paging alerts to non-production environments (was: make toolsbeta paging alerts less confusing) from Make toolsbeta paging alerts less confusing to Support deploying paging alerts to non-production environments (was: make toolsbeta paging alerts less confusing).
May 7 2026, 8:26 AM · SRE Observability, Toolforge, cloud-services-team
fgiunchedi updated the task description for T424814: Support deploying paging alerts to non-production environments (was: make toolsbeta paging alerts less confusing).
May 7 2026, 8:21 AM · SRE Observability, Toolforge, cloud-services-team

May 6 2026

fgiunchedi updated the task description for T424814: Support deploying paging alerts to non-production environments (was: make toolsbeta paging alerts less confusing).
May 6 2026, 3:36 PM · SRE Observability, Toolforge, cloud-services-team
fgiunchedi edited Description on Cloud-VPS (Quota-requests).
May 6 2026, 1:32 PM

May 5 2026

fgiunchedi created T425412: Alert on openstack resources close to running out.
May 5 2026, 12:25 PM · Cloud-VPS, cloud-services-team
fgiunchedi created T425400: Set external url for thanos.w.o web interface.
May 5 2026, 9:08 AM · SRE Observability
fgiunchedi updated the task description for T424658: Ensure cloudvirt capacity is more evenly spread out among racks.
May 5 2026, 8:27 AM · Cloud-VPS, cloud-services-team (FY2025/2026-Q3-Q4)

May 2 2026

fgiunchedi created T425215: toolforge webservice restart does not wait for pod to be ready, only running.
May 2 2026, 2:47 PM · tools-platform-team, Toolforge, cloud-services-team

Apr 30 2026

fgiunchedi triaged T424929: Report reprepro pending updates as Low priority.
Apr 30 2026, 9:06 AM · Infrastructure-Foundations, cloud-services-team, tools-infrastructure-team
fgiunchedi created T424929: Report reprepro pending updates.
Apr 30 2026, 9:05 AM · Infrastructure-Foundations, cloud-services-team, tools-infrastructure-team

Apr 29 2026

fgiunchedi added a comment to T422820: oslo.messaging does not failover to the next rabbit host on traffic blackhole situations.

Upstream bug at https://bugs.launchpad.net/oslo.messaging/+bug/2150632

Apr 29 2026, 11:47 AM · Cloud-VPS, cloud-services-team (FY2025/2026-Q3-Q4)
fgiunchedi created T424814: Support deploying paging alerts to non-production environments (was: make toolsbeta paging alerts less confusing).
Apr 29 2026, 9:53 AM · SRE Observability, Toolforge, cloud-services-team
fgiunchedi created T424802: cloudvirt1075 in 'maintenance' aggregate.
Apr 29 2026, 8:02 AM · Cloud-VPS, cloud-services-team

Apr 28 2026

fgiunchedi updated the task description for T424658: Ensure cloudvirt capacity is more evenly spread out among racks.
Apr 28 2026, 1:56 PM · Cloud-VPS, cloud-services-team (FY2025/2026-Q3-Q4)
fgiunchedi updated the task description for T424658: Ensure cloudvirt capacity is more evenly spread out among racks.
Apr 28 2026, 1:19 PM · Cloud-VPS, cloud-services-team (FY2025/2026-Q3-Q4)
fgiunchedi created T424658: Ensure cloudvirt capacity is more evenly spread out among racks.
Apr 28 2026, 1:12 PM · Cloud-VPS, cloud-services-team (FY2025/2026-Q3-Q4)
fgiunchedi added a comment to T419658: Controlled cloudsw down tests for F4.

I crunched some numbers today to see the resource distribution across racks:

Apr 28 2026, 10:06 AM · Cloud-VPS, cloud-services-team (FY2025/2026-Q3-Q4)

Apr 27 2026

fgiunchedi updated the task description for T420565: Audit tools memory requests vs actual usage.
Apr 27 2026, 11:41 AM · tools-platform-team, cloud-services-team, Toolforge

Apr 24 2026

fgiunchedi updated the task description for T420565: Audit tools memory requests vs actual usage.
Apr 24 2026, 1:04 PM · tools-platform-team, cloud-services-team, Toolforge
fgiunchedi created P91435 (An Untitled Masterwork).
Apr 24 2026, 11:11 AM
fgiunchedi created T424312: Set external url for prometheus/thanos metricsinfra web interface.
Apr 24 2026, 6:44 AM · Cloud-VPS, cloud-services-team

Apr 23 2026

fgiunchedi added a comment to T420565: Audit tools memory requests vs actual usage.

I started from webservice-cli limits, and was thinking of the following deployment plan:

Apr 23 2026, 1:11 PM · tools-platform-team, cloud-services-team, Toolforge
fgiunchedi added a comment to T424068: Request creation of wise VPS project.

Hello and thank you for reaching out. The Wikimedia Foundation offers and supports a platform for running managed user workloads called Toolforge (https://wikitech.wikimedia.org/wiki/Portal:Toolforge). In practice it means having containers running on k8s, though with fewer per-container resources (e.g. 4GB). Would WISE run distributed/partitioned/sharded in multiple containers, and could it thus run on Toolforge? Running on Toolforge, among other benefits, would mean not having to operate/admin a VM and instead running on a supported platform. Please let us know!

Apr 23 2026, 11:57 AM · WISE, Cloud-VPS (Project-requests)
fgiunchedi added a comment to T424192: Request creation of wolf-a Cloud VPS project.

I have listed the blockers below, and I'm having a hard time understanding why Toolforge would not work with a fully containerized deployment. Would you mind expanding on the details of each blocker and how Toolforge does not satisfy it? Thank you.

Apr 23 2026, 9:53 AM · Cloud-VPS (Project-requests)
fgiunchedi added a comment to T424192: Request creation of wolf-a Cloud VPS project.

Hello, thank you for your interest in Cloud VPS. From the description it seems the software can be containerized and run on Toolforge, which is the preferred and supported way to run software. What are the expected resource requirements? And could you provide example interactions with the software you intend to run? If the code is already available, please provide links to the source code as well. Thank you!

Apr 23 2026, 9:15 AM · Cloud-VPS (Project-requests)

Apr 20 2026

fgiunchedi added a comment to T423598: Migrate our use of osbpo away from mirrors.wikimedia.org.

I think we should be considering importing osbpo.debian.net apt repo as an upstream into aptrepo puppet module (i.e. modules/aptrepo/files/updates and related) and serve it locally from apt.w.o.

Apr 20 2026, 2:20 PM · tools-infrastructure-team, Cloud-VPS
fgiunchedi added a comment to T423675: Buildservice for Rust fails.

Is archive.ubuntu.com working now? Was it the only host failing?

Apr 20 2026, 1:27 PM · cloud-services-team, Toolforge
fgiunchedi added a comment to T423598: Migrate our use of osbpo away from mirrors.wikimedia.org.

I think we should be considering importing osbpo.debian.net apt repo as an upstream into aptrepo puppet module (i.e. modules/aptrepo/files/updates and related) and serve it locally from apt.w.o.

How big is that component in total?

Apr 20 2026, 1:06 PM · tools-infrastructure-team, Cloud-VPS
fgiunchedi added a comment to T422646: memcache is a SPOF for designate/tooz coordination.

I took a look at the codfw setup, with one thing to change: the tooz backend URL should list all zk servers, so we can safely roll-restart zookeeper.service as well and designate/tooz will fail over
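
A small Python sketch of what such a backend URL could look like. It assumes the tooz zookeeper driver accepts a comma-separated host list in the URL, as the underlying kazoo client does; the hostnames and the 2181 port are placeholders, not the real servers.

```python
def tooz_zookeeper_url(hosts, port=2181):
    """Build a zookeeper:// backend URL that lists every server in the ensemble."""
    if not hosts:
        raise ValueError("at least one zookeeper host is required")
    return "zookeeper://" + ",".join(f"{host}:{port}" for host in hosts)


# Hypothetical hostnames, for illustration only.
zk_hosts = ["zk1.example.org", "zk2.example.org", "zk3.example.org"]
backend_url = tooz_zookeeper_url(zk_hosts)
print(backend_url)
```

With every server listed, the client has somewhere else to connect while one zookeeper node restarts, which is the point of the change described above.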

Apr 20 2026, 9:40 AM · Cloud-VPS, cloud-services-team (FY2025/2026-Q3-Q4)
fgiunchedi updated the task description for T423847: Store and optionally show the full firehose in csp-report.
Apr 20 2026, 8:51 AM · Tools
fgiunchedi closed T422916: CSP violations with known domains in the blocked-uri are not collected by csp-report as Resolved.
Apr 20 2026, 8:50 AM · Tools
fgiunchedi added a comment to T422916: CSP violations with known domains in the blocked-uri are not collected by csp-report.

That's fair re: user confusion concerns. From my SRE POV I was surprised to find that the CSP report URL we announce filters the feed of legitimate, albeit confusing to tool maintainers, reports. I am thinking of a middle ground where we collect all reports and present the unfiltered report firehose only on demand. The known-domains retention can of course be short, as we don't really care for it except for operational problems. What do you think?

I'm not sure in the current pipeline where I would store reports with a different retention pattern or where I would insert output filtering to screen that noise from the maintainer facing interface. I don't have any objections to someone figuring those things out and implementing them if it feels like it would add value for administrative investigations.

Apr 20 2026, 8:50 AM · Tools
fgiunchedi triaged T423847: Store and optionally show the full firehose in csp-report as Low priority.
Apr 20 2026, 8:50 AM · Tools
fgiunchedi created T423847: Store and optionally show the full firehose in csp-report.
Apr 20 2026, 8:49 AM · Tools
fgiunchedi added a comment to T416803: Temporary instance increase for Bullseye servers deprecation.

Thank you for letting us know @YochayCO, appreciate it!

Apr 20 2026, 8:46 AM · Cloud-VPS (Quota-requests)