CDanis (Chris Danis)
SRE

User Details

User Since
Nov 5 2018, 2:54 PM (388 w, 1 d)
Availability
Available
IRC Nick
cdanis
LDAP User
CDanis
MediaWiki User
CDanis (WMF) [ Global Accounts ]

hey look at this graph 📈

Recent Activity

Mon, Apr 13

CDanis closed T423130: Jaeger ( https://trace.wikimedia.org ) is not available, request fails with upstream error as Resolved.

thanks for the report, sorry for missing this as part of T414486

Mon, Apr 13, 5:12 PM · Observability-Tracing
CDanis triaged T423130: Jaeger ( https://trace.wikimedia.org ) is not available, request fails with upstream error as Medium priority.

Live-patched in production, service is restored.

Mon, Apr 13, 4:46 PM · Observability-Tracing
CDanis added a comment to T423121: Allow to easily disable puppet-merges temporarily.

Sounds a lot like T248872: puppet-merge lockout/tagout ?

Mon, Apr 13, 2:23 PM · Puppet-Infrastructure, Infrastructure-Foundations, SRE

Wed, Apr 1

CDanis closed T421751: Allow SFTP outbound requests from civi1002/civi2002 to deployment.eqiad.wmnet, a subtask of T416948: CiviCRM should export a CSV that can be used in a MediaWiki maintenance script, as Resolved.
Wed, Apr 1, 6:54 PM · Patch-For-Review, fr-current-sprint, FY25-26 WE3.5 Donor Identification and recognition
CDanis closed T421751: Allow SFTP outbound requests from civi1002/civi2002 to deployment.eqiad.wmnet as Resolved.
Wed, Apr 1, 6:54 PM · fundraising-tech-ops, Fundraising Sprint - Floor is Lava, Fundraising-Backlog

Tue, Mar 31

CDanis added a comment to T421785: Create SFTP keypair for MediaWiki donor export.

This should be working now:
deployment.eqiad.wmnet
user fundraising-data-uploader
/var/lib/fundraising-data-uploader (the user's homedir)

Tue, Mar 31, 4:05 PM · fundraising-tech-ops, Fundraising Sprint - Floor is Lava, fr-current-sprint, FY25-26 WE3.5 Donor Identification and recognition

Mon, Mar 30

CDanis added a comment to T421751: Allow SFTP outbound requests from civi1002/civi2002 to deployment.eqiad.wmnet.

Apologies, I'll write a host firewall patch tomorrow morning.

Mon, Mar 30, 8:56 PM · fundraising-tech-ops, Fundraising Sprint - Floor is Lava, Fundraising-Backlog

Wed, Mar 25

CDanis updated subscribers of T398236: Manage druid `webrequest_sampled_live` data size.

Thanks Joseph <3

Wed, Mar 25, 8:48 PM · Data-Engineering

Mon, Mar 23

CDanis added a comment to T420993: Rotate discovery intermediate certificate (expires 2026-05-03).

https://wikitech.wikimedia.org/wiki/PKI/CA_Operations#Renewing_a_new_intermediate

Mon, Mar 23, 7:53 PM · ServiceOps new, Infrastructure-Foundations, Patch-For-Review

Sat, Mar 21

CDanis updated the name of F73293414: VNPT cluster (isp vs fingerprints) from "image.png" to "VNPT cluster (isp vs fingerprints)".
Sat, Mar 21, 12:46 AM

Fri, Mar 20

CDanis added a comment to T419283: MediaWiki Telemetry::getRequestId() has different format between cli (mwscript) and web requests.

+1 from me on making UUIDv4 the only codepath
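As a rough illustration of what a single UUIDv4 codepath could look like, here is a minimal Python sketch. It is not MediaWiki's implementation (which is PHP); the function names `new_request_id` and `is_uuid4` are hypothetical and exist only for this example.

```python
import uuid


def new_request_id() -> str:
    """Return a fresh request ID as a canonical UUIDv4 string (hypothetical helper)."""
    return str(uuid.uuid4())


def is_uuid4(value: str) -> bool:
    """True if value is a hyphenated UUID string whose version field is 4."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, URN prefixes and unhyphenated hex,
    # so compare against the canonical form to reject those variants.
    return parsed.version == 4 and str(parsed) == value.lower()
```

With one generator and one validator, CLI and web entry points would produce IDs in the same shape, which is the property the comment is asking for.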

Fri, Mar 20, 3:48 PM · MW-1.46-notes (1.46.0-wmf.19; 2026-03-10), Observability-Tracing

Thu, Mar 19

CDanis added a comment to T306550: Move dumps.wikimedia.org HTTP service behind CDN edge.

(FWIW, a major reason these connections are so long-lived is the rate limiting that's there to try to handle all of the traffic from so little infrastructure. Offloading at least the most frequently used files to the CDN might possibly help with that?)

Thu, Mar 19, 1:35 PM · Traffic, Datasets-General-or-Unknown, cloud-services-team, Data-Services

Mon, Mar 16

CDanis added a comment to T419734: RfC: Use of gRPC as Lambda interface for linked artifact caching.

I haven't played around with https://connectrpc.com/ at all, but it looks like it was designed with these exact concerns in mind, and could allow us to write and ship a fully-compliant gRPC server that would also, with zero additional work required, accept an easy-to-speak dialect of plain HTTP.

Mon, Mar 16, 5:13 PM · User-Eevans, Data-Persistence

Mar 9 2026

CDanis added a comment to T418377: Image Rate Limiting Issues For Future Audiences Project.

Hey Chris,

Perhaps we can talk live about this. I'm concerned about you mentioning that there will be no version of the limits that can facilitate an acceptable UX. I think we are hoping for 10 images/second if that's possible, but I don't want to make life difficult for all of you. I know you're dealing with a lot on the incident load front. We may end up just going live with the internal test as things stand, but would love to set up a time with you and my PM if that's possible.

Mar 9 2026, 2:51 PM · Traffic, SRE

Mar 6 2026

CDanis created P89821 Dockerfile.juice.
Mar 6 2026, 3:28 PM
CDanis created P89820 (An Untitled Masterwork).
Mar 6 2026, 3:24 PM
CDanis added a comment to T419154: User and site scripts are globally disabled.

Most user JS scripts should be working again.

Mar 6 2026, 12:07 AM · 2026-user-javascript-incident, Wikimedia-Incident, WMF-General-or-Unknown

Mar 5 2026

CDanis updated the task description for T419166: Audit production for systemd parse warnings.
Mar 5 2026, 9:40 PM · Infrastructure-Foundations, Patch-For-Review, Security, SRE
CDanis updated the task description for T419166: Audit production for systemd parse warnings.
Mar 5 2026, 9:31 PM · Infrastructure-Foundations, Patch-For-Review, Security, SRE
CDanis created T419166: Audit production for systemd parse warnings.
Mar 5 2026, 9:31 PM · Infrastructure-Foundations, Patch-For-Review, Security, SRE
CDanis closed T406990: FY25/26 WE4.3.2: support JA4H as Resolved.
Mar 5 2026, 9:21 PM · Hiddenparma, SRE
CDanis edited P89813 (An Untitled Masterwork).
Mar 5 2026, 8:39 PM
CDanis added a comment to T418377: Image Rate Limiting Issues For Future Audiences Project.

Apologies for the conflicting information, that's partially my fault.

Mar 5 2026, 6:56 PM · Traffic, SRE
CDanis updated subscribers of T418377: Image Rate Limiting Issues For Future Audiences Project.

The app involves fetching a large number of apps to present to the user.

Mar 5 2026, 4:55 PM · Traffic, SRE

Feb 25 2026

CDanis added a project to T418381: Wikimedia Status displays incomplete graphs (cuts off 2026-02-25 05:35 UTC) or no data: observability.
Feb 25 2026, 2:48 PM · observability, Incident Tooling

Feb 23 2026

CDanis lowered the priority of T240495: investigate making 'notrack' the default on our ferm rules from Medium to Low.
Feb 23 2026, 3:37 PM · Infrastructure-Foundations, SRE

Feb 19 2026

CDanis closed T417934: puppetserver1002 /srv/git/operations/private out of sync as Resolved.
Feb 19 2026, 8:24 PM · Puppet, SRE
CDanis added a comment to T417934: puppetserver1002 /srv/git/operations/private out of sync.
💙root@puppetserver1002.eqiad.wmnet /srv/git/operations/private 🕒🙃 git reset --hard origin/master
HEAD is now at ce722766 (herron) dummy commit to resync logstash collector yaml
Feb 19 2026, 8:12 PM · Puppet, SRE
CDanis triaged T417900: Serve something helpful at metamonitoring.wikimedia.org as High priority.

tagging High because it is so low-effort

Feb 19 2026, 3:10 PM · Observability-Alerting
CDanis created T417900: Serve something helpful at metamonitoring.wikimedia.org.
Feb 19 2026, 3:09 PM · Observability-Alerting
CDanis added a comment to T417087: Deploy Qwen3-Reranker-0.6B inference service for semantic search reranking.

We have used llama-server on our relforge instances (dual socket xeon silver w/ 12 cores each) to provide reranking in our prototype via a custom Qwen3-Reranker-0.6B-Q8_0.gguf but the performance is borderline unacceptable (2-3s for n=10). Q8 was used as it provided a significant improvement to inference latency over the unquantized model.

Feb 19 2026, 1:48 PM · Discovery-Search (2026.04.06 - 2026.05.01), Semantic Search, CirrusSearch

Feb 18 2026

CDanis added a comment to T303725: Extend NEL headers to sites not fronted by CDN.

Sounds good to me.

Feb 18 2026, 7:09 PM · Patch-For-Review, collaboration-services, Infrastructure-Foundations, SRE

Feb 11 2026

CDanis updated the title for P88785 CIDERGRINDER spur stats --bloom-factor 3 --granularity 64 from untitled to CIDERGRINDER spur stats --bloom-factor 3 --granularity 64.
Feb 11 2026, 8:47 PM
CDanis edited P88785 CIDERGRINDER spur stats --bloom-factor 3 --granularity 64.
Feb 11 2026, 8:44 PM
CDanis edited P88785 CIDERGRINDER spur stats --bloom-factor 3 --granularity 64.
Feb 11 2026, 8:44 PM
CDanis created P88785 CIDERGRINDER spur stats --bloom-factor 3 --granularity 64.
Feb 11 2026, 8:43 PM
CDanis added a comment to T416950: CiviCRM is connected to MediaWiki.

In particular it'd be good to know if you only care about encryption going over the wire, or if encryption at rest is also necessary.

Feb 11 2026, 7:04 PM · GNU England Shaker dresser, fr-current-sprint, Fundraising-Backlog, FY25-26 WE3.5 Donor Identification and recognition
CDanis closed T417156: Pushing to gerrit over http is blocked by generic rate limiting, a subtask of T411895: gerrit behind CDN, as Resolved.
Feb 11 2026, 6:48 PM · Patch-For-Review, Gerrit, collaboration-services
CDanis closed T417156: Pushing to gerrit over http is blocked by generic rate limiting as Resolved.
Feb 11 2026, 6:47 PM · Gerrit, collaboration-services
CDanis added a comment to T416540: Mean MediaWiki backend latency increased by 60% between October and December 2025.

The latency trend for this past ~week is even worse:

(attached image: image.png)

Feb 11 2026, 6:04 PM · Parsoid-Read-Views (Performance), User-jijiki, ServiceOps-Mediawiki, Performance Issue, MediaWiki-Platform-Team (Radar)
CDanis added a comment to T417156: Pushing to gerrit over http is blocked by generic rate limiting.

Thanks for filing that @Tgr !

Feb 11 2026, 5:33 PM · Gerrit, collaboration-services
CDanis added a comment to T417156: Pushing to gerrit over http is blocked by generic rate limiting.

This should be fixed now -- I did some quick checks but further confirmation appreciated :)

Feb 11 2026, 5:19 PM · Gerrit, collaboration-services

Feb 6 2026

CDanis added a comment to T416540: Mean MediaWiki backend latency increased by 60% between October and December 2025.

Parsoid looks to be consuming basically all the CPU that mw-jobrunners have to offer:

Feb 6 2026, 8:53 PM · Parsoid-Read-Views (Performance), User-jijiki, ServiceOps-Mediawiki, Performance Issue, MediaWiki-Platform-Team (Radar)
CDanis changed the visibility for F71712049: image.png.
Feb 6 2026, 8:47 PM
CDanis added a comment to T416171: s2 primary master getting reads?.

Looks like it lines up with the train deployment of MediaWiki 1.45/wmf.6 ?

Feb 6 2026, 8:42 PM · MW-1.46-notes (1.46.0-wmf.15; 2026-02-10), Growth-Team, GrowthExperiments, Data-Persistence
CDanis changed the visibility for F71711861: image.png.
Feb 6 2026, 8:37 PM
CDanis changed the visibility for F71711809: image.png.
Feb 6 2026, 8:28 PM
CDanis changed the visibility for F71711258: image.png.
Feb 6 2026, 7:12 PM
CDanis changed the visibility for F71710323: image.png.
Feb 6 2026, 7:03 PM
CDanis added a comment to T416171: s2 primary master getting reads?.

It isn't just s2. It's all of them. Since the switchover.

Feb 6 2026, 7:02 PM · MW-1.46-notes (1.46.0-wmf.15; 2026-02-10), Growth-Team, GrowthExperiments, Data-Persistence
CDanis added a comment to T414719: Opt-in testing of Gerrit-via-CDN.

I found the request:
https://logstash.wikimedia.org/goto/78f43ba4e27abb0fa03ad86fcae79dfc

Feb 6 2026, 1:33 PM · Gerrit, collaboration-services

Feb 5 2026

CDanis added a comment to T416452: Migrate Docker images running in Production away from Bullseye.

@CDanis Hi! I saw your name for otelcol and this is why I am reaching out :) IIUC it is a golang binary so it should be ok to create a component for trixie-wikimedia and copy over the otelcol-contrib package right? Or should we follow another process? I know there is a bookworm variant but I'd love to target stable if possible. Lemme know!

Feb 5 2026, 12:59 PM · User-Eevans, Data-Platform-SRE, User-Elukey, Epic, Infrastructure-Foundations, SRE

Feb 3 2026

CDanis updated the language for P88557 (An Untitled Masterwork) from autodetect to go.
Feb 3 2026, 10:14 PM
CDanis created P88557 (An Untitled Masterwork).
Feb 3 2026, 10:14 PM
CDanis created P88546 (An Untitled Masterwork).
Feb 3 2026, 9:05 PM

Jan 29 2026

CDanis added a comment to T414719: Opt-in testing of Gerrit-via-CDN.

@CDanis do you think it makes sense to offer an /etc/hosts solution as well in the task description? So something like "use tunnelencabulator or change your /etc/hosts to the nearest address of gerrit-lb.<dc>.wikimedia.org" with a few examples. That might make it easier for volunteers to try it without installing tunnelencabulator. I'm happy to update the description.

Jan 29 2026, 12:49 PM · Gerrit, collaboration-services

Jan 23 2026

CDanis added a comment to T278056: Upgrade Druid to version 26.0.0.

The heap size didn't seem to be under pressure, so I am wondering if it is just something related to queries (maybe with regexes) that allocate a ton of short-lived objects, hitting some constraint like a small young gen. Since we use G1 now, we could try to increase something like -XX:G1MaxNewSizePercent and see how it goes.

Jan 23 2026, 8:42 PM · Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work

Jan 20 2026

CDanis added a comment to T278056: Upgrade Druid to version 26.0.0.

Totally fine from SRE's side, just check in with the current oncallers before you begin the disruptive part of the maintenance.

Jan 20 2026, 2:14 PM · Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work

Jan 16 2026

CDanis reopened T249663: write some recording rules for queries used in the appserver RED k8s dashboard as "Open".

The appservers RED k8s dashboard makes even heavier queries, and was the trigger of a Thanos outage this week.

Jan 16 2026, 4:41 PM · Patch-For-Review, Observability-Metrics, SRE Observability (FY2025/2026-Q3), Prod-Kubernetes, ServiceOps new, SRE

Jan 15 2026

CDanis added a comment to T414719: Opt-in testing of Gerrit-via-CDN.

I think this may just be a quirk of tunnelencabulator, but when I run it and use `ssh gerrit -- gerrit show-connections --wide` I see myself entering via IPv6 and not an edge proxy. For me `ssh -4` is necessary to route traffic over the tunnel.

Jan 15 2026, 9:29 PM · Gerrit, collaboration-services
CDanis added a comment to T414719: Opt-in testing of Gerrit-via-CDN.

When I do `ssh gerrit.wikimedia.org -p 29418 gerrit show-connections --wide` I'm seeing 4 "connections" per PoP without any associated user (confirmed with `ss` on the host). They seem to be short-lived connections. I guess this is some kind of TCP probe? Is this health-check behavior?

Jan 15 2026, 8:29 PM · Gerrit, collaboration-services
CDanis added a comment to T414719: Opt-in testing of Gerrit-via-CDN.

Q: How long should I leave this running?
A: Up to one whole workday at a time. Don't set it and totally forget it -- it's directing you towards one of our edge sites, and that will need to change if we need to depool that edge site.

Jan 15 2026, 8:07 PM · Gerrit, collaboration-services
CDanis claimed T414719: Opt-in testing of Gerrit-via-CDN.
Jan 15 2026, 7:22 PM · Gerrit, collaboration-services
CDanis updated the task description for T414719: Opt-in testing of Gerrit-via-CDN.
Jan 15 2026, 7:17 PM · Gerrit, collaboration-services
CDanis created T414719: Opt-in testing of Gerrit-via-CDN.
Jan 15 2026, 7:14 PM · Gerrit, collaboration-services
CDanis updated the task description for T411895: gerrit behind CDN.
Jan 15 2026, 6:57 PM · Patch-For-Review, Gerrit, collaboration-services
CDanis added a comment to T411895: gerrit behind CDN.

💙cdanis@wmftop ~ 🕙☕ DC=magru ; curl -I -X GET https://gerrit.wikimedia.org --connect-to ::gerrit-lb.${DC}.wikimedia.org ; nc -vW1 gerrit-lb.${DC}.wikimedia.org 29418
HTTP/2 302
date: Thu, 15 Jan 2026 15:10:29 GMT
server: Apache
location: https://gerrit.wikimedia.org/r/
content-length: 215
content-type: text/html; charset=iso-8859-1
age: 1
vary: X-Forwarded-Proto
x-cache: cp7008 miss, cp7008 pass
x-cache-status: pass
server-timing: cache;desc="pass", host;desc="cp7008"
strict-transport-security: max-age=106384710; includeSubDomains; preload
report-to: { "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}
set-cookie: WMF-Uniq=lduuB793lSeIP6Ao8KGhOQLpAAAAAFvdmXX8iIiZ92DS7Rg0aVcUoUhHLVdOEKDe;Domain=gerrit.wikimedia.org;Path=/;HttpOnly;secure;SameSite=None;Expires=Fri, 15 Jan 2027 00:00:00 GMT
x-request-id: 2f56b390-8ac2-49e6-9794-6de1c5215d95
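
For readers who want to check responses like this programmatically, here is a minimal Python sketch that pulls the cache hops out of `x-cache` and the sampling fractions out of `nel`. The helper name `parse_x_cache` is hypothetical, and the "host then disposition" layout is assumed only from the header values shown above, not from any documented format.

```python
import json


def parse_x_cache(value: str) -> list[tuple[str, str]]:
    """Split an x-cache value into (host, disposition) pairs.

    Assumes comma-separated hops of the form "<host> <disposition>",
    as in the response above; malformed hops are skipped.
    """
    hops = []
    for part in value.split(","):
        fields = part.split()
        if len(fields) >= 2:
            hops.append((fields[0], fields[1]))
    return hops


# Header values copied from the curl output above.
hops = parse_x_cache("cp7008 miss, cp7008 pass")
nel = json.loads(
    '{ "report_to": "wm_nel", "max_age": 604800, '
    '"failure_fraction": 0.05, "success_fraction": 0.0}'
)

# The request never left cp7008, and 5% of failures get reported via NEL.
hosts = {host for host, _ in hops}
failure_fraction = nel["failure_fraction"]
```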

Jan 15 2026, 3:13 PM · Patch-For-Review, Gerrit, collaboration-services
CDanis added a comment to T414460: Socket leaking on some dse-k8s row C & D hosts.

My assumption is that this is more likely related to the cephfs interface than to the rbd interface.
That's because we mainly use the block devices for postgresql, which doesn't have a high pod churn rate like the airflow tasks.

Jan 15 2026, 2:26 PM · Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work, SRE, Infrastructure-Foundations, netops

Jan 14 2026

CDanis added a comment to T411895: gerrit behind CDN.

🎉 the ssh port isn't yet working, but https is!

Jan 14 2026, 9:28 PM · Patch-For-Review, Gerrit, collaboration-services
CDanis added a comment to T414621: Make puppet-compiler execution run with higher priority, not like other 'experimental' jobs.

Sure thing. This changeid & these patchsets both spent several minutes showing "queued" on https://integration.wikimedia.org/zuul/ before the Jenkins job seemed to begin

Jan 14 2026, 8:13 PM · Patch-For-Review, Continuous-Integration-Infrastructure, Zuul, Continuous-Integration-Config, Infrastructure-Foundations, Puppet CI, Release-Engineering-Team
CDanis created T414621: Make puppet-compiler execution run with higher priority, not like other 'experimental' jobs.
Jan 14 2026, 7:21 PM · Patch-For-Review, Continuous-Integration-Infrastructure, Zuul, Continuous-Integration-Config, Infrastructure-Foundations, Puppet CI, Release-Engineering-Team
CDanis updated the task description for T411895: gerrit behind CDN.
Jan 14 2026, 7:18 PM · Patch-For-Review, Gerrit, collaboration-services
CDanis added a comment to T414460: Socket leaking on some dse-k8s row C & D hosts.

The k8s host sent a FIN to the remote side, but due to the packet-loss issue either the remote side didn't get it, or it did and the ACK for it wasn't received. That explains why the socket isn't moving to FIN-WAIT-2. However, surely it should try to resend the FIN, and if this state persists, eventually just delete the connection?

Jan 14 2026, 2:33 PM · Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work, SRE, Infrastructure-Foundations, netops

Jan 13 2026

CDanis added a comment to T414460: Socket leaking on some dse-k8s row C & D hosts.

I took a quick look at the state of sockets on dse-k8s-worker1010, since FIN_WAIT_1 is not supposed to stick around for longer than a minute or two. Increasingly-complicated ss flags showed that there's a busy: field available, reporting a number of milliseconds. This lets us do some dating on the sockets that are still around:
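
The output that followed is not preserved in this feed. As a rough sketch of the dating step, the Python below extracts the `busy:` value from a line of `ss` TCP info output and converts it to seconds. The sample line and the helper name `busy_seconds` are made up for illustration; only the `busy:<n>ms` field itself comes from the comment above.

```python
import re
from typing import Optional

# Matches the "busy:<milliseconds>ms" token in ss's extended TCP info.
BUSY_RE = re.compile(r"\bbusy:(\d+)ms\b")


def busy_seconds(ss_info_line: str) -> Optional[float]:
    """Return the busy: value in seconds, or None if the field is absent."""
    match = BUSY_RE.search(ss_info_line)
    if match is None:
        return None
    return int(match.group(1)) / 1000


# Hypothetical info line for a stuck socket; not real output from the host.
sample = "cubic wscale:7,7 rto:204 rtt:0.5/0.25 busy:86400000ms lastsnd:1000"
age = busy_seconds(sample)
```

A socket reporting a busy time of hours or days, as in the made-up sample, is far beyond the minute or two that FIN_WAIT_1 should normally last.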

Jan 13 2026, 7:45 PM · Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work, SRE, Infrastructure-Foundations, netops

Dec 17 2025

CDanis added a comment to T413008: Cross-datacenter Docker Registry replication broken since 2025-04-27.

This is at least High and possibly UBN!

Dec 17 2025, 7:56 PM · Release-Engineering-Team (Radar), SRE, Infrastructure-Foundations, serviceops-deprecated, SRE-swift-storage, Kubernetes
CDanis created T413008: Cross-datacenter Docker Registry replication broken since 2025-04-27.
Dec 17 2025, 7:55 PM · Release-Engineering-Team (Radar), SRE, Infrastructure-Foundations, serviceops-deprecated, SRE-swift-storage, Kubernetes

Dec 5 2025

CDanis updated the task description for T411895: gerrit behind CDN.
Dec 5 2025, 8:42 PM · Patch-For-Review, Gerrit, collaboration-services
CDanis updated the task description for T411895: gerrit behind CDN.
Dec 5 2025, 8:23 PM · Patch-For-Review, Gerrit, collaboration-services
CDanis updated the task description for T411895: gerrit behind CDN.
Dec 5 2025, 8:18 PM · Patch-For-Review, Gerrit, collaboration-services
CDanis updated the task description for T411895: gerrit behind CDN.
Dec 5 2025, 8:03 PM · Patch-For-Review, Gerrit, collaboration-services
CDanis updated the task description for T411895: gerrit behind CDN.
Dec 5 2025, 7:56 PM · Patch-For-Review, Gerrit, collaboration-services
CDanis created T411895: gerrit behind CDN.
Dec 5 2025, 7:50 PM · Patch-For-Review, Gerrit, collaboration-services

Dec 3 2025

CDanis added a comment to T411503: x-provenance header: identify WMCS.

I don't think we currently have any places outside of https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_IP_space that publish our IP space. Would it be helpful if we published the same information in some machine-readable format?

Probably... maybe this could use the same mechanism we use for "known clients" like googlebot? Curious what @CDanis thinks.

Dec 3 2025, 1:44 AM · Patch-For-Review, Traffic

Dec 1 2025

CDanis removed a project from T411250: rest gateway: Record x-trusted-request and x-provenance headers in access logs: Infrastructure-Foundations.
Dec 1 2025, 3:29 PM · serviceops-deprecated, OKR-Work
CDanis added a comment to T411250: rest gateway: Record x-trusted-request and x-provenance headers in access logs.

Question: are there any restrictions about recording this information in logs for some time (e.g. 90 days)? The same log would also include the client's IP address. It may also contain the user name or user ID of authenticated users in certain cases.

Dec 1 2025, 3:29 PM · serviceops-deprecated, OKR-Work

Nov 21 2025

CDanis added a comment to T400119: Block traffic from user-agents not honoring our policy.

This seems to be blocking legitimate web captures by the Internet Archive (see, for example, here). Is this actually the intended outcome?

The Internet Archive is a reliable source for viewing old Wikipedia articles exactly as they appeared at the time of capture, in a way that the Revision History feature does not allow (for example, it does not show old revisions of transcluded templates). It is also used to record the history of some special pages that do not normally have their histories saved on Wikipedia — such as Special:Tags and Special:GadgetUsage.

I'm afraid that the complete blockage of Internet Archive captures of all wiki pages is unfortunate, and may cause other legitimate features and services that rely on such captures to cease functioning.

Nov 21 2025, 3:47 PM · User-notice-archive, Patch-For-Review, Traffic, SRE

Nov 18 2025

CDanis added a comment to T405945: eqiad row C/D Infrastructure Foundations host migrations.

@RobH please proceed at your convenience -- these two hosts are not in active service.

Nov 18 2025, 8:32 PM · Infrastructure-Foundations, SRE, DC-Ops, ops-eqiad

Nov 10 2025

CDanis closed T406243: Requesting access to deployment for VolkerE as Resolved.
Nov 10 2025, 7:55 PM · SRE, SRE-Access-Requests
CDanis closed T409600: New SSH key for Brett Cornwall as Resolved.

merged and fast-deployed to A:bastion OR A:cumin

Nov 10 2025, 5:21 PM · SRE, SRE-Access-Requests
CDanis added a comment to T409707: Requesting access to Analytics_Privatedata for Chandra-WMDE.

Hi @Chandra-WMDE , seems like you posted the private key in the task instead of the public. Please stop using that key for anything, and generate a new one, and then we can get you sorted with access.

Nov 10 2025, 5:02 PM · Data-Engineering, SRE, SRE-Access-Requests

Nov 7 2025

CDanis updated CDanis.
Nov 7 2025, 8:44 PM
CDanis updated CDanis.
Nov 7 2025, 6:07 PM
CDanis closed T408064: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy), a subtask of T408532: Deploy a TCP proxy across all DCs, as Resolved.
Nov 7 2025, 6:00 PM · Release-Engineering-Team (Radar), Traffic, collaboration-services, SRE
CDanis closed T408064: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) as Resolved.
Nov 7 2025, 6:00 PM · Patch-For-Review, collaboration-services, vm-requests, Infrastructure-Foundations, SRE
CDanis added a comment to T408064: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy).

On trixie, the attempt to read the v6 address from the qemu variables in late_command.sh isn't working, and so the host gets configured for SLAAC instead (shown in P85100 from /var/log/installer/syslog). Which doesn't work at all.

Nov 7 2025, 5:59 PM · Patch-For-Review, collaboration-services, vm-requests, Infrastructure-Foundations, SRE
CDanis updated the title for P85100 late_command.sh, trixie, routed ganeti from late_install.sh, trixie, routed ganeti to late_command.sh, trixie, routed ganeti.
Nov 7 2025, 4:10 PM
CDanis updated the title for P85100 late_command.sh, trixie, routed ganeti from Masterwork From Distant Lands to late_install.sh, trixie, routed ganeti.
Nov 7 2025, 4:05 PM

Nov 5 2025

CDanis added a comment to T409183: Enable a single-sign-on reverse proxy for GrowthBook.

What I have in mind is to deploy oauth2-proxy behind envoyproxy.

Nov 5 2025, 8:24 PM · OKR-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28)
CDanis added a comment to T405808: Upgrade Envoy to v1.32.12.

Tracing updates section LGTM, no config changes needed. Thanks!

Nov 5 2025, 7:08 PM · SRE, serviceops-deprecated, envoy

Oct 31 2025

CDanis removed a project from T407706: Global block exception for AddDesc app: bot-traffic-requests.
Oct 31 2025, 4:15 PM · Traffic