Thu, Jul 18
Wed, Jul 17
Tue, Jul 16
First used to validate the change made in https://gerrit.wikimedia.org/r/c/operations/puppet/+/522992 but useful generically.
Mon, Jul 15
The backlog in Kafka should clear in just a few more minutes. Closing this; separate issues to be opened later for follow-up work.
I've started an incident document at https://wikitech.wikimedia.org/wiki/Incident_documentation/20190715-logstash and would appreciate more contributors.
I think we likely want to revisit this.
Sun, Jul 14
Fri, Jul 12
Thu, Jul 11
Just a note that it happened again today ;)
Wed, Jul 10
Wed, Jul 3
Wow, that was quick, thanks! Riccardo should have time to do the deploy while I'm on vacation 🙃
Tue, Jul 2
Mon, Jul 1
I think I just found another one that was missed: https://grafana.wikimedia.org/d/000000545/ganeti
Fri, Jun 28
NB that the default limit in Varnish actually allows for slightly fewer than 64 headers -- the first line of an HTTP response is parsed into several pseudo-headers internally, so AIUI there's space for ~59 headers.
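For anyone who wants to eyeball this from the client side, here's a rough sketch. The numbers are my assumptions (the default http_max_hdr=64, and ~5 slots consumed internally for the status-line pseudo-headers, leaving ~59), not gospel:

```
#!/usr/bin/env python3
"""Rough client-side check of how close a response gets to Varnish's
header budget. A sketch, not tooling: assumes http_max_hdr=64 (the
varnishd default) and ~5 internally consumed pseudo-header slots."""
import requests

VARNISH_HTTP_MAX_HDR = 64   # varnishd default for the http_max_hdr parameter
PSEUDO_HEADER_SLOTS = 5     # assumed internal overhead, leaving ~59 usable slots

def header_budget(url: str) -> None:
    resp = requests.get(url, timeout=10)
    # NB: requests folds duplicate headers (e.g. repeated Set-Cookie) into
    # one entry, so this can undercount what Varnish actually sees.
    count = len(resp.headers)
    usable = VARNISH_HTTP_MAX_HDR - PSEUDO_HEADER_SLOTS
    print(f"{url}: {count} response headers, ~{usable - count} below the ~{usable} limit")

if __name__ == "__main__":
    header_budget("https://en.wikipedia.org/wiki/Main_Page")
```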
Thu, Jun 27
I'm told the plan is to move these onto Ganeti in PoPs, so that seems just as good.
So, it looks like this 500 did in fact come from the application layer... but shouldn't we still be getting more response headers from the edge?
Wed, Jun 26
I am unable to create a Discourse account because of this loop.
+1 for the relative simplicity of Thanos (from both a design and a deployment perspective).
Tue, Jun 25
Mon, Jun 24
Am I alone in feeling like this probably deserves an incident report?
it's just one (not-often-used) link down, not a site down; UBN is unnecessary IMO
Telia reports a 'major outage' and is tracking the status of our circuit in case 00993514
Jun 20 2019
My guess is that the beginning of this problem correlates with the beginning of the fetch failures in the first graph panel here:
Jun 19 2019
@Reedy manually ran the global renames that were never queued properly.
Jun 5 2019
Some curious stuff in the monitoring data:
Jun 4 2019
Indeed, thanks @ema ! I talked a bit with @fgiunchedi about this earlier, and we tweaked the wording on the Logstash dashboard to remind users that "Varnish" ranking highly in the "top n Backends" panel is not necessarily reflective of a Varnish issue.
Jun 3 2019
There is a "Nagios Compatible" transport, but it is underdocumented and seems to also only write to a local filesystem path (which is presumed to be a Nagios external command FIFO).
Jun 1 2019
@Marostegui just found something we forgot: the use of Prometheus metrics in Grafana's variable definitions (e.g. by a label_values() query)
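For anyone unfamiliar: a label_values(metric, label) variable boils down to a series lookup against Prometheus, so these dashboards do depend on the metrics existing. A rough Python equivalent of what Grafana does (the Prometheus URL and metric/label names are illustrative):

```
#!/usr/bin/env python3
"""Roughly what label_values(metric, label) resolves to: a series lookup
against the Prometheus HTTP API, from which the label values are pulled."""
import requests

PROM_URL = "http://localhost:9090"  # assumed local Prometheus

def label_values(metric: str, label: str) -> list:
    # /api/v1/series returns the label sets of all series matching the selector
    resp = requests.get(f"{PROM_URL}/api/v1/series",
                        params={"match[]": metric}, timeout=10)
    resp.raise_for_status()
    series = resp.json()["data"]
    return sorted({s[label] for s in series if label in s})

if __name__ == "__main__":
    # e.g. which instances expose node_uname_info
    print(label_values("node_uname_info", "instance"))
```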
May 31 2019
SGTM @elukey, thanks!
May 30 2019
May 29 2019
Andrew, can you (or someone else) advise on rolling out this change for Analytics?
May 23 2019
May 21 2019
We saw one of these events at 14:48 today and pybal reported fetch failures for -- and wanted to depool -- basically the entire appserver fleet: https://phabricator.wikimedia.org/P8551
May 20 2019
+1. In general I think it would be a great idea to do a lot more with annotations than we presently do:
May 19 2019
Thanks! We now believe this is resolved.
May 14 2019
May 13 2019
Here's my tentative plan for moving forward with this, including a rollout procedure:
May 10 2019
May 8 2019
Some quick notes from today's meeting:
May 7 2019
Trying out a few things here:
May 6 2019
cc @mark who I know is about to start looking at hardware requests for the coming FY
We now have IRC alerting based on scraping each Prometheus for its process_start_time_seconds metric.
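Roughly, the check boils down to something like this sketch (the hostnames and the 15-minute window are illustrative, and the IRC side isn't shown):

```
#!/usr/bin/env python3
"""Sketch of the kind of check described above: scrape a Prometheus
server's own /metrics endpoint and flag a recent restart based on
process_start_time_seconds."""
import time
import requests

RESTART_WINDOW = 15 * 60  # flag processes started within the last 15 minutes

def recently_restarted(base_url: str) -> bool:
    text = requests.get(f"{base_url}/metrics", timeout=10).text
    for line in text.splitlines():
        # skips the '# HELP' / '# TYPE' comment lines automatically
        if line.startswith("process_start_time_seconds"):
            start = float(line.split()[-1])
            return (time.time() - start) < RESTART_WINDOW
    return False  # metric missing; a real check would alert on this too

if __name__ == "__main__":
    for host in ["http://prometheus1001:9090", "http://prometheus2001:9090"]:  # hypothetical
        if recently_restarted(host):
            print(f"ALERT: {host} restarted recently")
```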
My patches are also stuck in the queue, and I'm seeing teammates manually V+2 their Puppet changes.
May 3 2019
It does seem much faster now, thanks @elukey ! The impact of loading 30 days on Prometheus is also minimal now: modest CPU usage, and while there was some increase in RAM consumption over baseline while we were both playing with this, it's not concerning. Thank you :)
May 2 2019
Also sorry, I don't have a lot of time left over this week; can take a deeper look next week
I think you should just be able to remove the "custom all value" in the dashboard settings and have it work. In that case Grafana will create its own "all" value that is simply a regex OR'ing together all the known values, which it appears to compute based on the cluster=kafka_jumbo hidden variable.
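To illustrate what I mean, here's a toy reconstruction of that regex "all" value (the broker names are hypothetical):

```
#!/usr/bin/env python3
"""Illustration of the behavior described above: with no custom "all"
value, Grafana effectively substitutes a regex OR of every known option."""
import re

known_values = ["kafka-jumbo1001", "kafka-jumbo1002", "kafka-jumbo1003"]

# Grafana escapes each value and joins them with '|', roughly:
all_value = "(" + "|".join(re.escape(v) for v in known_values) + ")"
print(all_value)  # (kafka\-jumbo1001|kafka\-jumbo1002|kafka\-jumbo1003)
```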
I got tied up with goal work and incident response and have only had a little time to spend on this.
I think the 'real' thing we need to notify on here is when Swift decides it wants to stop using a disk (which it did here)
May 1 2019
Apr 30 2019
As documented in T222112#5147131, this didn't actually fix the dashboard at fault in this particular incident, but I've heard from another large-scale Prometheus user (and Prometheus dev) that they've had similar problems and recommend 10M as a value.
I'm pretty sure these panels are responsible for most of the Prometheus load.
They take much longer to load than the rest of the panels, and some of them errored out with the new settings.
I think https://grafana.wikimedia.org/d/000000607/cluster-overview might have been missed here? I see at least some old metrics being used there, e.g. node_memory_Cached in the "Memory per host" section.
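If it helps, here's a quick sketch for scanning a dashboard's JSON for pre-0.16 metric names (node_exporter 0.16 added unit suffixes, so node_memory_Cached became node_memory_Cached_bytes); the rename list beyond that one metric is illustrative:

```
#!/usr/bin/env python3
"""Scan a Grafana dashboard's JSON for stale (pre-0.16) node_exporter
metric names. The rename map entries beyond node_memory_Cached are
illustrative examples of the same unit-suffix change."""
import json
import re
import requests

GRAFANA = "https://grafana.wikimedia.org"
OLD_METRICS = ["node_memory_Cached", "node_memory_Buffers", "node_memory_MemFree"]

def find_stale_metrics(dashboard_uid: str) -> None:
    # /api/dashboards/uid/<uid> returns the full dashboard JSON;
    # may need an API token on instances without anonymous read access
    dash = requests.get(f"{GRAFANA}/api/dashboards/uid/{dashboard_uid}", timeout=10).json()
    blob = json.dumps(dash)
    for metric in OLD_METRICS:
        # \b fails before '_', so node_memory_Cached_bytes won't match
        hits = re.findall(rf"\b{metric}\b", blob)
        if hits:
            print(f"{metric}: {len(hits)} occurrence(s)")

if __name__ == "__main__":
    find_stale_metrics("000000607")  # the cluster-overview dashboard mentioned above
```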