User Details
- User Since: Oct 7 2014, 6:35 PM (592 w, 5 d)
- Availability: Available
- IRC Nick: dr0ptp4kt
- LDAP User: Unknown
- MediaWiki User: ABaso (WMF)
Fri, Feb 6
Jan 13 2026
Jan 7 2026
Cross-linking an item discussed in chat from before the break: T413275: Data instruments failing to hoist Sticky Headers experiment setup
Going by DC, is this isolated to desktop in a given DC? That may be obvious and expected based on which DC is most active (or inactive). I suspect the activity around December 30 is related to some natural (or automated) spike, but I'm looking more at the general trends here.
Jan 6 2026
Dec 20 2025
Code review posted at https://github.com/wikimedia/apps-android-wikipedia/pull/6179#pullrequestreview-3601109884 . Thanks @Dbrant for the port, tests, and more, including the extensive renaming.
Dec 18 2025
Dec 17 2025
Just a heads up: I'm going to attempt silencing the warning until after the holiday break, mainly so that people get no more messages than strictly necessary. I'll be OoO after Friday and back after the holiday break. We're seesawing around the 99.9% SLO, so we're in okay shape (although higher would be better, of course!). @RLazarus I'll loop you in on a related IM where I'm trying to dig a little more into whether there are perhaps additional diagnostic pieces to consider.
Thank you, @tappof .
Seems to work with a read-only key:
Dec 16 2025
It's not supposed to generate experiment events (Experiment.js#228, OverriddenExperiment.php#22). A little spot checking with Chrome and Safari, using a mobile UA and ?mpo=sticky-headers:treatment&useformat=mobile as the query string on frwiki, shows the sticky-headers edge-unique does not send events. As expected, the console shows "The enrolment for this experiment has been overridden. The following event will not be sent" and no network events are recorded for the experiment route (regular intake-analytics events still go through, of course).
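The override behavior being spot-checked above can be sketched in a few lines. This is a hypothetical, simplified model of the suppression logic, not the actual Experiment.js/OverriddenExperiment.php code:

```python
from urllib.parse import urlparse, parse_qs

def parse_mpo_overrides(url: str) -> dict:
    """Parse a ?mpo=experiment:variant override parameter into a dict."""
    query = parse_qs(urlparse(url).query)
    overrides = {}
    for value in query.get("mpo", []):
        experiment, _, variant = value.partition(":")
        overrides[experiment] = variant
    return overrides

def should_send_experiment_event(experiment: str, url: str) -> bool:
    """If enrollment for this experiment is overridden via the query
    string, suppress the experiment event; other instrumentation
    (e.g. regular intake-analytics events) is unaffected."""
    return experiment not in parse_mpo_overrides(url)
```

With the frwiki URL above, `should_send_experiment_event("sticky-headers", url)` comes back False while an unrelated experiment would still send.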
We deleted stuff. Marking as Resolved.
Dec 12 2025
Oh, I see it updates the task description. Neat. I'd seen phaultfinder before but didn't understand its behavior beyond filing a ticket.
@RLazarus possible to add something to the subject line to make these sufficiently unique? I was thinking the slo field could go into the title. And does Phaultfinder just add new entries as a new comment or something? I'm wondering if this would be something of a perma-ticket that Experiment Platform would want to keep open, dragging it from sprint to sprint on the board, and picking it up whenever work needs to be done to analyze what's going on and possibly take further action.
Moving to Done, in favor of T412467: ErrorBudgetBurn (sticky-headers, part 2)
The SLO on the Test Kitchen EventGate side of things is currently projected to be at risk, but errors do, as @cjming noted in chat, seem to have tapered off considerably (we saw initial error budget alerting via T412448: ErrorBudgetBurn (sticky-headers), but as discussed there a backport was pushed to fix the main problem).
Dec 11 2025
Appears to be related to the sticky-headers experiment. Reader Growth has temporarily turned this Off, and a backport will arrive via T412146: Launch Mobile Expanded Sections on non-English wikis with https://gerrit.wikimedia.org/r/1217576 at T412146#11452738.
Dec 10 2025
Thanks @Sfaci. wmftkbot is now pointed at test-kitchen.wikimedia.org via its local config.yaml and a restart of the tk tool's continuous job on Toolforge, and it looks to be in working order.
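For reference, the change amounts to pointing the bot's base URL at the new host in its local config and restarting the continuous job. A hypothetical fragment (the key name is illustrative; wmftkbot's actual config.yaml schema may differ):

```yaml
# config.yaml -- key name illustrative, not necessarily the real schema
test_kitchen_url: https://test-kitchen.wikimedia.org
```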
Dec 8 2025
Dec 6 2025
Dec 5 2025
Posting here as the most convenient place. @brouberol LMK in case I ought to post to a separate ticket.
Dec 4 2025
Probably both RelEng (along with @bd808, who was in the earlier meeting that @thcipriani helped facilitate) and ExP, I'm thinking. ExP should probably be the one to introduce the TK configuration and a patch (or patches) that results in creating the sort of noise people are likely to see when they need to go tracking things down. I'll reach out to Tyler about setting up more time to discuss approach.
Dec 3 2025
mpic.discovery.wmnet is not reachable via ping, nor via a request such as https://mpic.discovery.wmnet:30443/api/v1/experiments?authority=varnish&format=config, from the tool bastion under tool user tk (the ID used for wmftkbot). The same unreachability shows up when trying to run it via the k8s infrastructure that the jobs run from.
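When debugging this kind of unreachability, it helps to distinguish "host answered but port closed" (fast refusal) from "packets being dropped" (timeout, usually firewall or routing). A generic sketch, not specific to the wmnet setup:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP handshake to host:port completes within the timeout.

    A quick ConnectionRefusedError means the host is routable but the
    port is closed; hitting the timeout typically means a firewall or
    routing policy is dropping packets, which matches the symptom above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running this from both the bastion and from inside the k8s jobs would confirm whether the two environments fail the same way.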
Dec 2 2025
I think this was a warning that this SLO was at risk based on a shorter-window burn rate (6-hour and 4-day windows are in the corresponding queries generated through the system; this was a quiet time for pertinent events, but in the days prior there were events with errors). I'm not clear, though, that the error budget here is at risk for the full 90-day window. The current 4-week window at slo.wikimedia.org looks okay (it has available error budget and is green), which gives a rough idea of how we might look if projecting to 90 days.
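The burn-rate framing here can be made concrete. The sketch below is generic multiwindow SLO math (assuming the 99.9% target mentioned elsewhere in this thread, not the actual queries the system generates); it shows why a short window and a long window are paired: both must be burning fast for the alert to fire, which filters out brief spikes that self-recover:

```python
def burn_rate(error_ratio: float, slo: float = 0.999) -> float:
    """Observed error ratio divided by the allowed error budget (1 - SLO).
    1.0 means budget is consumed exactly on pace over the SLO period;
    2.0 means it would be exhausted in half the period."""
    return error_ratio / (1.0 - slo)

def multiwindow_alert(short_ratio: float, long_ratio: float,
                      slo: float = 0.999, threshold: float = 2.0) -> bool:
    """Fire only when BOTH windows (e.g. 6h and 4d) exceed the threshold,
    so a quiet long window suppresses a short-lived spike and a quiet
    short window suppresses stale history."""
    return (burn_rate(short_ratio, slo) > threshold
            and burn_rate(long_ratio, slo) > threshold)
```

For example, a 0.5% error ratio in the short window with only 0.05% over the long window would not fire, which matches the "quiet time now, errors in the days prior" situation described above.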
Dec 1 2025
Nov 26 2025
The EventGate validation errors dashboard is showing a decent chunk of these sorts of errors:
Nov 25 2025
Nov 24 2025
@Seppl2013 (other Adam here, big fan of @Addshore's works :) ) - you'll probably want to use wget instead of curl, as it tends to be more reliable. I wrote up some notes at https://techblog.wikimedia.org/2025/04/08/wikidata-query-service-graph-database-reload-at-home-2025-edition/ for some pieces of this, after learning of various approaches folks have taken in the past (including the very cool approach of @Addshore) and having spent some time staring at this sort of challenge.
Nov 22 2025
Nov 18 2025
Nov 12 2025
Following up on Meet with @elukey today, here are the suggested alerting targets for the SLOs: