RLazarus (Reuven Lazarus) (rzl)
User

User Details

User Since
Oct 15 2019, 4:02 PM (320 w, 5 d)
Availability
Available
IRC Nick
rzl
LDAP User
RLazarus
MediaWiki User
RLazarus (WMF)

Recent Activity

Wed, Dec 3

RLazarus renamed T410975: Upgrade Envoy to v1.35.7 from "Upgrade Envoy to v1.35.6" to "Upgrade Envoy to v1.35.7".
Wed, Dec 3, 11:44 PM · SRE, serviceops, envoy
RLazarus added a comment to T410975: Upgrade Envoy to v1.35.7.

Envoy 1.35.7 is about to come out, with security fixes: https://groups.google.com/g/envoy-announce/c/zr2OzwmJFqY

Wed, Dec 3, 11:43 PM · SRE, serviceops, envoy

Mon, Dec 1

RLazarus added a comment to T411411: Allow dash-suffixes for chart versions.

The conversation in #wikimedia-serviceops when this was raised:

Mon, Dec 1, 8:38 PM · serviceops

Wed, Nov 26

RLazarus added a project to T410933: Add Druid as a Private Grafana Datasource: Observability-Metrics.

(Clinic duty here! Apparently a milestone tag, like SRE Observability (FY2025/2026-Q3), is mutually exclusive with the project tag, like SRE Observability, and that means the task shows up on the clinic duty dashboard as "needs triage." I'm adding Observability-Metrics at a guess, because that also takes it off the triage list, but if you'll be using those milestone tags going forward, we may want to adjust the clinic duty dashboard query.)

Wed, Nov 26, 6:09 PM · Observability-Metrics, SRE Observability (FY2025/2026-Q3), SRE
RLazarus closed T410972: Requesting access to cassandra-staging-devs group for amastilovic as Resolved.

@Ahoelzl @KOfori Thanks both!

Wed, Nov 26, 5:54 PM · SRE, SRE-Access-Requests
RLazarus updated the task description for T410972: Requesting access to cassandra-staging-devs group for amastilovic.
Wed, Nov 26, 5:30 PM · SRE, SRE-Access-Requests

Tue, Nov 25

RLazarus created T411058: machinetranslation eqiad pods in state ContainerStatusUnknown.
Tue, Nov 25, 11:11 PM · LPL Projects (Other), Unplanned-Sprint-Work, LPL Essential (FY2025-26 Q2), MinT, Prod-Kubernetes, serviceops, SRE
RLazarus moved T410972: Requesting access to cassandra-staging-devs group for amastilovic from Untriaged to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.
Tue, Nov 25, 6:11 PM · SRE, SRE-Access-Requests
RLazarus closed T409707: Requesting access to Analytics_Privatedata for Chandra-WMDE as Resolved.

Added to nda:

rzl@ldap-maint1001:~$ ldapsearch -x cn=nda | grep chandra-wmde
member: uid=chandra-wmde,ou=people,dc=wikimedia,dc=org
Tue, Nov 25, 6:10 PM · Data-Engineering, SRE, SRE-Access-Requests
RLazarus updated the task description for T409707: Requesting access to Analytics_Privatedata for Chandra-WMDE.
Tue, Nov 25, 6:02 PM · Data-Engineering, SRE, SRE-Access-Requests
RLazarus added a comment to T410426: Requesting access to analytics-privatedata-users for dsmit.

Oh, and: On top of L3 which you've already read, please ensure you're also familiar with https://wikitech.wikimedia.org/wiki/Data_Platform/Data_access#User_responsibilities and reach out if you have any questions. Thanks!

Tue, Nov 25, 5:58 PM · SRE, SRE-Access-Requests
RLazarus closed T410426: Requesting access to analytics-privatedata-users for dsmit as Resolved.

This is complete -- please allow up to 30 minutes for it to take effect, then you should be all set! If you still have any trouble, feel free to reopen the task or file a new one.

Tue, Nov 25, 5:47 PM · SRE, SRE-Access-Requests
RLazarus added a comment to T410972: Requesting access to cassandra-staging-devs group for amastilovic.

Hi, this week's clinic duty SRE here.

Tue, Nov 25, 5:42 PM · SRE, SRE-Access-Requests
RLazarus updated the task description for T410972: Requesting access to cassandra-staging-devs group for amastilovic.
Tue, Nov 25, 5:41 PM · SRE, SRE-Access-Requests
RLazarus created T410975: Upgrade Envoy to v1.35.7.
Tue, Nov 25, 1:22 AM · SRE, serviceops, envoy
RLazarus added a project to T410944: Reboot cookbook workflow leaves Puppet disabled: SRE-tools.
Tue, Nov 25, 12:42 AM · Traffic, SRE-tools, Infrastructure-Foundations, SRE
RLazarus added a project to T410601: Improve "reuse" feature for standard partman recipes: Infrastructure-Foundations.
Tue, Nov 25, 12:42 AM · User-MoritzMuehlenhoff, Infrastructure-Foundations, SRE
RLazarus added a comment to T409707: Requesting access to Analytics_Privatedata for Chandra-WMDE.

@Milimetric @Ahoelzl Ping - can you approve for Data Engineering please? The requester is not a WMF or WMDE employee so this needs an explicit signoff.

Tue, Nov 25, 12:37 AM · Data-Engineering, SRE, SRE-Access-Requests

Mon, Nov 24

RLazarus closed T409409: Requesting access to analytics_privatedata_users and SQL Lab for Arian Bozorg (WMDE) as Resolved.

Optimistically resolving. :) @Arian_Bozorg please let us know if you have any trouble with your access, either by reopening this task or filing a new one.

Mon, Nov 24, 8:02 PM · SRE, SRE-Access-Requests
RLazarus updated the task description for T410426: Requesting access to analytics-privatedata-users for dsmit.
Mon, Nov 24, 7:44 PM · SRE, SRE-Access-Requests
RLazarus changed the status of T410426: Requesting access to analytics-privatedata-users for dsmit from Open to In Progress.

Followed up with @DSmit-WMF and confirmed level 1 is what we're doing. Implementation to follow.

Mon, Nov 24, 7:44 PM · SRE, SRE-Access-Requests

Fri, Nov 21

RLazarus closed T410767: repos/sre/sophroid is missing a software license as Resolved.
Fri, Nov 21, 11:52 PM · Software-Licensing, serviceops
RLazarus added a comment to T398869: Create Pyrra SLOs for xLab.

Alerts are enabled! Let's continue to monitor here a tiny bit longer, just in case they behave unexpectedly and the initial config needs tweaking -- but after a few days of finding that it's broadly working, we can declare victory, resolve this, and track any followup work separately.

Fri, Nov 21, 8:54 PM · SRE-SLO, Test Kitchen (Experiment Platform Sprint 14), OKR-Work
RLazarus updated subscribers of T410767: repos/sre/sophroid is missing a software license.
Fri, Nov 21, 7:40 PM · Software-Licensing, serviceops
RLazarus claimed T410767: repos/sre/sophroid is missing a software license.

Thanks @taavi.

Fri, Nov 21, 7:39 PM · Software-Licensing, serviceops

Wed, Nov 19

RLazarus added a comment to T410537: Add a --rack flag to sre.k8s.pool-depool-node.

(I'm not married to the specific CLI syntax in the example. Among other things, making it an --optional-flag means that the positional hosts argument would have to become optional too, which might be tricky. The argument might also have to restate the cluster name, something like eqiad-C5, if it can't be scraped out of --k8s-cluster. All that stuff is up to the implementer, IMHO -- as long as it's easier, I'm happy.)

Wed, Nov 19, 6:08 PM · Infrastructure-Foundations, SRE-tools, serviceops
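
The comment on T410537 above notes that adding an optional --rack flag would force the positional hosts argument to become optional too. A minimal argparse sketch of that trade-off follows. The parser name, the help text, and the "exactly one of the two" rule are illustrative assumptions, not the cookbook's actual interface.

```python
import argparse


def parse_args(argv):
    """Parse either explicit hosts or a --rack selector (hypothetical CLI)."""
    parser = argparse.ArgumentParser(prog="pool-depool-node-sketch")
    # Hypothetical flag: a rack identifier such as "eqiad-C5".
    parser.add_argument("--rack", help="select every node in this rack")
    # nargs="*" makes the positional optional, so --rack can be used alone.
    parser.add_argument("hosts", nargs="*", help="explicit host names")
    args = parser.parse_args(argv)
    # argparse has no built-in way to say "exactly one of a positional or a
    # flag", so the check is done by hand after parsing.
    if bool(args.rack) == bool(args.hosts):
        parser.error("specify exactly one of --rack or a list of hosts")
    return args
```

Calling `parse_args(["--rack", "eqiad-C5"])` or `parse_args(["host1", "host2"])` succeeds, while passing neither exits with a usage error. This is the extra validation the comment hints at when it calls the change "tricky".
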
RLazarus created T410537: Add a --rack flag to sre.k8s.pool-depool-node.
Wed, Nov 19, 6:04 PM · Infrastructure-Foundations, SRE-tools, serviceops

Nov 7 2025

RLazarus created T409510: Envoy config updates from v1.32.
Nov 7 2025, 12:19 AM · SRE, serviceops, envoy

Nov 6 2025

RLazarus created T409374: db1262 is down.
Nov 6 2025, 12:25 AM · SRE, DC-Ops, ops-eqiad, Sustainability (Incident Followup), DBA

Oct 30 2025

RLazarus closed T403663: Upgrade Envoy to v1.29.12, a subtask of T380211: Upgrade Envoy to >= 1.24, as Resolved.
Oct 30 2025, 8:46 PM · SRE, serviceops, envoy
RLazarus closed T403663: Upgrade Envoy to v1.29.12 as Resolved.
Oct 30 2025, 8:46 PM · Patch-For-Review, SRE, serviceops, envoy

Oct 28 2025

RLazarus closed T404036: Envoy config updates from v1.29 as Resolved.
Oct 28 2025, 8:48 PM · SRE, serviceops, envoy
RLazarus closed T404036: Envoy config updates from v1.29, a subtask of T403663: Upgrade Envoy to v1.29.12, as Resolved.
Oct 28 2025, 8:48 PM · Patch-For-Review, SRE, serviceops, envoy

Oct 27 2025

RLazarus added a comment to T405808: Upgrade Envoy to v1.32.12.

Testing this in mw-debug, there are two envoy warnings in the logs on startup:

Oct 27 2025, 10:15 PM · SRE, serviceops, envoy

Oct 21 2025

RLazarus added a comment to T407826: X-Request-Id response header off by 5000.

This is deployed to all services.

Oct 21 2025, 11:00 PM · serviceops, Traffic

Oct 18 2025

RLazarus added a comment to T406836: The Edit Check's SLO has burned all its error budget.

From conversation with @DLynch we think https://gerrit.wikimedia.org/r/1196940 addresses a possible underlying cause in EditCheck: if the model fetch takes so long that users abandon the page while it's underway, that will count against the SLO, since we already incremented Available but never increment Shown or NotShown. The fix adds a 6-second timeout (down from 5 minutes, enforced elsewhere).

Oct 18 2025, 2:10 AM · OKR-Work, Goal, Editing-team (Planning), EditCheck
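
The accounting described in the T406836 comment above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the counter names mirror Available, Shown, and NotShown from the comment, but the session model, the rule that a timed-out fetch counts as NotShown, and the error formula are guesses for illustration, not the EditCheck implementation.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    available: int = 0
    shown: int = 0
    not_shown: int = 0


def error_ratio(c):
    """Fraction of Available events never resolved as Shown or NotShown."""
    if c.available == 0:
        return 0.0
    return (c.available - c.shown - c.not_shown) / c.available


def simulate(sessions, timeout_s):
    """Each session is (fetch_seconds, abandon_seconds); both are hypothetical."""
    c = Counters()
    for fetch_s, abandon_s in sessions:
        c.available += 1
        # The fetch resolves when it finishes or when the timeout fires.
        resolved_at = min(fetch_s, timeout_s)
        if abandon_s < resolved_at:
            continue  # user left first: neither counter is incremented
        if fetch_s <= timeout_s:
            c.shown += 1
        else:
            c.not_shown += 1  # assumption: a timeout is recorded as NotShown
    return c


sessions = [(1, 60), (20, 10), (200, 30)]
long_timeout = error_ratio(simulate(sessions, 300))
short_timeout = error_ratio(simulate(sessions, 6))
```

With the long timeout, the two slow fetches are abandoned before they resolve and count against the SLO. With the short timeout, they resolve before the user leaves, so nothing is left unaccounted for.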

Oct 15 2025

RLazarus added a comment to T406836: The Edit Check's SLO has burned all its error budget.

This smells like a metrics issue to me -- note that on the rolling window dashboard, the "Errors" graph is regularly fluctuating between 0 and 50%, and occasionally past 100% (!) whereas in the calendar window dashboard, the "Error ratio" graph is steady between 0 and 0.03%. Those ought to be the same data, modulo the time window.

Oct 15 2025, 1:55 AM · OKR-Work, Goal, Editing-team (Planning), EditCheck

Oct 14 2025

RLazarus closed T406786: `sextant update` doesn't like growthbook as Resolved.

Thanks @brouberol! Confirming sextant update runs without issue now.

Oct 14 2025, 3:57 PM · Data-Platform-SRE (2025.09.26 - 2025.10.17), serviceops

Oct 8 2025

RLazarus updated the task description for T406786: `sextant update` doesn't like growthbook.
Oct 8 2025, 9:41 PM · Data-Platform-SRE (2025.09.26 - 2025.10.17), serviceops
RLazarus added a project to T406786: `sextant update` doesn't like growthbook: Data-Platform-SRE.
Oct 8 2025, 9:41 PM · Data-Platform-SRE (2025.09.26 - 2025.10.17), serviceops
RLazarus updated subscribers of T406786: `sextant update` doesn't like growthbook.

Oops, I see @brouberol is OOO for a bit. @RKemper can I talk you into taking a look?

Oct 8 2025, 9:40 PM · Data-Platform-SRE (2025.09.26 - 2025.10.17), serviceops
RLazarus updated subscribers of T406786: `sextant update` doesn't like growthbook.

This dates from https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1056137, where growthbook used to be one chart but was split into a frontend and a backend. That commit copied charts/growthbook/templates/vendor/... into charts/growthbook/charts/growthbook-{backend,frontend}/templates/vendor/... and removed the original. But it left the original package.json in place.

Oct 8 2025, 9:27 PM · Data-Platform-SRE (2025.09.26 - 2025.10.17), serviceops
RLazarus added a project to T406786: `sextant update` doesn't like growthbook: serviceops.
Oct 8 2025, 9:14 PM · Data-Platform-SRE (2025.09.26 - 2025.10.17), serviceops
RLazarus created T406786: `sextant update` doesn't like growthbook.
Oct 8 2025, 9:14 PM · Data-Platform-SRE (2025.09.26 - 2025.10.17), serviceops
RLazarus edited P83680 (An Untitled Masterwork).
Oct 8 2025, 8:58 PM
RLazarus created P83681 (An Untitled Masterwork).
Oct 8 2025, 8:53 PM
RLazarus created P83680 (An Untitled Masterwork).
Oct 8 2025, 8:51 PM
RLazarus claimed T406212: charlie wiped cluster redeployment use-case.
Oct 8 2025, 3:41 PM · Kubernetes, Prod-Kubernetes, serviceops

Oct 2 2025

RLazarus added a comment to T406212: charlie wiped cluster redeployment use-case.

Thanks for this! I hadn't originally thought about using charlie this way. For my use case (applying the same diff to every service, like an Envoy upgrade) the "just deploy everything without asking me" feature is tempting but also an obviously terrible idea, hence why it's deliberately not supported. But for your use case it's perfect.

Oct 2 2025, 6:30 PM · Kubernetes, Prod-Kubernetes, serviceops
RLazarus added a comment to T405703: Update wikikube eqiad to kubernetes 1.31.

Followup:

  • charlie feature request: Just do it and don't show me any diff or ask for confirmation

Noted! I was thinking about adding this, and hesitated just because you could really powerfully make mistakes with it (plus it's a brand-new tool I didn't trust yet) but good to know it would have helped here. Let me know if there's anything else while I'm at it, I think you're the only one who's tried the tool besides me.

Oct 2 2025, 4:12 PM · Discovery-Search (2025.09.26 - 2025.10.17), Data-Platform-SRE (2025.09.26 - 2025.10.17), Patch-For-Review, collaboration-services, Kubernetes, Prod-Kubernetes, serviceops

Oct 1 2025

RLazarus added a comment to T405703: Update wikikube eqiad to kubernetes 1.31.

Followup:

  • charlie feature request: Just do it and don't show me any diff or ask for confirmation

Oct 1 2025, 6:03 PM · Discovery-Search (2025.09.26 - 2025.10.17), Data-Platform-SRE (2025.09.26 - 2025.10.17), Patch-For-Review, collaboration-services, Kubernetes, Prod-Kubernetes, serviceops

Sep 26 2025

RLazarus created T405808: Upgrade Envoy to v1.32.12.
Sep 26 2025, 10:22 PM · SRE, serviceops, envoy

Sep 25 2025

RLazarus closed T403101: Envoy config updates from v1.26, a subtask of T402584: Upgrade Envoy to v1.26.8 and drop buster, as Resolved.
Sep 25 2025, 9:50 PM · Patch-For-Review, SRE, serviceops, envoy
RLazarus closed T403101: Envoy config updates from v1.26 as Resolved.
Sep 25 2025, 9:50 PM · SRE, serviceops, envoy
RLazarus closed T402584: Upgrade Envoy to v1.26.8 and drop buster as Resolved.

1.23 is gone. 🎉

Sep 25 2025, 9:34 PM · Patch-For-Review, SRE, serviceops, envoy
RLazarus closed T402584: Upgrade Envoy to v1.26.8 and drop buster, a subtask of T380211: Upgrade Envoy to >= 1.24, as Resolved.
Sep 25 2025, 9:34 PM · SRE, serviceops, envoy
RLazarus added a comment to T402584: Upgrade Envoy to v1.26.8 and drop buster.

I deployed most services in wikikube (in part to test https://gerrit.wikimedia.org/r/1188456). Remaining services with an Envoy upgrade to go:

Sep 25 2025, 1:13 AM · Patch-For-Review, SRE, serviceops, envoy

Sep 10 2025

RLazarus updated the task description for T404036: Envoy config updates from v1.29.
Sep 10 2025, 9:24 PM · SRE, serviceops, envoy
RLazarus updated the task description for T404036: Envoy config updates from v1.29.
Sep 10 2025, 5:36 PM · SRE, serviceops, envoy
RLazarus added a comment to T404036: Envoy config updates from v1.29.

And by request from @CDanis, adding to this config update cycle:

Sep 10 2025, 5:34 PM · SRE, serviceops, envoy

Sep 9 2025

RLazarus added a comment to T402584: Upgrade Envoy to v1.26.8 and drop buster.

@elukey Thank you! Looks like an ownership issue, and yes please if you're comfortable deploying those, I'll take you up on it. (We were just talking in serviceops about the general problem of keeping the state of the world up to date with the state of the repo. In the general case it's hard and we'll need to figure it out; in the specific case your help would make a big difference!)

Sep 9 2025, 3:49 PM · Patch-For-Review, SRE, serviceops, envoy
RLazarus updated the task description for T404036: Envoy config updates from v1.29.
Sep 9 2025, 1:39 AM · SRE, serviceops, envoy
RLazarus added a comment to T404036: Envoy config updates from v1.29.

The global_downstream_max_connections was deprecated in the 1.28 release notes, but as of 1.29, the downstream connections resource_monitor was still a work in progress. So we won't actually switch over to it until after 1.30.

Sep 9 2025, 1:39 AM · SRE, serviceops, envoy
RLazarus added a comment to T404036: Envoy config updates from v1.29.

Note the event_log_path comes up in mesh.configuration._tcp_cluster, which pulls in the entire health_checks field from values.yaml, so that's where the event_log_paths are set. We can either replace those event_log_paths with event_loggers right there in the values file (verbose, messy, wrong level of abstraction) or transform them in the template (cumbersome migration but better end state).

Sep 9 2025, 1:18 AM · SRE, serviceops, envoy
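
The "transform them in the template" option from the T404036 comment above could look roughly like the following, written in Python rather than as a Helm template so it can be run on its own. The comment calls the replacement event_loggers. The key `event_logger`, the sink name, and the `@type` URL used below are assumptions based on the Envoy v3 API and should be checked against the documentation for the target version. The function name is hypothetical.

```python
import copy

# Assumed type URL for Envoy's file-based health check event sink.
FILE_SINK_TYPE = (
    "type.googleapis.com/envoy.extensions.health_check."
    "event_sinks.file.v3.HealthCheckEventFileSink"
)


def migrate_health_check(health_check):
    """Return a copy with a deprecated event_log_path moved into a file sink."""
    out = copy.deepcopy(health_check)
    path = out.pop("event_log_path", None)
    if path is None:
        return out  # nothing to migrate
    out.setdefault("event_logger", []).append(
        {
            "name": "envoy.health_check.event_sinks.file",  # assumed name
            "typed_config": {"@type": FILE_SINK_TYPE, "event_log_path": path},
        }
    )
    return out
```

Doing the rewrite at render time keeps values.yaml files unchanged during the migration, which is the "better end state" trade-off the comment describes.
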
RLazarus created T404036: Envoy config updates from v1.29.
Sep 9 2025, 1:10 AM · SRE, serviceops, envoy
RLazarus updated the task description for T403101: Envoy config updates from v1.26.
Sep 9 2025, 12:47 AM · SRE, serviceops, envoy

Sep 6 2025

RLazarus updated the task description for T403663: Upgrade Envoy to v1.29.12.
Sep 6 2025, 12:56 AM · Patch-For-Review, SRE, serviceops, envoy

Sep 4 2025

RLazarus updated the task description for T403101: Envoy config updates from v1.26.
Sep 4 2025, 5:08 PM · SRE, serviceops, envoy

Sep 3 2025

RLazarus added a comment to T403663: Upgrade Envoy to v1.29.12.

Removed the tracing item

Sep 3 2025, 8:31 PM · Patch-For-Review, SRE, serviceops, envoy
RLazarus updated the task description for T403663: Upgrade Envoy to v1.29.12.
Sep 3 2025, 8:30 PM · Patch-For-Review, SRE, serviceops, envoy
RLazarus created T403663: Upgrade Envoy to v1.29.12.
Sep 3 2025, 8:04 PM · Patch-For-Review, SRE, serviceops, envoy

Aug 27 2025

RLazarus placed T301605: Central and South American countries in geo-maps up for grabs.
Aug 27 2025, 7:24 PM · DNS, Traffic
RLazarus closed T301605: Central and South American countries in geo-maps as Resolved.

LGTM, thank you for the work!

Aug 27 2025, 7:20 PM · DNS, Traffic
RLazarus created T403101: Envoy config updates from v1.26.
Aug 27 2025, 5:45 PM · SRE, serviceops, envoy

Aug 26 2025

RLazarus added a comment to T402584: Upgrade Envoy to v1.26.8 and drop buster.

More deprecation warnings from the API Gateway (started locally after modifying charts/api-gateway/values-devel.yaml to use envoy-future):

Aug 26 2025, 11:34 PM · Patch-For-Review, SRE, serviceops, envoy
RLazarus added a comment to T402584: Upgrade Envoy to v1.26.8 and drop buster.

Validated on mathoid and mw-debug (mathoid still on envoy-future, mw-debug back on 1.23 for now).

Aug 26 2025, 10:57 PM · Patch-For-Review, SRE, serviceops, envoy

Aug 25 2025

RLazarus closed T401737: Custom dblists for mwscript-k8s, a subtask of T341553: Allow running one-off scripts manually, as Resolved.
Aug 25 2025, 11:45 PM · MW-on-K8s, serviceops
RLazarus closed T401737: Custom dblists for mwscript-k8s as Resolved.
Aug 25 2025, 11:45 PM · MW-on-K8s, serviceops
RLazarus added a comment to T402584: Upgrade Envoy to v1.26.8 and drop buster.

We also have 237 baremetal hosts with Envoy, how shall we handle these? We could e.g. add a profile parameter $use_future to profile::envoy and then fix up the class to install envoy from component/envoy-future.

Aug 25 2025, 4:19 PM · Patch-For-Review, SRE, serviceops, envoy

Aug 22 2025

RLazarus added a comment to T394057: Create new SLO dashboard via Pyrra for Wikifunctions.

@ecarg Just a heads-up, we've broken the config out into per-team files to make it a little easier to work with, so the stanza I mentioned above has now moved to abstract_wikipedia.pp. Let us know how it's going!

Aug 22 2025, 5:52 PM · Abstract Wikipedia team (26Q1 (Jul–Sep)), SRE-SLO, Essential-Work, SRE Observability
RLazarus added a comment to T402584: Upgrade Envoy to v1.26.8 and drop buster.

For posterity -- I fatfingered the reprepro include the first time and included the _source.changes without the _amd64.changes, so for a couple hours we had a source-only package for 1.26, and I managed to publish envoy-future:1.26.8-1 (which actually contained the Envoy 1.23.10 binary) without noticing.

Aug 22 2025, 2:52 AM · Patch-For-Review, SRE, serviceops, envoy

Aug 21 2025

RLazarus created T402584: Upgrade Envoy to v1.26.8 and drop buster.
Aug 21 2025, 7:30 PM · Patch-For-Review, SRE, serviceops, envoy

Aug 20 2025

RLazarus added a comment to T394057: Create new SLO dashboard via Pyrra for Wikifunctions.

How's this looking?

Aug 20 2025, 8:48 PM · Abstract Wikipedia team (26Q1 (Jul–Sep)), SRE-SLO, Essential-Work, SRE Observability

Aug 15 2025

RLazarus added a comment to T394057: Create new SLO dashboard via Pyrra for Wikifunctions.

First SLO is up! The rolling dashboard is here, and the quarterly dashboard is here (not much to see until we've collected more data). Take an early look at the rolling dashboard, and see if the data reflects what you expect to see so far.

Aug 15 2025, 10:47 PM · Abstract Wikipedia team (26Q1 (Jul–Sep)), SRE-SLO, Essential-Work, SRE Observability
RLazarus added a comment to T401737: Custom dblists for mwscript-k8s.

This is implemented, as option #2 (--local_dblist rather than --local-dblist for consistency with other flags).

Aug 15 2025, 12:10 AM · MW-on-K8s, serviceops

Aug 13 2025

RLazarus added a project to T401803: mwscript-k8s does not include an environment variable with the username of the executing user: serviceops.
Aug 13 2025, 8:59 PM · serviceops, MW-on-K8s, MediaWiki-extensions-WikimediaMaintenance
RLazarus added a comment to T401803: mwscript-k8s does not include an environment variable with the username of the executing user.

Sorry yes, I wrote that misleadingly, but I think @akosiaris and I are both addressing the question of whether the username needs to be in the email body. No objections to sending an email notification.

Aug 13 2025, 8:57 PM · serviceops, MW-on-K8s, MediaWiki-extensions-WikimediaMaintenance
RLazarus added a comment to T394057: Create new SLO dashboard via Pyrra for Wikifunctions.

  1. Yeah, the benefit of using Istio metrics is Istio exports them for you, so you don't have to create anything. The semantics are marginally different because they're collected at the ingress level. Since you've already done the work of defining this metric the way you want it, I'm on board with using it for now and then considering a switch later.
  2. That's a good expression, the only trouble is it's a success fraction (perfect is 100%) where Pyrra's expecting an error fraction (perfect is 0%). We can adapt it either by setting the error expression to, like,

Aug 13 2025, 7:05 PM · Abstract Wikipedia team (26Q1 (Jul–Sep)), SRE-SLO, Essential-Work, SRE Observability
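
The excerpt of the T394057 comment above is cut off before its own expression, and the following does not reconstruct it. It only illustrates the arithmetic of turning a success fraction into the error fraction that the comment says Pyrra expects. The function name and the choice to treat zero traffic as zero errors are assumptions.

```python
def error_fraction(successes, total):
    """Convert success counts into an error fraction (0.0 means perfect)."""
    if total == 0:
        return 0.0  # assumption: no traffic means no errors
    return 1.0 - successes / total
```

For example, 999 successes out of 1000 requests is a 99.9% success fraction, which corresponds to an error fraction of about 0.001.
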
RLazarus added a comment to T401737: Custom dblists for mwscript-k8s.

One other note about all three ConfigMap approaches: they would insert the file into dblists/ but wouldn't update dblists-index.php. I don't think that's a problem for the mwscript use case, but if it is, it might be a dealbreaker for this whole approach.

Aug 13 2025, 4:55 PM · MW-on-K8s, serviceops
RLazarus added a comment to T401737: Custom dblists for mwscript-k8s.

However, I am unsure whether ConfigMap would accept such a special file

This would work fine (the Python wrapper reads the file and creates the ConfigMap with the contents) so using <(...) is a viable approach.

Aug 13 2025, 4:48 PM · MW-on-K8s, serviceops
RLazarus added a comment to T401803: mwscript-k8s does not include an environment variable with the username of the executing user.

I agree with @akosiaris (and thanks for the archaeology). It wouldn't be hard to implement this, but I think it's the wrong approach -- especially if addWiki.php is the only script using SUDO_USER, we should update the script rather than add an anachronism to pretend we're still using sudo.

Aug 13 2025, 4:35 PM · serviceops, MW-on-K8s, MediaWiki-extensions-WikimediaMaintenance

Aug 12 2025

RLazarus created T401737: Custom dblists for mwscript-k8s.
Aug 12 2025, 6:51 PM · MW-on-K8s, serviceops
RLazarus added a comment to T394057: Create new SLO dashboard via Pyrra for Wikifunctions.

Thanks @ecarg! I should be able to help with this. A couple of questions, each of them hopefully quick:

Aug 12 2025, 3:17 AM · Abstract Wikipedia team (26Q1 (Jul–Sep)), SRE-SLO, Essential-Work, SRE Observability

Aug 6 2025

RLazarus added a comment to T400675: Page on ATS backend errors relative to traffic.

We talked about this in the SLO meeting today -- one possible approach is to keep ATSBackendErrorsHigh as a default policy, but keep a list of services to exclude because they have SLO-driven availability alerts (which are effectively the same, except with a thoughtfully chosen alert threshold). That way, over time fewer and fewer services are covered by the default alert.

Aug 6 2025, 6:43 PM · SRE-SLO, Traffic, SRE

Aug 4 2025

RLazarus closed T376776: mw-scripts SAL integration, a subtask of T341553: Allow running one-off scripts manually, as Resolved.
Aug 4 2025, 7:50 PM · MW-on-K8s, serviceops
RLazarus closed T376776: mw-scripts SAL integration as Resolved.

Implemented and documented on Wikitech.

Aug 4 2025, 7:50 PM · Sustainability (Incident Followup), MW-on-K8s, serviceops

Aug 1 2025

RLazarus added a comment to T400962: Running maintenance scripts in screen on `deploy1003` appears to fail, but is still running.

Tagging @RLazarus because we can probably catch this sort of exit and output a reassuring message and the proper command to rejoin the log stream.

Aug 1 2025, 4:53 PM · serviceops, MW-on-K8s

Jul 31 2025

RLazarus claimed T376776: mw-scripts SAL integration.
Jul 31 2025, 1:38 AM · Sustainability (Incident Followup), MW-on-K8s, serviceops

Jul 29 2025

RLazarus claimed T380211: Upgrade Envoy to >= 1.24.

Ideally we'd need to go to 1.33 or later with this work.

Jul 29 2025, 7:45 PM · SRE, serviceops, envoy
RLazarus merged T341549: Update envoy to > 1.23 into T380211: Upgrade Envoy to >= 1.24.
Jul 29 2025, 7:32 PM · SRE, serviceops, envoy