MLechvien-WMF (Matthieu Lec'hvien)
User

User Details

User Since
Nov 10 2025, 2:20 PM (30 w, 1 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
MLechvien-WMF [ Global Accounts ]

Recent Activity

Yesterday

MLechvien-WMF moved T428133: Alert in need of triage: ProbeDown (instance sophroid:4252) from Needs Info / Blocked to Scheduled (this Q) on the ServiceOps new board.
Tue, Jun 9, 9:33 AM · ServiceOps new, sre-alert-triage
MLechvien-WMF added a comment to T428133: Alert in need of triage: ProbeDown (instance sophroid:4252).

Thanks for the analysis Reuven. @jasmine_ could you take care of fixing the incorrect probing config?

Tue, Jun 9, 9:33 AM · ServiceOps new, sre-alert-triage
MLechvien-WMF updated the task description for T419212: Upgrade ServiceOps roles from Bullseye to Debian Trixie.
Tue, Jun 9, 9:28 AM · Patch-For-Review, User-Raine, ServiceOps new, ServiceOps-Upgrades-Hardware
MLechvien-WMF updated the task description for T419212: Upgrade ServiceOps roles from Bullseye to Debian Trixie.
Tue, Jun 9, 9:27 AM · Patch-For-Review, User-Raine, ServiceOps new, ServiceOps-Upgrades-Hardware

Mon, Jun 8

MLechvien-WMF closed T425693: "upload at ulsfo depooled due to tcp timeout" as Resolved.
Mon, Jun 8, 3:34 PM · Incident Severity 2, Wikimedia-Incident
MLechvien-WMF added a comment to T426218: Future of RESTbase servers (potentially repurpose them for wikikube).

Thanks Eric for the clarification. As this cluster is managed by Data Persistence we should probably decline this task.

Mon, Jun 8, 1:45 PM · User-Eevans, ServiceOps-Services-Oids, Math, Mathoid, RESTBase, API Platform (RESTBase Deprecation Roadmap), ServiceOps new
MLechvien-WMF closed T422130: External store unreachable: "Database servers in clusterXX are overloaded" as Resolved.
Mon, Jun 8, 1:15 PM · Incident Severity 3, Wikimedia-Incident, SRE, DBA
MLechvien-WMF added a project to T423027: 2026-04-12 Gerrit Outage (was: DiskSpace): Incident Severity 3.
Mon, Jun 8, 12:52 PM · Incident Severity 3, Wikimedia-Incident, Gerrit, collaboration-services
MLechvien-WMF moved T423027: 2026-04-12 Gerrit Outage (was: DiskSpace) from Active investigation to Complete on the Wikimedia-Incident board.
Mon, Jun 8, 12:51 PM · Incident Severity 3, Wikimedia-Incident, Gerrit, collaboration-services
MLechvien-WMF moved T424765: webrequest_sampled not updated from Active investigation to Complete on the Wikimedia-Incident board.
Mon, Jun 8, 12:49 PM · Incident Severity 3, Wikimedia-Incident
MLechvien-WMF added a comment to T427401: Update istio to 1.29.

Will this patch {T423685} too?

Mon, Jun 8, 12:24 PM · ServiceOps new, Prod-Kubernetes, Kubernetes
MLechvien-WMF added a project to T422130: External store unreachable: "Database servers in clusterXX are overloaded": Incident Severity 3.
Mon, Jun 8, 11:52 AM · Incident Severity 3, Wikimedia-Incident, SRE, DBA
MLechvien-WMF moved T425758: Investigate rdf-streaming-updater consumer failures in eqiad from Active investigation to Complete on the Wikimedia-Incident board.
Mon, Jun 8, 11:43 AM · Incident Severity 3, Wikimedia-Incident, Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work, Wikidata Platform Team (Sprint 05 (2026/05/05))
MLechvien-WMF added a project to T425758: Investigate rdf-streaming-updater consumer failures in eqiad: Incident Severity 3.
Mon, Jun 8, 11:43 AM · Incident Severity 3, Wikimedia-Incident, Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work, Wikidata Platform Team (Sprint 05 (2026/05/05))
MLechvien-WMF moved T426381: s7 master db2218 down from Active investigation to Complete on the Wikimedia-Incident board.
Mon, Jun 8, 11:10 AM · Incident Severity 2, DBA, Wikimedia-Incident
MLechvien-WMF renamed T423714: Upgrade deploy* hosts to Debian Bookworm from Upgrade deploy* hosts to Debian Trixie to Upgrade deploy* hosts to Debian Bookworm.
Mon, Jun 8, 9:51 AM · User-Raine, ServiceOps new, ServiceOps-Upgrades-Hardware

Fri, Jun 5

MLechvien-WMF triaged T420436: Migrate Wikikube k8s apiserver and services to IPIP as Medium priority.
Fri, Jun 5, 7:33 AM · ServiceOps new (Next quarter), Prod-Kubernetes, Kubernetes, Liberica, Traffic

Thu, Jun 4

MLechvien-WMF added a comment to T428174: Standard helm chart for simple service-utils nodejs apps.

IMO this looks like a good idea. It would be great to identify a couple more candidate services that may benefit from this and be early adopters.

Thu, Jun 4, 2:31 PM · ServiceOps new (Next quarter), ServiceOps-SharedInfra, Data-Engineering
MLechvien-WMF moved T428174: Standard helm chart for simple service-utils nodejs apps from Inbox to Needs Info / Blocked on the ServiceOps new board.

Thanks for filing that, Andrew. Putting this on @Scott_French's radar for roadmap considerations.

Thu, Jun 4, 2:26 PM · ServiceOps new (Next quarter), ServiceOps-SharedInfra, Data-Engineering
MLechvien-WMF moved T367416: Route requests to parsoid endpoints to MediaWiki directly instead of RESTbase from Inbox to Needs Info / Blocked on the ServiceOps new board.
Thu, Jun 4, 11:41 AM · ServiceOps new, RESTBase Sunsetting
MLechvien-WMF edited projects for T367416: Route requests to parsoid endpoints to MediaWiki directly instead of RESTbase, added: ServiceOps new; removed serviceops-deprecated.

@Clement_Goubert while going through the backlog of RESTBase Sunsetting I came across this task, could you see if it's worth reprioritizing it now?

Thu, Jun 4, 11:41 AM · ServiceOps new, RESTBase Sunsetting
MLechvien-WMF moved T418924: rdb201[34] implementation tracking from Needs Info / Blocked to In Progress on the ServiceOps new board.
Thu, Jun 4, 11:25 AM · ServiceOps-Upgrades-Hardware, ServiceOps new, SRE
MLechvien-WMF moved T428133: Alert in need of triage: ProbeDown (instance sophroid:4252) from Inbox to Needs Info / Blocked on the ServiceOps new board.
Thu, Jun 4, 9:14 AM · ServiceOps new, sre-alert-triage
MLechvien-WMF assigned T428133: Alert in need of triage: ProbeDown (instance sophroid:4252) to jasmine_.

@jasmine_ can you please take a look? cc @RLazarus

Thu, Jun 4, 9:14 AM · ServiceOps new, sre-alert-triage
MLechvien-WMF closed T427411: MediaWiki periodic job update-special-pages-s5 failed, a subtask of T422486: MediaWiki periodic job failures due to timeouts, as Resolved.
Thu, Jun 4, 9:12 AM · ServiceOps new (Next quarter), DBA
MLechvien-WMF closed T427411: MediaWiki periodic job update-special-pages-s5 failed as Resolved.

After checking the logs, this is the same type of transient failure that occurred in T426287: MediaWiki periodic job update-special-pages-s5 failed (thanks @Blake for looking it up).

Thu, Jun 4, 9:12 AM · ServiceOps new, Wikimedia-production-error, MediaWiki-Special-pages
MLechvien-WMF moved T428078: Swap partition not used when reimaging to trixie with reuse-parts.cfg and mdadm + lvm from Inbox to In Progress on the ServiceOps new board.
Thu, Jun 4, 9:12 AM · ServiceOps new, Infrastructure-Foundations
MLechvien-WMF closed T427413: MediaWiki periodic job update-special-pages-s6 failed as Resolved.

After checking the logs, this is the same type of transient failure that occurred in T426287: MediaWiki periodic job update-special-pages-s5 failed (thanks @Blake for looking it up).

Thu, Jun 4, 9:11 AM · ServiceOps new, Wikimedia-production-error, MediaWiki-Special-pages
MLechvien-WMF closed T427413: MediaWiki periodic job update-special-pages-s6 failed, a subtask of T422486: MediaWiki periodic job failures due to timeouts, as Resolved.
Thu, Jun 4, 9:11 AM · ServiceOps new (Next quarter), DBA
MLechvien-WMF assigned T428078: Swap partition not used when reimaging to trixie with reuse-parts.cfg and mdadm + lvm to JMeybohm.
Thu, Jun 4, 8:46 AM · ServiceOps new, Infrastructure-Foundations
MLechvien-WMF added a subtask for T427088: [Post kafka-main 3.7 upgrade work] Reimage brokers to trixie/JDK21 & vlan migrations on select brokers: T428078: Swap partition not used when reimaging to trixie with reuse-parts.cfg and mdadm + lvm.
Thu, Jun 4, 8:46 AM · Patch-For-Review, ServiceOps new, ServiceOps-Datastores
MLechvien-WMF added a parent task for T428078: Swap partition not used when reimaging to trixie with reuse-parts.cfg and mdadm + lvm: T427088: [Post kafka-main 3.7 upgrade work] Reimage brokers to trixie/JDK21 & vlan migrations on select brokers.
Thu, Jun 4, 8:46 AM · ServiceOps new, Infrastructure-Foundations
MLechvien-WMF added a parent task for T427413: MediaWiki periodic job update-special-pages-s6 failed: T422486: MediaWiki periodic job failures due to timeouts.
Thu, Jun 4, 8:25 AM · ServiceOps new, Wikimedia-production-error, MediaWiki-Special-pages
MLechvien-WMF added a subtask for T422486: MediaWiki periodic job failures due to timeouts: T427413: MediaWiki periodic job update-special-pages-s6 failed.
Thu, Jun 4, 8:25 AM · ServiceOps new (Next quarter), DBA
MLechvien-WMF added a parent task for T427411: MediaWiki periodic job update-special-pages-s5 failed: T422486: MediaWiki periodic job failures due to timeouts.
Thu, Jun 4, 8:24 AM · ServiceOps new, Wikimedia-production-error, MediaWiki-Special-pages
MLechvien-WMF added a subtask for T422486: MediaWiki periodic job failures due to timeouts: T427411: MediaWiki periodic job update-special-pages-s5 failed.
Thu, Jun 4, 8:24 AM · ServiceOps new (Next quarter), DBA
MLechvien-WMF triaged T427929: Host assetlinks.json on root wikimedia.org domain as Medium priority.

Thanks for confirming.

Thu, Jun 4, 8:15 AM · ServiceOps new (Next quarter), Wikipedia-Android-App-Backlog, ServiceOps-Mediawiki

Wed, Jun 3

MLechvien-WMF moved T427929: Host assetlinks.json on root wikimedia.org domain from Inbox to Needs Info / Blocked on the ServiceOps new board.

Hi @Dbrant , can I confirm what this is blocking and if there's any desired timeline for having this?

Wed, Jun 3, 9:06 AM · ServiceOps new (Next quarter), Wikipedia-Android-App-Backlog, ServiceOps-Mediawiki
MLechvien-WMF triaged T427999: Redis solution for LockManager as Medium priority.

Thanks for the detailed analysis @jijiki .
Moving to next quarter for discussion as we're oversubscribed this quarter. @Ladsgroup FYI

Wed, Jun 3, 7:53 AM · ServiceOps new (Next quarter)
MLechvien-WMF moved T428020: codfw: rack A5 maintenance from Inbox to Scheduled (this Q) on the ServiceOps new board.
Wed, Jun 3, 7:38 AM · Infrastructure-Foundations, netops, ServiceOps new

Tue, Jun 2

MLechvien-WMF assigned T427899: Build httpbb for Trixie to RLazarus.
Tue, Jun 2, 3:10 PM · ServiceOps new, SRE
MLechvien-WMF edited projects for T290357: Maintenance environment needed for running one-off commands, added: ServiceOps new (Next quarter); removed ServiceOps new.
Tue, Jun 2, 2:51 PM · ServiceOps new (Next quarter), ServiceOps-SharedInfra, Kubernetes, Toolhub

Mon, Jun 1

MLechvien-WMF added a comment to T419212: Upgrade ServiceOps roles from Bullseye to Debian Trixie.

After chatting with @Raine: we'll upgrade the deploy role to Bookworm (tracked in T418262: deploy2003 implementation tracking) this quarter. Upgrading to Trixie is not a high priority for this quarter.

Mon, Jun 1, 9:45 AM · Patch-For-Review, User-Raine, ServiceOps new, ServiceOps-Upgrades-Hardware
MLechvien-WMF lowered the priority of T425545: Investigate Code 414 error when selecting zh-classical (lzh) language from article toolbar from Medium to Low.
Mon, Jun 1, 9:29 AM · User-Raine, Wikipedia-Android-App-Backlog, ServiceOps new, SRE, Content-Transform-Team

Fri, May 29

MLechvien-WMF moved T418200: Migrate Service Ops Docker images running in production away from Bullseye from Scheduled (this Q) to In Progress on the ServiceOps new board.
Fri, May 29, 11:26 AM · ServiceOps new, ServiceOps-Upgrades-Hardware, ServiceOps-Mediawiki
MLechvien-WMF moved T277677: Write a cookbook to set a k8s cluster in maintenance mode from Inbox to Backlog on the ServiceOps new board.
Fri, May 29, 9:51 AM · ServiceOps-good-first-task, ServiceOps new, SRE-Sprint-Week-Sustainability-March2023, Infrastructure-Foundations, SRE-tools, Prod-Kubernetes
MLechvien-WMF edited projects for T277677: Write a cookbook to set a k8s cluster in maintenance mode, added: ServiceOps new, ServiceOps-good-first-task; removed Sustainability (Incident Followup), serviceops-deprecated.
Fri, May 29, 9:51 AM · ServiceOps-good-first-task, ServiceOps new, SRE-Sprint-Week-Sustainability-March2023, Infrastructure-Foundations, SRE-tools, Prod-Kubernetes

Thu, May 28

MLechvien-WMF placed T418918: rdb101[56] implementation tracking up for grabs.

@jijiki are you handling that task too as part of T419976: Upgrade redis_misc hosts to Debian Trixie (Redis 8.0) ?

Thu, May 28, 2:08 PM · Patch-For-Review, ServiceOps new (Next quarter), ServiceOps-Upgrades-Hardware, SRE
MLechvien-WMF assigned T427307: deployment-charts CI does not fail on `Template did not render correctly` to Raine.
Thu, May 28, 8:41 AM · ServiceOps-SharedInfra, ServiceOps new

Wed, May 27

MLechvien-WMF moved T427110: Decommission wikikube-ctrl200[1-2] from Inbox to Scheduled (this Q) on the ServiceOps new board.
Wed, May 27, 5:18 PM · ServiceOps-Upgrades-Hardware, ServiceOps new, decommission-hardware
MLechvien-WMF assigned T279146: Remove mediawiki Request loops from production to jijiki.

Per IRC chat, assigning to Effie to assess if this is still present, how it manifests and explain why this is an issue.

Wed, May 27, 8:59 AM · ServiceOps-Mediawiki, ServiceOps new, Platform Engineering, User-jijiki
MLechvien-WMF moved T279146: Remove mediawiki Request loops from production from Inbox to Radar (Pending) on the ServiceOps new board.
Wed, May 27, 8:52 AM · ServiceOps-Mediawiki, ServiceOps new, Platform Engineering, User-jijiki
MLechvien-WMF edited projects for T279146: Remove mediawiki Request loops from production, added: ServiceOps new, ServiceOps-Mediawiki; removed SRE, serviceops-deprecated.
Wed, May 27, 8:52 AM · ServiceOps-Mediawiki, ServiceOps new, Platform Engineering, User-jijiki
MLechvien-WMF triaged T427110: Decommission wikikube-ctrl200[1-2] as Medium priority.
Wed, May 27, 7:56 AM · ServiceOps-Upgrades-Hardware, ServiceOps new, decommission-hardware
MLechvien-WMF renamed T426222: decommission deploy2002.codfw.wmnet from decomission deploy2002.codfw.wmnet to decommission deploy2002.codfw.wmnet.
Wed, May 27, 7:56 AM · Patch-For-Review, ServiceOps new, SRE, ops-codfw, DC-Ops, procurement
MLechvien-WMF triaged T426222: decommission deploy2002.codfw.wmnet as Medium priority.
Wed, May 27, 7:56 AM · Patch-For-Review, ServiceOps new, SRE, ops-codfw, DC-Ops, procurement

Tue, May 26

MLechvien-WMF assigned T418200: Migrate Service Ops Docker images running in production away from Bullseye to Raine.

As discussed over chat, Raine will take this task

Tue, May 26, 8:57 AM · ServiceOps new, ServiceOps-Upgrades-Hardware, ServiceOps-Mediawiki

Fri, May 22

MLechvien-WMF updated subscribers of T418175: Create SLO for the opensearch-ipoid cluster that runs on our OpenSearch on K8s platform.

Hi, can I confirm that https://wikitech.wikimedia.org/wiki/SLO/ipoid should now redirect to https://wikitech.wikimedia.org/wiki/SLO/OpenSearch_IPoid ?

Fri, May 22, 1:10 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15)
MLechvien-WMF changed the status of T418918: rdb101[56] implementation tracking, a subtask of T418916: Q3:rack/setup/install rdb101[56], from Stalled to Open.
Fri, May 22, 10:34 AM · Patch-For-Review, ServiceOps-Upgrades-Hardware, SRE, ServiceOps new, ops-eqiad, DC-Ops
MLechvien-WMF changed the status of T418918: rdb101[56] implementation tracking from Stalled to Open.

This is unstalled now and ready to be picked up

Fri, May 22, 10:34 AM · Patch-For-Review, ServiceOps new (Next quarter), ServiceOps-Upgrades-Hardware, SRE
MLechvien-WMF assigned T424942: wikikube-worker13[75-84] implementation tracking to JMeybohm.
Fri, May 22, 10:28 AM · ServiceOps new, ServiceOps-Upgrades-Hardware, DC-Ops
MLechvien-WMF added a project to T425758: Investigate rdf-streaming-updater consumer failures in eqiad: Wikimedia-Incident.
Fri, May 22, 9:44 AM · Incident Severity 3, Wikimedia-Incident, Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work, Wikidata Platform Team (Sprint 05 (2026/05/05))
MLechvien-WMF added a comment to T390861: wikikube-ctrl200[4-5] implementation tracking.

@jasmine_ can we close this? If not, please capture what remains to be done.

Fri, May 22, 8:25 AM · ServiceOps-Upgrades-Hardware, ServiceOps new
MLechvien-WMF added a comment to T427024: Move MediaWiki envoy drain configuration from mesh.extra_env to mesh.admin.

@Scott_French Are we confident this will get done this quarter? We're past mid-Q4, so we could consider putting it in the Next quarter milestone instead.

Fri, May 22, 8:03 AM · ServiceOps new (Next quarter), MW-on-K8s, ServiceOps-Mediawiki

Thu, May 21

MLechvien-WMF reassigned T414439: Deploy a ratelimit service for upload from Clement_Goubert to JMeybohm.
Thu, May 21, 11:49 AM · ServiceOps-Services-Oids, ServiceOps new
MLechvien-WMF reassigned T414440: Configure ms-fe envoys to use ratelimit service from Clement_Goubert to JMeybohm.
Thu, May 21, 11:48 AM · ServiceOps-Services-Oids, ServiceOps new
MLechvien-WMF moved T425693: "upload at ulsfo depooled due to tcp timeout" from Active investigation to Complete on the Wikimedia-Incident board.

Moving to complete as the follow up action items have been filed.

Thu, May 21, 11:09 AM · Incident Severity 2, Wikimedia-Incident
MLechvien-WMF moved T422130: External store unreachable: "Database servers in clusterXX are overloaded" from Active investigation to Resolved on the Wikimedia-Incident board.
Thu, May 21, 11:00 AM · Incident Severity 3, Wikimedia-Incident, SRE, DBA
MLechvien-WMF moved T420436: Migrate Wikikube k8s apiserver and services to IPIP from Backlog to Next quarter on the ServiceOps new board.
Thu, May 21, 10:51 AM · ServiceOps new (Next quarter), Prod-Kubernetes, Kubernetes, Liberica, Traffic
MLechvien-WMF removed a project from T390573: Consider removing envvars.inc from MediaWiki images: Sustainability (Incident Followup).

Removing the Incident Followup tag as there's no remaining risk to production per https://phabricator.wikimedia.org/T390573#11933994

Thu, May 21, 8:28 AM · ServiceOps new, ServiceOps-Mediawiki

Wed, May 20

MLechvien-WMF assigned T418920: wikikube-ctrl100[56] implementation tracking to jasmine_.
Wed, May 20, 1:00 PM · Patch-For-Review, ServiceOps-Upgrades-Hardware, ServiceOps new, SRE
MLechvien-WMF added a comment to T418916: Q3:rack/setup/install rdb101[56].

@Jclark-ctr are this and T418922: Q3:rack/setup/install rdb201[34] (which had the same issue IIRC) unblocked now?

Wed, May 20, 8:59 AM · Patch-For-Review, ServiceOps-Upgrades-Hardware, SRE, ServiceOps new, ops-eqiad, DC-Ops

Tue, May 19

MLechvien-WMF added a project to T426226: etcd documentation needs updating: ServiceOps-SharedInfra.
Tue, May 19, 4:54 PM · ServiceOps-SharedInfra, Documentation, ServiceOps new

Wed, May 13

MLechvien-WMF moved T416576: Make mw-cron jobs alert thresholds easily configurable from In Progress to Next quarter on the ServiceOps new board.
Wed, May 13, 5:29 PM · ServiceOps new (Next quarter)
MLechvien-WMF updated subscribers of T414444: Define upload ratelimit classes and initial policies.

@Clement_Goubert From my discussion with @JTweed-WMF, he offered to help on this workstream and will touch base with you.

Wed, May 13, 3:54 PM · ServiceOps-Services-Oids, ServiceOps new
MLechvien-WMF moved T420223: High (relatively) number of memcached errors in eqiad from Needs Info / Blocked to Backlog on the ServiceOps new board.
Wed, May 13, 3:48 PM · Infrastructure-Foundations, ServiceOps new, ServiceOps-Datastores
MLechvien-WMF added a comment to T371069: Add helm rollback functionality to scap.

@RLazarus given Scott's comment and yours, it sounds like this should move to Backlog (though the discussion should continue on what needs to be done). Could you do that and also assess the priority of this?

Wed, May 13, 9:46 AM · ServiceOps new, Release-Engineering-Team (Priority Backlog 📥), MW-on-K8s, Scap
MLechvien-WMF added a comment to T418212: Automate the creation of implementation task from rack/setup/install tasks for Serviceops.

Taking a step back and expanding the scope a bit, a procurement task for a refresh should always trigger a chain of tasks being created:

Wed, May 13, 9:20 AM · ServiceOps-Upgrades-Hardware, serviceops-tooling, ServiceOps new, DC-Ops
MLechvien-WMF assigned T425545: Investigate Code 414 error when selecting zh-classical (lzh) language from article toolbar to Raine.

Thanks for surfacing that.

Wed, May 13, 9:17 AM · User-Raine, Wikipedia-Android-App-Backlog, ServiceOps new, SRE, Content-Transform-Team
MLechvien-WMF moved T424824: api-gateway: run make test in CI from Inbox to Needs Info / Blocked on the ServiceOps new board.
Wed, May 13, 8:33 AM · User-Raine, Release-Engineering-Team, MediaWiki-Platform-Team (Kanban Board), ServiceOps new, Test-Coverage, OKR-Work, FY2025-26 KR 5.1

Mon, May 11

MLechvien-WMF moved T423619: Should we skip some directories from deploy backups? from Needs Info / Blocked to In Progress on the ServiceOps new board.
Mon, May 11, 6:54 PM · User-Raine, ServiceOps-SharedInfra, ServiceOps new, DC-Ops
MLechvien-WMF updated subscribers of T423619: Should we skip some directories from deploy backups?.

We discussed this today and @Raine should be able to assist on this test.

Mon, May 11, 6:54 PM · User-Raine, ServiceOps-SharedInfra, ServiceOps new, DC-Ops
MLechvien-WMF moved T424872: Evaluate shared Chromium service (WebSocket) for 3d2png and chromium-render from Inbox to Radar (Awareness) on the ServiceOps new board.
Mon, May 11, 4:30 PM · ServiceOps new, Community-Tech (Sea Lion Squad), 3D, Release-Engineering-Team
MLechvien-WMF edited projects for T422967: Investigate DNS query improvements in MediaWiki-on-k8s, added: ServiceOps new (Next quarter); removed ServiceOps new.
Mon, May 11, 3:36 PM · ServiceOps new (Next quarter), ServiceOps-SharedInfra
MLechvien-WMF moved T422804: Reroute LiftWing endpoints from Needs Info / Blocked to In Progress on the ServiceOps new board.
Mon, May 11, 2:57 PM · Lift-Wing, Machine-Learning-Team, ServiceOps-SharedInfra, ServiceOps new, Epic, OKR-Work, [MWI] FY2025-26 Q3, MW-Interfaces-Team (MWI-Roadmap)

May 7 2026

MLechvien-WMF moved T419212: Upgrade ServiceOps roles from Bullseye to Debian Trixie from Scheduled (this Q) to In Progress on the ServiceOps new board.
May 7 2026, 2:23 PM · Patch-For-Review, User-Raine, ServiceOps new, ServiceOps-Upgrades-Hardware
MLechvien-WMF triaged T424942: wikikube-worker13[75-84] implementation tracking as Medium priority.
May 7 2026, 2:22 PM · ServiceOps new, ServiceOps-Upgrades-Hardware, DC-Ops
MLechvien-WMF added a comment to T418262: deploy2003 implementation tracking.

@Raine CMIIW but the successful ICU upgrade means we don't have the PHP blocker anymore, so we can go straight to Trixie right?

May 7 2026, 2:17 PM · Patch-For-Review, User-Raine, ServiceOps new, ServiceOps-Upgrades-Hardware
MLechvien-WMF moved T419976: Upgrade redis_misc hosts to Debian Trixie (Redis 8.0) from Scheduled (this Q) to In Progress on the ServiceOps new board.
May 7 2026, 2:13 PM · Patch-For-Review, Infrastructure-Foundations, ServiceOps new, MediaWiki-Platform-Team (Radar), ServiceOps-Datastores, MW-Interfaces-Team
MLechvien-WMF moved T419216: Upgrade kafka-main to Kafka 3.7 from Scheduled (this Q) to In Progress on the ServiceOps new board.
May 7 2026, 2:12 PM · Patch-For-Review, ServiceOps new, ServiceOps-Datastores
MLechvien-WMF assigned T422955: Detect elevated rates of EtcdConfig fetch failures to Scott_French.

Per Slack discussion, assigning to Scott as he has the right context, and moving this to next quarter as we won't have the capacity in the current one.

May 7 2026, 1:54 PM · ServiceOps new (Next quarter), ServiceOps-SharedInfra
MLechvien-WMF added a comment to T416623: Decommission NodeJS IPoid service.

Capturing the state after chatting with Ollie, tentative timeline is end of June. Thanks!

May 7 2026, 8:32 AM · Essential-Work, Product Safety and Integrity, ServiceOps-Services-Oids, ServiceOps new, iPoid-Service (IPoid OpenSearch)
MLechvien-WMF moved T425255: Upgrade mcrouter to v2026.04.27.00 and switch build system from In Progress to Next quarter on the ServiceOps new board.
May 7 2026, 8:22 AM · ServiceOps new (Next quarter), Patch-For-Review, ServiceOps-Datastores, Wikimedia-Hackathon-2026
MLechvien-WMF moved T385007: Extend functionality to support MediaWiki infrastructure Windows and related repos from In Progress to Backlog on the ServiceOps new board.
May 7 2026, 8:22 AM · User-jijiki, Release-Engineering-Team, ServiceOps new, Patch-For-Review, Wikimedia-Hackathon-2026, Tool-schedule-deployment

May 6 2026

MLechvien-WMF added a comment to T249663: write some recording rules for queries used in the appserver RED k8s dashboard.

@hnowlan can I confirm whether we are aiming to complete this this quarter?

May 6 2026, 9:01 AM · Observability-Metrics, SRE Observability (FY2025/2026-Q3), Prod-Kubernetes, ServiceOps new, SRE
MLechvien-WMF added a comment to T421483: parsoidtest1001.eqiad.wmnet retirement plan.

Checking in on this, do we have an ETA for decommissioning this service?

May 6 2026, 8:57 AM · Essential-Work, Content-Transform-Team (Work In Progress), ServiceOps new, serviceops-radar

May 5 2026

MLechvien-WMF removed a project from T425379: Tune kafka_server_BrokerTopicMetrics_BytesOut_total: Wikimedia-Incident.
May 5 2026, 1:57 PM · Data-Platform-SRE (2026-06-05 - 2026-06-26), Sustainability (Incident Followup)
MLechvien-WMF removed a project from T425380: More comprehensive end-to-end monitoring for webrequest data: Wikimedia-Incident.
May 5 2026, 1:57 PM · Data-Platform-SRE, Sustainability (Incident Followup)
MLechvien-WMF removed a project from T425381: Clean up /etc/kafka/admin.properties: Wikimedia-Incident.
May 5 2026, 1:56 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Sustainability (Incident Followup)
MLechvien-WMF closed T424765: webrequest_sampled not updated as Resolved.

Action items got filed so closing this

May 5 2026, 1:55 PM · Incident Severity 3, Wikimedia-Incident