
Fri, Jun 5

atsuko updated the task description for T427517: Issue 6 month certificates for OpenSearch-on-K8S.
Fri, Jun 5, 5:18 PM · Patch-For-Review, Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko added a comment to T427517: Issue 6 month certificates for OpenSearch-on-K8S.

Took some research. Need to apply admin_ng/dse-k8s: create opensearch ClusterIssuer and edit private hieradata/role/common/deployment_server/kubernetes.yaml.
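
Once a certificate has been issued, its lifetime can be checked with openssl. The following is a minimal sketch and not part of the original comment: the function names cert_dates and fetch_cert are hypothetical, and the host and port are supplied by the caller.

```shell
# Print notBefore/notAfter for a PEM certificate read from stdin.
cert_dates() {
    openssl x509 -noout -startdate -enddate
}

# Fetch the certificate a server presents. $1 is the host, $2 the port.
# -servername sets SNI so that the expected certificate is returned.
fetch_cert() {
    openssl s_client -connect "$1:$2" -servername "$1" </dev/null 2>/dev/null
}
```

For example, `fetch_cert opensearch-toolhub.svc.eqiad.wmnet 30443 | cert_dates` prints the two dates, which should be roughly six months apart if the issuer behaves as intended.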

Fri, Jun 5, 5:16 PM · Patch-For-Review, Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko reassigned T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s from atsuko to bd808.
Fri, Jun 5, 12:36 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub
atsuko added a comment to T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s.

@bd808 you can bypass SNI problems and terminate SSL with socat

socat TCP4-LISTEN:8000,reuseaddr,fork OPENSSL-CONNECT:127.0.0.1:30443,snihost=opensearch-toolhub-test.svc.eqiad.wmnet,verify=0
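
With the listener above running, plain HTTP requests to local port 8000 are forwarded over TLS with the expected SNI name. A minimal sketch of using it follows. The helper name local_opensearch and the CURL override are illustrative assumptions, and /_cluster/health is a standard OpenSearch endpoint used only as an example.

```shell
# Assumes the socat listener is already running on local port 8000.
# CURL can be overridden (e.g. CURL=echo) to print the arguments instead of
# sending the request.
local_opensearch() {
    # $1 is an API path such as /_cluster/health
    "${CURL:-curl}" --silent --show-error "http://127.0.0.1:8000$1"
}
```

Usage: `local_opensearch /_cluster/health`. Because verify=0 disables certificate verification in socat, this setup is suitable for testing only.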
Fri, Jun 5, 9:32 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub

Thu, Jun 4

atsuko claimed T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s.

Tested proposed patch, it seems to unblock us.

Thu, Jun 4, 4:24 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization
atsuko added a comment to T428168: Make Translate compatible with OpenSearch 2.

I ran the patch in MediaWiki-Docker and it seems to have created the index; see P93883.

Thu, Jun 4, 4:18 PM · MW-1.47-notes (1.47.0-wmf.6; 2026-06-09), LPL Essential (FY2025-26 Q3&4), LPL Projects (Other), MediaWiki-extensions-Translate
atsuko created P93883 Dump of empty index from I0e51095fc8b967824f472337a1c0fdea36fb214f.
Thu, Jun 4, 4:18 PM
atsuko added a comment to T428168: Make Translate compatible with OpenSearch 2.

For testing, you can try accessing production k8s instance:

ssh -L 30443:opensearch-toolhub.svc.eqiad.wmnet:30443 deployment.eqiad.wmnet -N

Then add 127.0.0.1 opensearch-toolhub.svc.eqiad.wmnet to /etc/hosts.
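
As an alternative to editing /etc/hosts, curl's --resolve option can pin the hostname to the tunnel endpoint for a single request. This is a sketch under assumptions: the helper name tunnel_get and the CURL override are hypothetical, and the ssh tunnel is expected to be listening on local port 30443 already.

```shell
# CURL can be overridden (e.g. CURL=echo) to print the arguments instead of
# sending the request.
tunnel_get() {
    host="opensearch-toolhub.svc.eqiad.wmnet"
    # --resolve host:port:addr makes curl connect to 127.0.0.1 while still
    # using the real hostname for SNI and certificate checks.
    "${CURL:-curl}" --silent --show-error \
        --resolve "${host}:30443:127.0.0.1" \
        "https://${host}:30443$1"
}
```

Usage: `tunnel_get /`. If the local machine does not trust the CA that issued the service certificate, curl will likely also need `--cacert` pointing at the appropriate CA bundle.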

Thu, Jun 4, 1:51 PM · MW-1.47-notes (1.47.0-wmf.6; 2026-06-09), LPL Essential (FY2025-26 Q3&4), LPL Projects (Other), MediaWiki-extensions-Translate
atsuko changed the status of T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s from In Progress to Stalled.
Thu, Jun 4, 11:26 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization
atsuko changed the status of T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s, a subtask of T424248: ☂️Migrate non-cirrus indices from production OpenSearch to OpenSearch on k8s ☂️, from In Progress to Stalled.
Thu, Jun 4, 11:26 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko reassigned T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s from atsuko to Nikerabbit.

Assigning to @Nikerabbit for further guidance on how we can proceed

Thu, Jun 4, 11:21 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization

Tue, Jun 2

atsuko added a comment to T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s.

Configured prod instance, per-DC URLs are
https://opensearch-toolhub.svc.eqiad.wmnet:30443/
https://opensearch-toolhub.svc.codfw.wmnet:30443/

Tue, Jun 2, 3:51 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub
atsuko added a comment to T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s.

Indices are accessible without a password. Before:

atsuko@deploy1003:~$ curl -X PUT https://opensearch-ttmserver-test.svc.eqiad.wmnet:30443/ttmserver/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'; echo
{"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [indices:admin/settings/update] and User [name=opendistro_security_anonymous, backend_roles=[opendistro_security_anonymous_backendrole], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [indices:admin/settings/update] and User [name=opendistro_security_anonymous, backend_roles=[opendistro_security_anonymous_backendrole], requestedTenant=null]"},"status":403}

After:

atsuko@deploy1003:~$ curl -X PUT https://opensearch-ttmserver-test.svc.codfw.wmnet:30443/ttmserver/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'; echo
{"acknowledged":true}
atsuko@deploy1003:~$ curl -X PUT https://opensearch-ttmserver-test.svc.eqiad.wmnet:30443/ttmserver/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'; echo
{"acknowledged":true}
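
To confirm that the setting was applied in both data centres, the index settings can be read back with a GET on the same path. This is a minimal sketch, assuming the same hostnames as above; the function name show_replica_settings and the CURL override are hypothetical.

```shell
# Read back the ttmserver index settings from both data centres.
# CURL can be overridden (e.g. CURL=echo) for a dry run.
show_replica_settings() {
    for dc in eqiad codfw; do
        "${CURL:-curl}" --silent --show-error \
            "https://opensearch-ttmserver-test.svc.${dc}.wmnet:30443/ttmserver/_settings"
        echo
    done
}
```

Usage: `show_replica_settings`. The response for each data centre should include number_of_replicas under the index settings.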
Tue, Jun 2, 1:28 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization
atsuko added a comment to T427898: Invite to statuspage for atsuko@wikimedia.org.

Accepted the invite, thanks!

Tue, Jun 2, 11:41 AM · observability
atsuko added a project to T427839: 502/503 for mediawiki.page_change.v1 stream: Incident Severity 3.
Tue, Jun 2, 10:04 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Incident Severity 3, Wikimedia-Incident, Data-Engineering, EventStreams
atsuko moved T427839: 502/503 for mediawiki.page_change.v1 stream from Active investigation to Resolved on the Wikimedia-Incident board.
Tue, Jun 2, 9:44 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Incident Severity 3, Wikimedia-Incident, Data-Engineering, EventStreams
atsuko added a project to T427839: 502/503 for mediawiki.page_change.v1 stream: Wikimedia-Incident.
Tue, Jun 2, 9:43 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Incident Severity 3, Wikimedia-Incident, Data-Engineering, EventStreams
atsuko created T427898: Invite to statuspage for atsuko@wikimedia.org.
Tue, Jun 2, 9:40 AM · observability
atsuko added a comment to T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s.

@dcausse and I didn't proceed with the roll-out after running checks on testwiki: even though the credentials were in the config, we were still getting an unauthenticated access error from the client.

[eqiad-test] => Array
    (
        [type] => ttmserver
        [class] => ElasticSearchTTMServer
        [shards] => 1
        [index] => ttmserver-test
        [cutoff] => 0.65
        [writable] => 1
        [use_wikimedia_extra] => 1
        [config] => Array
            (
                [servers] => Array
                    (
                        [0] => Array
                            (
                                [host] => opensearch-ttmserver-test.svc.eqiad.wmnet
                                [transport] => CirrusSearch\Elastica\DeprecationLoggedHttps
                                [port] => 30443
                                [username] => opensearch
                                [password] => <REDACTED>
                            )
Tue, Jun 2, 8:23 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization

Mon, Jun 1

atsuko closed T427839: 502/503 for mediawiki.page_change.v1 stream, a subtask of T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel, as Resolved.
Mon, Jun 1, 9:49 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko closed T427839: 502/503 for mediawiki.page_change.v1 stream as Resolved.
Mon, Jun 1, 9:49 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Incident Severity 3, Wikimedia-Incident, Data-Engineering, EventStreams
atsuko changed the status of T427839: 502/503 for mediawiki.page_change.v1 stream from Open to In Progress.

The potential start time (https://grafana.wikimedia.org/goto/cfnux6el9oruob?orgId=1) correlates with the release of the new chart. I'm rolling back the timeout "increase" from "0s" (infinite) to "60s"; after canarying the change, the cluster seems stable.

Mon, Jun 1, 9:23 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Incident Severity 3, Wikimedia-Incident, Data-Engineering, EventStreams
atsuko changed the status of T427839: 502/503 for mediawiki.page_change.v1 stream, a subtask of T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel, from Open to In Progress.
Mon, Jun 1, 9:23 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko added a parent task for T427839: 502/503 for mediawiki.page_change.v1 stream: T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.
Mon, Jun 1, 9:15 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Incident Severity 3, Wikimedia-Incident, Data-Engineering, EventStreams
atsuko added a subtask for T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel: T427839: 502/503 for mediawiki.page_change.v1 stream.
Mon, Jun 1, 9:15 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko created P93467 (An Untitled Masterwork).
Mon, Jun 1, 8:39 PM
atsuko closed T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel as Resolved.
Mon, Jun 1, 3:49 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko closed T427774: Flink image fails to build in weekly rebuild, a subtask of T412978: Support for Java 21 and Flink 2, as Resolved.
Mon, Jun 1, 10:54 AM · Data-Platform-SRE (2026-06-05 - 2026-06-26), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review
atsuko closed T427774: Flink image fails to build in weekly rebuild as Resolved.
Mon, Jun 1, 10:53 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15)
atsuko added a comment to T427774: Flink image fails to build in weekly rebuild.
root@build2001:/srv/images/production-images# /srv/deployment/docker-pkg/venv/bin/docker-pkg -c /etc/production-images/config.yaml build images/ --select '*flink:2*' 
* docker-registry.discovery.wmnet/flink:2.0.2-wmf1-20260531
Mon, Jun 1, 9:46 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15)
atsuko closed T427197: Requesting Access to Analytics Data Lake for Dlulisa-WMF as Invalid.

Confirmed that the access is already present, no change needed.

Mon, Jun 1, 9:31 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15), SRE, SRE-Access-Requests
atsuko added a subtask for T412978: Support for Java 21 and Flink 2: T427774: Flink image fails to build in weekly rebuild.
Mon, Jun 1, 9:22 AM · Data-Platform-SRE (2026-06-05 - 2026-06-26), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review
atsuko added a parent task for T427774: Flink image fails to build in weekly rebuild: T412978: Support for Java 21 and Flink 2.
Mon, Jun 1, 9:22 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15)
atsuko changed the status of T427774: Flink image fails to build in weekly rebuild from Open to In Progress.

It seems that we forgot to update the control to openjdk-21, making a patch now.

Mon, Jun 1, 9:20 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15)
atsuko added a comment to T427197: Requesting Access to Analytics Data Lake for Dlulisa-WMF.

Could you please re-check whether you have access to the tables after running kinit wmf-ldlulisa?

Mon, Jun 1, 9:16 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15), SRE, SRE-Access-Requests
atsuko closed T427279: Requesting access to Analytics Data Lake for kevmon/kmontalva-wmf as Resolved.

Needed to create a Kerberos principal that matches the Unix name, kmontalva-wmf. Created and coordinated with @KMontalva-WMF.

Mon, Jun 1, 9:11 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15), SRE, SRE-Access-Requests

Fri, May 29

atsuko changed the status of T427197: Requesting Access to Analytics Data Lake for Dlulisa-WMF from In Progress to Open.

There is already an existing account, wmf-ldlulisa, created in T421214 with this level of privileges but with a different SSH key. Could you please clarify whether you just need to update the SSH key?

Fri, May 29, 2:40 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), SRE, SRE-Access-Requests
atsuko changed the status of T427279: Requesting access to Analytics Data Lake for kevmon/kmontalva-wmf from In Progress to Open.
Fri, May 29, 2:33 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), SRE, SRE-Access-Requests
atsuko added a comment to T427279: Requesting access to Analytics Data Lake for kevmon/kmontalva-wmf.

There is already an existing account, kmontalva-wmf, with this level of privileges but with a different SSH key.
Could you please clarify whether you just need to update the SSH key?

Fri, May 29, 2:29 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), SRE, SRE-Access-Requests
atsuko changed the status of T427197: Requesting Access to Analytics Data Lake for Dlulisa-WMF from Open to In Progress.
Fri, May 29, 1:57 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), SRE, SRE-Access-Requests
atsuko changed the status of T427279: Requesting access to Analytics Data Lake for kevmon/kmontalva-wmf from Open to In Progress.
Fri, May 29, 1:57 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), SRE, SRE-Access-Requests
atsuko closed T427304: Import wmf-opensearch-search-plugins_2.19.5+6.trixie.deb from gitlab build to apt.wikimedia.org, a subtask of T424820: Unexplainable .* behavior in intitle regex search, as Resolved.
Fri, May 29, 1:15 PM · Discovery-Search (2026.06.01 - 2026.07.03), CirrusSearch
atsuko closed T427304: Import wmf-opensearch-search-plugins_2.19.5+6.trixie.deb from gitlab build to apt.wikimedia.org as Resolved.
Fri, May 29, 1:15 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Discovery-Search (2026.05.04 - 2026.05.29), CirrusSearch
atsuko moved T427304: Import wmf-opensearch-search-plugins_2.19.5+6.trixie.deb from gitlab build to apt.wikimedia.org from Quick Wins to Done on the Data-Platform-SRE (2026-04-24 - 2026-05-15) board.
Fri, May 29, 1:14 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Discovery-Search (2026.05.04 - 2026.05.29), CirrusSearch
atsuko added a comment to T424820: Unexplainable .* behavior in intitle regex search.

Do we need to update ttmserver as well?

Fri, May 29, 1:14 PM · Discovery-Search (2026.06.01 - 2026.07.03), CirrusSearch
atsuko added a comment to T427304: Import wmf-opensearch-search-plugins_2.19.5+6.trixie.deb from gitlab build to apt.wikimedia.org.
atsuko@apt1002:~$ sudo -i reprepro -C component/opensearch2 include trixie-wikimedia $(realpath wmf-opensearch-search-plugins_2.19.5+6_amd64.changes )
Exporting indices...
Deleting files no longer referenced...
atsuko@apt1002:~$ sudo -i reprepro -C component/opensearch2 list trixie-wikimedia 
...
trixie-wikimedia|component/opensearch2|amd64: wmf-opensearch-search-plugins 2.19.5+6~trixie
trixie-wikimedia|component/opensearch2|i386: wmf-opensearch-search-plugins 2.19.5+6~trixie
...
Fri, May 29, 1:13 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Discovery-Search (2026.05.04 - 2026.05.29), CirrusSearch
atsuko changed the status of T427304: Import wmf-opensearch-search-plugins_2.19.5+6.trixie.deb from gitlab build to apt.wikimedia.org, a subtask of T424820: Unexplainable .* behavior in intitle regex search, from Open to In Progress.
Fri, May 29, 12:22 PM · Discovery-Search (2026.06.01 - 2026.07.03), CirrusSearch
atsuko changed the status of T427304: Import wmf-opensearch-search-plugins_2.19.5+6.trixie.deb from gitlab build to apt.wikimedia.org from Open to In Progress.
Fri, May 29, 12:22 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Discovery-Search (2026.05.04 - 2026.05.29), CirrusSearch
atsuko changed the status of T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s from Stalled to In Progress.

The wikimedia-config backport is scheduled for Monday. If everything goes well, I will start a new cluster and switch over production as well.

Fri, May 29, 12:21 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization
atsuko changed the status of T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s, a subtask of T424248: ☂️Migrate non-cirrus indices from production OpenSearch to OpenSearch on k8s ☂️, from Stalled to In Progress.
Fri, May 29, 12:21 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko added a comment to T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.

Silence ID 7d0bff61-73e8-4e1f-b324-c107b5b54adc

Fri, May 29, 9:58 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko claimed T427304: Import wmf-opensearch-search-plugins_2.19.5+6.trixie.deb from gitlab build to apt.wikimedia.org.
Fri, May 29, 9:35 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Discovery-Search (2026.05.04 - 2026.05.29), CirrusSearch

Thu, May 28

atsuko updated the task description for T427517: Issue 6 month certificates for OpenSearch-on-K8S.
Thu, May 28, 2:16 PM · Patch-For-Review, Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko updated the task description for T427517: Issue 6 month certificates for OpenSearch-on-K8S.
Thu, May 28, 2:12 PM · Patch-For-Review, Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko added a subtask for T421757: ☂️ Migrate production OpenSearch clusters from 1.x-2.x ☂️: T427517: Issue 6 month certificates for OpenSearch-on-K8S.
Thu, May 28, 2:04 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko added a parent task for T427517: Issue 6 month certificates for OpenSearch-on-K8S: T421757: ☂️ Migrate production OpenSearch clusters from 1.x-2.x ☂️.
Thu, May 28, 2:04 PM · Patch-For-Review, Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko added a project to T427517: Issue 6 month certificates for OpenSearch-on-K8S: Data-Platform-SRE.
Thu, May 28, 2:03 PM · Patch-For-Review, Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko claimed T427517: Issue 6 month certificates for OpenSearch-on-K8S.
Thu, May 28, 2:02 PM · Patch-For-Review, Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko created T427517: Issue 6 month certificates for OpenSearch-on-K8S.
Thu, May 28, 2:00 PM · Patch-For-Review, Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko added a comment to T427419: Unable to finish 2FA.

Confirming it works for me, thanks!

Thu, May 28, 8:19 AM · Regression, Wikimedia-production-error, Product Safety and Integrity, MediaWiki-extensions-OATHAuth

Wed, May 27

atsuko updated the task description for T427419: Unable to finish 2FA.
Wed, May 27, 6:08 PM · Regression, Wikimedia-production-error, Product Safety and Integrity, MediaWiki-extensions-OATHAuth
atsuko added a comment to T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.

Cleanup:

  0. Check monitoring
  1. Remove eventstream-internal from main k8s
  2. Remove extra configs from turnilo and eventstream
  3. Downgrade eventstreams-internal.discovery.wmnet from lvs to ingress lb
Wed, May 27, 5:41 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko moved T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel from In Progress to Done on the Data-Platform-SRE (2026-04-24 - 2026-05-15) board.
Wed, May 27, 5:16 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko added a comment to T427419: Unable to finish 2FA.

I stepped from previous point to OATHAuth/modules/webauthn/util/Authenticator.js:39 and got this in console

Wed, May 27, 4:43 PM · Regression, Wikimedia-production-error, Product Safety and Integrity, MediaWiki-extensions-OATHAuth
atsuko created T427419: Unable to finish 2FA.
Wed, May 27, 4:11 PM · Regression, Wikimedia-production-error, Product Safety and Integrity, MediaWiki-extensions-OATHAuth
atsuko added a comment to T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.

dse-k8s service applied and working. Need to merge DNS and dyna configurations and it should be done.

Wed, May 27, 3:40 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko created P93249 Ic79f742af578a391e100c9ecbfef467369c33e1d diff.
Wed, May 27, 12:27 PM

Tue, May 26

atsuko added a comment to T412978: Support for Java 21 and Flink 2.

Notes about apache-flink==2.0.2 not working on python3.13 on trixie.

Tue, May 26, 3:43 PM · Data-Platform-SRE (2026-06-05 - 2026-06-26), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review
atsuko created P93019 grpcio-tools==1.53.0 on python3.13/trixie.
Tue, May 26, 2:12 PM
atsuko created P93001 pip install apache-flink==2.0.2 on trixie.
Tue, May 26, 12:59 PM
atsuko created P92994 (An Untitled Masterwork).
Tue, May 26, 12:25 PM

Wed, May 20

atsuko added a comment to T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.

Got service working on dpe-k8s-eqiad, need to do:

  1. Register service with idp (Application Not Authorized to Use CAS)
  2. Finish diffs massaging and review, so far I
    • edited some of the vendor templates
    • didn't backport debug functionality yet
Wed, May 20, 5:45 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko updated the language for P92687 Testing the changes for eventstream-internal from autodetect to diff.
Wed, May 20, 4:45 PM
atsuko created P92687 Testing the changes for eventstream-internal.
Wed, May 20, 1:35 PM

Tue, May 19

atsuko created P92612 mediawiki/php-1.47.0-wmf.3/extensions/Translate: getReplicaCount(): Return value must be of type string, int returned.
Tue, May 19, 3:59 PM
atsuko added a comment to T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s.
  1. Version check in ttmserver-export.php is released r/1286978.
  2. Need to fix MediaWiki\Extension\Translate\TtmServer\ElasticSearchTtmServer::getReplicaCount to work with current config wmf-config/CommonSettings.php where we use integer but it expects a string: Return value must be of type string, int returned.
  3. Need a release plan for switchover on production cluster, since adding the server without indices breaks the read operations, see T426467.
Tue, May 19, 3:55 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization

Mon, May 18

atsuko added a comment to T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.

Created the namespaces in dse-k8s-eqiad, unblocked testing of the new chart/configuration.

Mon, May 18, 1:54 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko reassigned T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s from atsuko to bd808.

Hi, index access should now work without HTTP auth. If the test cluster is working, I'll provision the prod cluster as well.

Mon, May 18, 9:10 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub

Fri, May 15

atsuko added a comment to T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s.

@bd808 I'll roll out the new version, without the authentication requirement, on Monday morning.

Fri, May 15, 9:01 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub
atsuko claimed T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s.
Fri, May 15, 12:49 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub
atsuko added a comment to T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s.

It seems possible to completely disable the security plugin, which would remove the password requirement as well as the double TLS. However, I don't think opensearch-operator supports bootstrapping such clusters; here are the options I was experimenting with.

Fri, May 15, 11:36 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub

Thu, May 14

atsuko reassigned T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s from atsuko to bd808.

@bd808 hi! the instance is available at
https://opensearch-toolhub-test.svc.eqiad.wmnet:30443/
https://opensearch-toolhub-test.svc.codfw.wmnet:30443/

Thu, May 14, 4:55 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub
atsuko changed the status of T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s from In Progress to Stalled.

Waiting for plugin release

Thu, May 14, 9:17 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization
atsuko changed the status of T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s, a subtask of T424248: ☂️Migrate non-cirrus indices from production OpenSearch to OpenSearch on k8s ☂️, from In Progress to Stalled.
Thu, May 14, 9:17 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26)

Wed, May 13

atsuko added a comment to T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s.

@dcausse and I ran mwscript-k8s --attach -- extensions/Translate/scripts/ttmserver-export.php --wiki=testwiki --ttmserver test --clean. It established the connection and complained about version 3.5.0 in ElasticSearchTtmServer.php:checkElasticsearchVersion(). @dcausse also advised using OpenSearch 2 for now and noted that we need to ship an additional plugin.

Wed, May 13, 8:02 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization
atsuko updated subscribers of T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s.

Connector config change is deployed, we can continue with exporting data to it. @dcausse mentioned that we need to enable additional extensions

Wed, May 13, 7:44 AM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization

Mon, May 11

atsuko added a comment to T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.

Plan

  1. update openstream chart so it will have external httpd_cas port that is exposed to 30443
  2. we create all DNS records (public and private)
  3. we deploy the app, which sets up ingress
  4. we check that https://<service>.discovery.wmnet:30443 works
  5. we enable the ATS proxy and edge caching configuration
Mon, May 11, 2:56 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko added a comment to T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.

Going to push updates for turnilo and start converting eventstreams-internal

Mon, May 11, 2:17 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko edited P92445 Difference between turnilo + manually modularised httpd_cas vs new service + httpd_cas via sextant.
Mon, May 11, 12:45 PM
atsuko added a comment to T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.

Result output is https://phabricator.wikimedia.org/P92446

Mon, May 11, 10:08 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko edited P92446 Difference between production status of turnilo and turnilo + modularised httpd_cas.
Mon, May 11, 9:55 AM
atsuko edited P92446 Difference between production status of turnilo and turnilo + modularised httpd_cas.
Mon, May 11, 8:49 AM
atsuko edited P92446 Difference between production status of turnilo and turnilo + modularised httpd_cas.
Mon, May 11, 8:39 AM

May 8 2026

atsuko created P92446 Difference between production status of turnilo and turnilo + modularised httpd_cas.
May 8 2026, 3:33 PM
atsuko created P92445 Difference between turnilo + manually modularised httpd_cas vs new service + httpd_cas via sextant.
May 8 2026, 3:23 PM
atsuko moved T424248: ☂️Migrate non-cirrus indices from production OpenSearch to OpenSearch on k8s ☂️ from In Progress to Blocked/Waiting on the Data-Platform-SRE (2026-04-24 - 2026-05-15) board.
May 8 2026, 1:44 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26)
atsuko moved T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel from Blocked/Waiting to In Progress on the Data-Platform-SRE (2026-04-24 - 2026-05-15) board.
May 8 2026, 1:42 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko changed the status of T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel from Open to In Progress.
May 8 2026, 1:41 PM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
atsuko added a comment to T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.

Plan:

  1. Take @JAllemandou's diff for httpd_cas modularisation and finish it. To test, apply it and compare with the functional deployment.
  2. Adapt httpd_cas for eventstreams chart and deploy next version of chart.
May 8 2026, 10:06 AM · Data-Platform-SRE (2026-04-24 - 2026-05-15), Data-Engineering (Q4 FS25/26 April 1st - June 30st), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform

May 6 2026

atsuko added a comment to T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s.

To-Do:

  1. Land the connector diff, consensus is that the change won't affect anything,
  2. Add the password to the connection based on @EBernhardson info (should be somewhere around mediawiki-config/private/readme.php based on https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Security_mitigations),
  3. Test whether any other code is needed to get the password working, and whether any network ACLs are missing.
May 6 2026, 3:18 PM · Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), Patch-For-Review, Language and Product Localization