It took some research. Need to apply admin_ng/dse-k8s: create the opensearch ClusterIssuer and edit the private hieradata/role/common/deployment_server/kubernetes.yaml.
Fri, Jun 5
@bd808 you can bypass the SNI problems by terminating TLS locally with socat:
socat TCP4-LISTEN:8000,reuseaddr,fork OPENSSL-CONNECT:127.0.0.1:30443,snihost=opensearch-toolhub-test.svc.eqiad.wmnet,verify=0
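The same SNI trick can be reproduced in Python when socat is not at hand. This is a minimal sketch using the standard ssl module: server_hostname plays the role of socat's snihost=, and disabling certificate verification mirrors verify=0, which is only acceptable for a throwaway local tunnel. It assumes something, such as an SSH tunnel, is listening on 127.0.0.1:30443; the function names are hypothetical.

```python
import socket
import ssl


def make_insecure_context() -> ssl.SSLContext:
    """Client context that skips certificate checks, like socat's verify=0.

    Only meant for a throwaway local tunnel, never for real traffic.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be turned off before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def open_tls_with_sni(connect_host: str, connect_port: int, sni_host: str,
                      ctx: ssl.SSLContext) -> ssl.SSLSocket:
    """Connect to connect_host:connect_port but present sni_host in the handshake."""
    raw = socket.create_connection((connect_host, connect_port), timeout=10)
    try:
        # server_hostname sets the SNI extension, which is what snihost= does in socat
        return ctx.wrap_socket(raw, server_hostname=sni_host)
    except Exception:
        raw.close()  # do not leak the TCP socket if the handshake fails
        raise
```

Usage: open the socket with open_tls_with_sni("127.0.0.1", 30443, "opensearch-toolhub-test.svc.eqiad.wmnet", make_insecure_context()) inside a with block, then send an HTTP request whose Host header matches the SNI name.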
Thu, Jun 4
Tested the proposed patch; it seems to unblock us.
I ran the patch in MediaWiki-Docker and it seems to have created the index, see P93883.
For testing, you can try accessing the production k8s instance through an SSH tunnel:
ssh -L 30443:opensearch-toolhub.svc.eqiad.wmnet:30443 deployment.eqiad.wmnet -N
Then add 127.0.0.1 opensearch-toolhub.svc.eqiad.wmnet to /etc/hosts.
Assigning to @Nikerabbit for further guidance on how we can proceed
Tue, Jun 2
Configured the prod instance; the per-DC URLs are:
https://opensearch-toolhub.svc.eqiad.wmnet:30443/
https://opensearch-toolhub.svc.codfw.wmnet:30443/
Indices are accessible without a password. Before:
atsuko@deploy1003:~$ curl -X PUT https://opensearch-ttmserver-test.svc.eqiad.wmnet:30443/ttmserver/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'; echo
{"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [indices:admin/settings/update] and User [name=opendistro_security_anonymous, backend_roles=[opendistro_security_anonymous_backendrole], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [indices:admin/settings/update] and User [name=opendistro_security_anonymous, backend_roles=[opendistro_security_anonymous_backendrole], requestedTenant=null]"},"status":403}
After:
atsuko@deploy1003:~$ curl -X PUT https://opensearch-ttmserver-test.svc.codfw.wmnet:30443/ttmserver/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'; echo
{"acknowledged":true}
atsuko@deploy1003:~$ curl -X PUT https://opensearch-ttmserver-test.svc.eqiad.wmnet:30443/ttmserver/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'; echo
{"acknowledged":true}
Accepted the invite, thanks!
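The replica-count update from the curl calls above can also be scripted, for example to run it against both DCs. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it sends no credentials, so it only succeeds where the anonymous user has the needed permission, as in the "After" output.

```python
import json
import urllib.request


def replica_settings_request(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build the same PUT <index>/_settings call as the curl commands above."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def is_acknowledged(response_body: bytes) -> bool:
    """A successful settings update answers with {"acknowledged":true}."""
    return json.loads(response_body).get("acknowledged") is True
```

Sending it is a matter of urllib.request.urlopen(req), given an SSL context that trusts the cluster's certificate; a 403 security_exception body like the "Before" output makes is_acknowledged() return False.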
@dcausse and I didn't proceed with the roll-out after running checks on testwiki: even though the credentials were in the config, we were still getting an unauthenticated access error from the client:
[eqiad-test] => Array
(
[type] => ttmserver
[class] => ElasticSearchTTMServer
[shards] => 1
[index] => ttmserver-test
[cutoff] => 0.65
[writable] => 1
[use_wikimedia_extra] => 1
[config] => Array
(
[servers] => Array
(
[0] => Array
(
[host] => opensearch-ttmserver-test.svc.eqiad.wmnet
[transport] => CirrusSearch\Elastica\DeprecationLoggedHttps
[port] => 30443
[username] => opensearch
[password] => <REDACTED>
)
Mon, Jun 1
The potential start time https://grafana.wikimedia.org/goto/cfnux6el9oruob?orgId=1 correlates with the release of the new chart. I'm rolling back the timeout "increase" from "0s" (infinite) to "60s", since after canarying the change the cluster seems stable.
root@build2001:/srv/images/production-images# /srv/deployment/docker-pkg/venv/bin/docker-pkg -c /etc/production-images/config.yaml build images/ --select '*flink:2*'
* docker-registry.discovery.wmnet/flink:2.0.2-wmf1-20260531
Confirmed that the access is already present, no change needed.
It seems we forgot to update the control file to openjdk-21; making a patch now.
Could you please re-check that you have access to the tables after running kinit wmf-ldlulisa?
Needed to create a Kerberos principal that matches the unix name, kmontalva-wmf. Created it and coordinated with @KMontalva-WMF.
Fri, May 29
There is already an existing account wmf-ldlulisa, created in T421214, with this level of privileges but with a different SSH key. Could you please clarify whether you just need to update the SSH key?
There is already an existing account kmontalva-wmf with this level of privileges, but with a different SSH key.
Could you please clarify whether you just need to update the SSH key?
Do we need to update ttmserver as well?
atsuko@apt1002:~$ sudo -i reprepro -C component/opensearch2 include trixie-wikimedia $(realpath wmf-opensearch-search-plugins_2.19.5+6_amd64.changes )
Exporting indices...
Deleting files no longer referenced...
atsuko@apt1002:~$ sudo -i reprepro -C component/opensearch2 list trixie-wikimedia
...
trixie-wikimedia|component/opensearch2|amd64: wmf-opensearch-search-plugins 2.19.5+6~trixie
trixie-wikimedia|component/opensearch2|i386: wmf-opensearch-search-plugins 2.19.5+6~trixie
...
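A listing like the one above can be checked from a script, for example to confirm that the expected plugin version landed for every architecture. This is a minimal sketch that assumes the `codename|component|arch: package version` line shape seen in this output; other reprepro output formats are not handled, and the function names are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class ListedPackage(NamedTuple):
    codename: str
    component: str
    arch: str
    package: str
    version: str


def parse_reprepro_list_line(line: str) -> Optional[ListedPackage]:
    """Parse one line of `reprepro list` output; return None for anything else."""
    head, sep, tail = line.strip().partition(": ")
    if not sep:
        return None  # e.g. the "..." elision lines
    parts = head.split("|")
    fields = tail.split()
    if len(parts) != 3 or len(fields) != 2:
        return None
    codename, component, arch = parts
    package, version = fields
    return ListedPackage(codename, component, arch, package, version)


def versions_of(output: str, package: str) -> Dict[str, str]:
    """Map architecture -> version for one package in a `reprepro list` dump."""
    result: Dict[str, str] = {}
    for line in output.splitlines():
        entry = parse_reprepro_list_line(line)
        if entry is not None and entry.package == package:
            result[entry.arch] = entry.version
    return result
```

Comparing the returned dict against the expected version for each architecture turns the manual eyeballing of the listing into a single check.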
The wikimedia-config backport is scheduled for Monday. If everything goes well, I will start a new cluster and switch production over as well.
Silence ID 7d0bff61-73e8-4e1f-b324-c107b5b54adc
Thu, May 28
Confirming it works for me, thanks!
Wed, May 27
Cleanup:
- Check monitoring
- Remove eventstream-internal from main k8s
- Remove extra configs from turnilo and eventstream
- Downgrade eventstreams-internal.discovery.wmnet from lvs to ingress lb
I stepped from the previous point to OATHAuth/modules/webauthn/util/Authenticator.js:39 and got this in the console.
The dse-k8s service is applied and working. Need to merge the DNS and dyna configurations, and then it should be done.
Tue, May 26
Notes about apache-flink==2.0.2 not working on python3.13 on trixie.
Wed, May 20
Got the service working on dpe-k8s-eqiad; still to do:
- Register the service with the IdP (Application Not Authorized to Use CAS)
- Finish massaging and reviewing the diffs; so far I:
  - edited some of the vendor templates
  - didn't backport the debug functionality yet
Tue, May 19
- The version check in ttmserver-export.php is released in r/1286978.
- Need to fix MediaWiki\Extension\Translate\TtmServer\ElasticSearchTtmServer::getReplicaCount to work with the current config in wmf-config/CommonSettings.php, where we pass an integer but it expects a string: "Return value must be of type string, int returned".
- Need a release plan for the switchover on the production cluster, since adding the server without indices breaks read operations, see T426467.
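The getReplicaCount() failure is the usual int-versus-string config mismatch. The real fix belongs in the PHP code; the Python sketch below only illustrates one way to normalize the value, accepting either an integer (as CommonSettings.php passes it) or a string and always returning a string. The helper name and its validation rules (non-negative, non-empty, booleans rejected) are assumptions, not Translate's behavior.

```python
from typing import Union


def normalize_replica_count(value: Union[int, str]) -> str:
    """Hypothetical helper: accept the replica setting as int or str, return str."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly
        raise TypeError("replica count must be int or str, not bool")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("replica count cannot be negative")
        return str(value)
    if isinstance(value, str) and value.strip():
        # strings are passed through unchanged apart from surrounding whitespace
        return value.strip()
    raise TypeError(f"unsupported replica count: {value!r}")
```

Keeping the coercion in one place means the config can stay an integer while callers that expect a string keep working.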
Mon, May 18
Created the namespaces in dse-k8s-eqiad, unblocked testing of the new chart/configuration.
Hi, index access should now work without HTTP auth. If the test cluster is working, I'll provision the prod cluster as well.
Fri, May 15
@bd808 I'll roll out the new version without the authentication requirement on Monday morning.
It seems possible to disable the security plugin completely, which would remove the password requirement as well as the double TLS. However, I don't think opensearch-operator supports bootstrapping such clusters; here are the options I was experimenting with.
Thu, May 14
@bd808 hi! The instance is available at:
https://opensearch-toolhub-test.svc.eqiad.wmnet:30443/
https://opensearch-toolhub-test.svc.codfw.wmnet:30443/
Waiting for plugin release
Wed, May 13
@dcausse and I ran mwscript-k8s --attach -- extensions/Translate/scripts/ttmserver-export.php --wiki=testwiki --ttmserver test --clean. It established the connection but complained about version 3.5.0 in ElasticSearchTtmServer.php:checkElasticsearchVersion(). @dcausse also advised using OpenSearch 2 for now, and said that we need to ship an additional plugin.
The connector config change is deployed, so we can continue with exporting data to it. @dcausse mentioned that we need to enable additional extensions.
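A version gate like the one that rejected 3.5.0 can be sketched as follows. This is not Translate's checkElasticsearchVersion(); it is a hypothetical check that reads the version.number field from the JSON returned by GET / on the cluster and rejects any major version above 2, matching the advice to stay on OpenSearch 2 for now. The function names and the max_major default are assumptions.

```python
import json
from typing import Tuple


def parse_version(number: str) -> Tuple[int, ...]:
    """'2.19.5' -> (2, 19, 5); drops suffixes such as '-SNAPSHOT'."""
    core = number.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def check_cluster_version(root_response: str, max_major: int = 2) -> Tuple[int, ...]:
    """Reject clusters whose major version is newer than max_major.

    root_response is the JSON body of GET / on the cluster.
    """
    info = json.loads(root_response)
    version = parse_version(info["version"]["number"])
    if version[0] > max_major:
        distribution = info["version"].get("distribution", "search")
        raise RuntimeError(
            f"unsupported {distribution} version {info['version']['number']}"
        )
    return version
```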
Mon, May 11
Plan
- update the openstream chart so it has an external httpd_cas port exposed on 30443
- we create all DNS records (public and private)
- we deploy the app, which sets up ingress
- we check that https://<service>.discovery.wmnet:30443 works
- we enable the ATS proxy and edge caching configuration
Going to push updates for turnilo and start converting eventstreams-internal
Result output is https://phabricator.wikimedia.org/P92446
May 8 2026
Plan:
- Take @JAllemandou's diff for the httpd_cas modularisation and finish it. To test, apply it and compare with the functional deployment.
- Adapt httpd_cas for the eventstreams chart and deploy the next version of the chart.
May 6 2026
To-Do:
- Land the connector diff; the consensus is that the change won't affect anything.
- Add the password to the connection based on @EBernhardson's info (it should be somewhere around mediawiki-config/private/readme.php, based on https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Security_mitigations).
- Test whether any other code is needed to get the password working, and whether any network ACLs are missing.
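For the password item, what the client ultimately has to send is an HTTP Basic Authorization header; without one, OpenSearch evaluates the request as the anonymous user when anonymous authentication is enabled, as in the 403 error earlier in this log. This is a minimal Python sketch, assuming a server entry with the same keys as the TTM server config dump (host, port, username, password); the helper name and the sample credentials are placeholders, and the actual change has to go into the PHP client configuration.

```python
import base64
from typing import Dict


def basic_auth_header(server: Dict[str, object]) -> Dict[str, str]:
    """Build the Authorization header for one server entry.

    Returns an empty dict when no credentials are configured, i.e. the
    request would go out unauthenticated.
    """
    username = server.get("username")
    password = server.get("password")
    if not username or password is None:
        return {}
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

If the credentials are in the config but requests still arrive as opendistro_security_anonymous, the header is being dropped somewhere between the config and the transport, which is worth checking before looking at network ACLs.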