User Details
- User Since
- Dec 15 2021, 9:19 PM (233 w, 3 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- BKing (WMF) [ Global Accounts ]
Fri, May 29
Per this Slack conversation, it sounds like @colewhite is evaluating Datadog's Vector. Cole, I'm not sure how far along you are in your evaluation, but this seems like a potential use case for Vector. I'm happy to help test if you are in fact working on Vector. CC @brouberol as he has some history with Datadog products ;)
Thu, May 28
I'll take a closer look tomorrow, but from what I can tell, the secret is being used correctly. I'm comparing the Airflow helm chart's handling of secrets against the secret creation in your linked CR, and they look identical.
Wed, May 27
Working on relforge, I've noticed the cluster won't form unless `cluster.initial_cluster_manager_nodes` and `discovery.seed_hosts` are set. This matches up with the OpenSearch docs, but it doesn't match up with our current "discovery" (cluster formation) settings in production.
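A minimal sketch of the check described above, assuming the node settings are available as a flat dict keyed by setting name. The function name and the example values are hypothetical; only the two setting names come from the comment.

```python
# Settings that had to be present before the relforge cluster would form.
REQUIRED_DISCOVERY_SETTINGS = (
    "cluster.initial_cluster_manager_nodes",
    "discovery.seed_hosts",
)


def missing_discovery_settings(settings):
    """Return the required discovery settings that are absent or empty."""
    return [key for key in REQUIRED_DISCOVERY_SETTINGS if not settings.get(key)]


# Hypothetical example values, not taken from any real config.
example = {
    "cluster.name": "example-cluster",
    "discovery.seed_hosts": ["node1.example.org", "node2.example.org"],
}
print(missing_discovery_settings(example))
```

Running something like this against the production settings would be one way to list which of the two keys differ from the relforge setup.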
Tue, May 26
Per today's discussion at DPE SRE standup, there are concerns about Terraform introducing too much complexity. We also discussed the possibility of managing these settings with a Kubernetes operator in the future.
Fri, May 22
Thu, May 21
I haven't had time to progress this ticket for a few weeks, so I'm moving it back to the backlog.
Wed, May 20
We ran into this bug when running sre.elasticsearch.rolling-operation today. There is a pretty easy workaround posted in the GitHub issue, but we'll need to document that and possibly update our cookbooks to work around it.
NP, thanks for your help on this! Feel free to ping me in IRC (inflatador) if you need anything else.
I've created a tentative plan here. Feel free to look it over and offer suggestions.
Hello @Jhancock.wm, per above I have requested one in each row. Please avoid row D if possible, since it already has 4 wdqs-main hosts.
Tue, May 19
Moving to in progress per IRC conversation with @Jclark-ctr. These are our first servers with the hardware profile Config F single CPU, but with smaller NVMe drives (2x 2TB). Traffic's cp (CDN) servers use almost the same profile, so we should be able to borrow their partman config. I'm checking now and should have a patch up within the next day or so.
Mon, May 18
Per today's Search Platform standup, we don't have the bandwidth to contribute at the moment. As such, I'll close this out. We can always revisit as time permits.
Sat, May 16
This is complete. Closing...
Fri, May 15
Thu, May 14
@bd808 I believe we are forced by the OpenSearch operator to use basic and/or mutual TLS auth. I'll check again and have an answer for you by this time next week.
Wed, May 13
@ayounsi Sorry for the trouble, confirming that the depool and repool commands are enough for cirrussearch hosts.
Tue, May 12
^^ Basically what he said 🙃
- Data Platform sets up new containerized opensearch cluster for Toolhub use
Done, see this page for how to access.
Closing in favor of T421293, which has more concrete steps to improve performance (move service into mesh). Feel free to reopen this task if that's not acceptable.
Tentative maintenance plan:
- Merge changes to deployment-charts that include the OpenSearch 3.x image
- Failover opensearch-ipoid to EQIAD only using DNS discovery (DPE SRE)
- Upgrade to OpenSearch 3.x in CODFW (DPE SRE)
- Confirm application is working on OpenSearch 3.x (PSI)
- Repeat in EQIAD
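As a small aid for the confirmation step in the plan above, here is a sketch that checks the version reported by a cluster. It assumes the JSON body returned by the OpenSearch root endpoint (`GET /`) has already been fetched and parsed into a dict. The function name and the sample payload are hypothetical, and this only confirms the cluster version, not that the application itself works.

```python
def is_opensearch_major(info, major):
    """Return True if a root-endpoint response reports the given major version."""
    version = info.get("version", {})
    if version.get("distribution") != "opensearch":
        return False
    number = version.get("number", "")
    return number.split(".")[0] == str(major)


# Hypothetical response body, trimmed to the fields used here.
sample = {"version": {"distribution": "opensearch", "number": "3.0.0"}}
print(is_opensearch_major(sample, 3))
```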
Mon, May 11
I've merged everything in our environment, and submitted some changes upstream. I don't think we need to wait for the upstream changes to be reviewed, so I'll close this one out for now.
May 7 2026
Thanks @bd808, I'll take a look for sure.