EBernhardson (EBernhardson)
User

User Details

User Since
Oct 7 2014, 4:49 PM (592 w, 6 d)
Availability
Available
LDAP User
EBernhardson
MediaWiki User
EBernhardson (WMF) [ Global Accounts ]

Recent Activity

Thu, Feb 12

EBernhardson added a comment to T411169: Improve & better document cirrus debug & explainability APIs.

@EBernhardson thank you. Should it go out in Monday 16th's Tech News? (The wording suggests so, but I want to doublecheck)

Thu, Feb 12, 3:05 PM · User-notice, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch

Tue, Feb 10

EBernhardson renamed T417087: Deploy Qwen3-Reranker-0.6B inference service for semantic search reranking from Setup relatime inference via qwen3-reranker for semantic search to Deploy Qwen3-Reranker-0.6B inference service for semantic search reranking.
Tue, Feb 10, 10:04 PM · Discovery-Search (2026.02.02 - 2026.02.27), Semantic Search, CirrusSearch
EBernhardson created T417087: Deploy Qwen3-Reranker-0.6B inference service for semantic search reranking.
Tue, Feb 10, 10:03 PM · Discovery-Search (2026.02.02 - 2026.02.27), Semantic Search, CirrusSearch
EBernhardson moved T410440: Deepcat search stops loading additional results from To be Deployed to Done on the Discovery-Search (2026.02.02 - 2026.02.27) board.
Tue, Feb 10, 4:48 PM · Discovery-Search (2026.02.02 - 2026.02.27), MW-1.46-notes (1.46.0-wmf.13; 2026-01-27), CirrusSearch, Commons
EBernhardson added a comment to T411169: Improve & better document cirrus debug & explainability APIs.

I think the existing lead section should be short enough, something like:

Tue, Feb 10, 4:02 PM · User-notice, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch

Mon, Feb 2

EBernhardson added a project to T411169: Improve & better document cirrus debug & explainability APIs: User-notice.

Not entirely sure, but this might be reasonable to include in tech news.

Mon, Feb 2, 5:45 PM · User-notice, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch

Fri, Jan 30

EBernhardson added a comment to T401590: Adjust CirrusSearchNamespaceWeights for Commons.

Is the related wish resolved?

Fri, Jan 30, 6:26 PM · Community-Wishlist, Essential-Work, Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch, Community-Tech

Mon, Jan 26

EBernhardson merged T415429: Deepcategory only shows few results instead of the number of items it found into T410440: Deepcat search stops loading additional results.
Mon, Jan 26, 4:35 PM · Discovery-Search (2026.02.02 - 2026.02.27), MW-1.46-notes (1.46.0-wmf.13; 2026-01-27), CirrusSearch, Commons
EBernhardson merged task T415429: Deepcategory only shows few results instead of the number of items it found into T410440: Deepcat search stops loading additional results.
Mon, Jan 26, 4:34 PM · Discovery-Search, CirrusSearch, Commons
EBernhardson moved T410440: Deepcat search stops loading additional results from Needs Review to To be Deployed on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Mon, Jan 26, 4:04 PM · Discovery-Search (2026.02.02 - 2026.02.27), MW-1.46-notes (1.46.0-wmf.13; 2026-01-27), CirrusSearch, Commons

Fri, Jan 23

EBernhardson moved T411169: Improve & better document cirrus debug & explainability APIs from To be Deployed to Done on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Fri, Jan 23, 4:51 PM · User-notice, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
EBernhardson added a comment to T414763: Wikimedia Commons deepcategory searches return unexpected results for categories with spaces in name.

Today I found a bug in cases where the category name has an "&".
Example: a deepcat search for deepcat:"Pellerin & Cie" results in this search: deepcategory:"Pellerin_ and returns no matches because of the wrong folder name. The folder name is truncated at the "&".
I know a fix for this is possible, since my forked button that automates " " -deepcat:" " (such as "Pellerin & Cie" -deepcat:"Pellerin_&_Cie") has had a patch applied for quite some time and works well at this point. The fork is hosted and patched by User Samwilson.
I hope the regular deepcat function can be patched for this issue without damage to the opposite -deepcat searches.

Fri, Jan 23, 2:41 PM · Discovery-Search, Commons, CirrusSearch

Thu, Jan 22

EBernhardson claimed T410440: Deepcat search stops loading additional results.
Thu, Jan 22, 8:34 PM · Discovery-Search (2026.02.02 - 2026.02.27), MW-1.46-notes (1.46.0-wmf.13; 2026-01-27), CirrusSearch, Commons
EBernhardson claimed T412673: Glent generate_query_similarity_candidates fails with NPE.

This is now running again. It has completed the generation stage and is currently creating fresh glent indices in opensearch. I don't have old enough data to verify this 100%, but at a high level it looks like this was caused by a change in meta.dt timestamps to include millisecond precision, likely as part of T376026.
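As an illustration of the class of problem described above (and not the actual Glent fix), here is a minimal Python sketch of a timestamp parser that tolerates both second and millisecond precision. The two format strings and the `parse_dt` helper are assumptions made for this example; the real shape of `meta.dt` values should be checked against the event schema.

```python
from datetime import datetime, timezone

# Assumed formats: the same ISO-8601 UTC instant with and without fractional seconds.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ")


def parse_dt(value: str) -> datetime:
    """Parse a timestamp string, accepting either precision (hypothetical helper)."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized timestamp: {value!r}")


old_style = parse_dt("2026-01-22T17:36:00Z")
new_style = parse_dt("2026-01-22T17:36:00.123Z")
```

A parser that only knows the first shape would reject the second, and a caller that swallows that failure can end up with a missing value further down the pipeline.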

Thu, Jan 22, 5:36 PM · Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch

Wed, Jan 21

EBernhardson moved T410887: CirrusSearch API: Expose array of sections with paragraph markers from Needs Review to Blocked / Waiting on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Wed, Jan 21, 6:18 PM · Discovery-Search (2026.02.02 - 2026.02.27), Patch-For-Review, CirrusSearch
EBernhardson added a comment to T414763: Wikimedia Commons deepcategory searches return unexpected results for categories with spaces in name.

Indeed this seems solved. However, the following still doesn't work and I meant to create a separate issue about this just before this bug here appeared: is this a separate problem or the same?:

(this example is what's used to populate 2020s_maps_of_the_world_in_unidentified_languages, which is how world maps, at least starting with the most relevant ones, are categorized by language, e.g. to better enable translations and hopefully eventually give better search results that don't show maps in some niche language I can't read at the top when that's not in my configured language(s))

Wed, Jan 21, 6:15 PM · Discovery-Search, Commons, CirrusSearch
EBernhardson merged task T414763: Wikimedia Commons deepcategory searches return unexpected results for categories with spaces in name into T414859: Searching by category (deepcat) is broken.
Wed, Jan 21, 6:14 PM · Discovery-Search, Commons, CirrusSearch
EBernhardson merged T414763: Wikimedia Commons deepcategory searches return unexpected results for categories with spaces in name into T414859: Searching by category (deepcat) is broken.
Wed, Jan 21, 6:14 PM · RoadToWiki, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch

Tue, Jan 20

EBernhardson moved T414859: Searching by category (deepcat) is broken from To be Deployed to Done on the Discovery-Search (2026.01.05 - 2026.01.30) board.

Example query from description now works as expected

Tue, Jan 20, 9:21 PM · RoadToWiki, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
EBernhardson moved T414859: Searching by category (deepcat) is broken from Needs Review to To be Deployed on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Tue, Jan 20, 8:00 PM · RoadToWiki, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
EBernhardson moved T414859: Searching by category (deepcat) is broken from Incoming to Needs Review on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Tue, Jan 20, 6:18 PM · RoadToWiki, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
EBernhardson claimed T414859: Searching by category (deepcat) is broken.
Tue, Jan 20, 4:06 PM · RoadToWiki, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch

Jan 15 2026

EBernhardson added a comment to T413969: Make semantic search accessible through Action API.

If using the DefaultSearchQueryDispatchService a new "semantic_search" profile context will have to be created and this one will dictate what rescore profile to use and could be forced to "empty".
If a user explicitly selects a rescore profile (not using engine_autoselect for ftqiprofile) I think we could re-route to classic fulltext (the SearchQuery class keeps the list of forced profiles, and this can be inspected to make this decision).
I haven't looked closely, but this could be an indication that using ftqbprofile to route to the knn query builder might not play well, as it gives the false impression that you can assemble query bits freely. So perhaps using another criterion (a custom cirrus param for now?) to trigger the knn search could be easier; the query router could cancel the knn query if some profiles have been selected explicitly (ftqiprofile/ftqbprofile != engine_autoselect).

Jan 15 2026, 6:43 PM · Discovery-Search (2026.02.02 - 2026.02.27), Patch-For-Review, Semantic Search, CirrusSearch
EBernhardson added a comment to T413969: Make semantic search accessible through Action API.

Per Erik's comment above:

elasticsearch-percentiles is often the first stop when receiving latency alerts to understand what might be going on.
elasticsearch-per-node-percentiles would be convenient for understanding latency profiles of the semantic search shard requests.

These important metrics are sourced from our custom exporter, which is not currently available in OpenSearch on K8s. (See T414345 for further discussion). @EBernhardson / @pfischer , would you consider these metrics a hard requirement for the project? If so, we might have to:

A. (SRE) Modify the helm chart to run our custom exporter as a sidecar
B. (Search Platform) rewrite our custom exporter as an OpenSearch plugin.

Let us know if this is a hard requirement and if so, level of effort for implementing the exporter in Java as an OpenSearch plugin. SRE will also look into making a docker image for the exporter and how to integrate it into the chart if these metrics are indeed required.

Jan 15 2026, 3:38 PM · Discovery-Search (2026.02.02 - 2026.02.27), Patch-For-Review, Semantic Search, CirrusSearch

Jan 14 2026

EBernhardson updated subscribers of T413969: Make semantic search accessible through Action API.

@dcausse @CDanis @bking @RKemper The above plan is my first draft of how we go from where we are today, to having semantic search available for testing in production. Please review.

Jan 14 2026, 11:02 PM · Discovery-Search (2026.02.02 - 2026.02.27), Patch-For-Review, Semantic Search, CirrusSearch
EBernhardson claimed T413969: Make semantic search accessible through Action API.

This will probably be a couple parts:

Jan 14 2026, 11:00 PM · Discovery-Search (2026.02.02 - 2026.02.27), Patch-For-Review, Semantic Search, CirrusSearch
EBernhardson updated the language for P87519 Backend implementation plan for serving Semantic Search via Relforge in CirrusSearch from autodetect to remarkup.
Jan 14 2026, 9:50 PM
EBernhardson created P87519 Backend implementation plan for serving Semantic Search via Relforge in CirrusSearch.
Jan 14 2026, 9:50 PM
EBernhardson moved T413969: Make semantic search accessible through Action API from Incoming to In Progress on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Jan 14 2026, 7:37 PM · Discovery-Search (2026.02.02 - 2026.02.27), Patch-For-Review, Semantic Search, CirrusSearch
EBernhardson moved T411169: Improve & better document cirrus debug & explainability APIs from Needs Review to To be Deployed on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Jan 14 2026, 7:22 PM · User-notice, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch

Jan 13 2026

EBernhardson moved T411169: Improve & better document cirrus debug & explainability APIs from In Progress to Needs Review on the Discovery-Search (2026.01.05 - 2026.01.30) board.

Documentation has been placed: https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug

Jan 13 2026, 11:36 PM · User-notice, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
EBernhardson updated the title for P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug from Proposed Cirrus debug documentation to Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 13 2026, 11:35 PM
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 13 2026, 10:37 PM
EBernhardson added a comment to T414345: Consider donating our time to enhance the official OpenSearch Prometheus exporter.

The latency buckets might be a little harder to upstream, but still possible. The latency buckets require special data collection implemented in our opensearch-extra plugin. Totally possible, but it means upstreaming to both the main opensearch project and then the exporters once the data is available. It might require a slight re-architecture of the latency collection depending on what upstream thinks of the current implementation. We did propose this to ElasticSearch back in the day and they gave advice for implementing the latency bucket collection, but weren't interested in upstreaming.

Jan 13 2026, 6:29 PM · Data-Platform-SRE (2026-02-13 - 2026-03-06), Discovery-Search (2026.02.02 - 2026.02.27), Essential-Work

Jan 12 2026

EBernhardson moved T409070: Latest CirrusSearch is incompatible with ES7.10 and the corresponding WMF extra plugin from Needs Review to Done on the Discovery-Search (2026.01.05 - 2026.01.30) board.

This looks to now support ElasticSearch in REL1_45

Jan 12 2026, 9:22 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), CirrusSearch
EBernhardson moved T413970: Update links to CirrusSearch dumps from To be Deployed to Done on the Discovery-Search (2026.01.05 - 2026.01.30) board.

Turns out the problem with the content was a cache on my end; issuing a force-refresh loaded the new content. The link is now pointing to the new content.

Jan 12 2026, 9:19 PM · Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 12 2026, 8:02 PM

Jan 8 2026

EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 10:38 PM
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 10:37 PM
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 10:32 PM
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 10:17 PM
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 10:01 PM
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 9:36 PM
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 9:21 PM
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 8:54 PM
EBernhardson claimed T411169: Improve & better document cirrus debug & explainability APIs.

Working on the documentation at P86859. This is still very preliminary, but once it's all pinned down better it will end up somewhere on mediawiki.org.

Jan 8 2026, 6:16 PM · User-notice, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 6:14 PM
EBernhardson edited P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 6:07 PM
EBernhardson updated the language for P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug from autodetect to remarkup.
Jan 8 2026, 4:21 PM
EBernhardson created P86859 Proposed Cirrus debug documentation - MOVED to https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug.
Jan 8 2026, 4:21 PM
EBernhardson added a comment to T411169: Improve & better document cirrus debug & explainability APIs.

As the person who filed T410602, it would have been nice to have an API taking in a search query and returning a set of results, as well as explaining why these results were chosen (e.g. title, related, displaytitle, defaultsort, etc.) and their ranking (pageviews, etc.). Possibly unrelated, but a Wikitech/Mediawiki page containing a step-by-step guide on how the Search system works (as well as the function names/files involved in each step) would have been helpful as well, since I found the codebase to be labyrinthine.

Jan 8 2026, 3:37 PM · User-notice, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
EBernhardson renamed T412444: Use OpenSearch for Special:LinkSearch from Use Elastic Serarch for Special:LinkSearch to Use OpenSearch for Special:LinkSearch.
Jan 8 2026, 3:33 PM · CirrusSearch, Discovery-Search, MediaWiki-Special-pages
EBernhardson created T414103: Mjolnir feature collection failing.
Jan 8 2026, 3:30 PM · Discovery-Search (2026.02.02 - 2026.02.27)

Jan 7 2026

EBernhardson moved T366248: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script from Needs Review to Done on the Discovery-Search (2026.01.05 - 2026.01.30) board.

Deprecation doc has been placed, this should be complete.

Jan 7 2026, 9:59 PM · Data-Platform-SRE (2026.01.05 - 2026.01.23), Discovery-Search (2026.01.05 - 2026.01.30), MediaWiki-Page-derived-data, Essential-Work, Patch-For-Review, DPE-Mediawiki-Content, Data-Engineering, CirrusSearch
EBernhardson moved T411347: New CirrusSearch dumps are not properly formatted from Needs Review to Done on the Discovery-Search (2026.01.05 - 2026.01.30) board.

Patches shipped, 20260104 dump was rerun and looks reasonable. I imported the simplewiki dump into a local instance and it loaded without issues.

Jan 7 2026, 2:48 PM · Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
EBernhardson added a comment to T366248: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script.

I went through today to verify if everything is ready to go:

Jan 7 2026, 2:28 PM · Data-Platform-SRE (2026.01.05 - 2026.01.23), Discovery-Search (2026.01.05 - 2026.01.30), MediaWiki-Page-derived-data, Essential-Work, Patch-For-Review, DPE-Mediawiki-Content, Data-Engineering, CirrusSearch

Jan 6 2026

EBernhardson edited P86770 Proposed DEPRECATION doc for old cirrus dumps.
Jan 6 2026, 9:43 PM
EBernhardson created P86770 Proposed DEPRECATION doc for old cirrus dumps.
Jan 6 2026, 9:38 PM
EBernhardson added a comment to T204089: CirrusSearch: Add filter for exclusion of redirects or finding only them.

Something can be done to improve some of the use cases around redirects, but we would need to narrow in on things that can be done. The fundamental limitation here is that in the search data model redirects are not their own pages. The only metadata about redirects stored in the search index is the namespace and title, and that is attached to the page that is redirected to. Additionally search results are always at the granularity of the indexed documents. This means if two redirects to the same page match that can not be represented in the output as two matches. It will always be a match against the document that was redirected to, with a scoring bump for matching twice.
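A toy Python sketch of the data model described above may help. The document shape, field names, and the match-counting "score" are assumptions for illustration only and do not reproduce the real CirrusSearch mapping or ranking. It shows redirects stored as namespace/title pairs on the target page's document, and two matching redirects collapsing into a single hit.

```python
# Toy index: redirects have no documents of their own; they are attached
# to the document of the page they point to (hypothetical field names).
documents = [
    {
        "title": "Automobile",
        "namespace": 0,
        "redirect": [
            {"namespace": 0, "title": "Car"},
            {"namespace": 0, "title": "Motor car"},
        ],
    },
    {"title": "Bicycle", "namespace": 0, "redirect": []},
]


def search(term: str):
    """Return one hit per matching document, never one hit per redirect."""
    hits = []
    for doc in documents:
        names = [doc["title"]] + [r["title"] for r in doc["redirect"]]
        # Count how many titles match; extra matches only raise the toy score.
        matches = sum(term.lower() in name.lower() for name in names)
        if matches:
            hits.append({"title": doc["title"], "score": matches})
    return sorted(hits, key=lambda h: h["score"], reverse=True)


hits = search("car")
```

Both redirects match "car", yet the output holds a single entry for the target page, which is why a filter for "only redirects" cannot be expressed as a simple per-result filter in this model.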

Jan 6 2026, 4:17 PM · Advanced-Search, Discovery-Search, CirrusSearch

Jan 5 2026

EBernhardson added a comment to T366248: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script.

Should we start by disabling the legacy cirrussearch dumps in the Airflow UI?
https://airflow-test-k8s.wikimedia.org/dags/mediawiki_cirrussearch_dump/grid

If nothing falls over and nobody complains after a couple of weeks, then we can remove the code from Airflow-DAGs.

Jan 5 2026, 5:17 PM · Data-Platform-SRE (2026.01.05 - 2026.01.23), Discovery-Search (2026.01.05 - 2026.01.30), MediaWiki-Page-derived-data, Essential-Work, Patch-For-Review, DPE-Mediawiki-Content, Data-Engineering, CirrusSearch
EBernhardson removed a project from T412471: Search result excerpts can overflow the page on Special:Search: Discovery-Search.
Jan 5 2026, 4:51 PM · MediaWiki-Search
EBernhardson closed T411107: Harmonize semantics of Cirrus dump timestamp as Declined.

After pondering this one, we think it's best to leave the timestamps as-is. The only "correct" timestamp is the publicly facing dump; changing and re-aligning all the internal timestamps is a good bit of work that seems helpful but is ultimately unnecessary.

Jan 5 2026, 4:50 PM · Discovery-Search, CirrusSearch

Dec 5 2025

EBernhardson claimed T410887: CirrusSearch API: Expose array of sections with paragraph markers.
Dec 5 2025, 9:33 PM · Discovery-Search (2026.02.02 - 2026.02.27), Patch-For-Review, CirrusSearch

Dec 3 2025

EBernhardson moved T408154: AB Test doubling near match field weights on commonswiki from Needs Review to Done on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Dec 3 2025, 3:12 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Dec 2 2025

EBernhardson closed T410886: GeoData wikivoyage queries return some results without coordinates as Declined.

I think thiemo has got it pinned down, this is "working as intended". It would be convenient if the mediawiki api had a more intuitive method to align limits across the api call, but that's not currently a thing.

Dec 2 2025, 8:26 PM · Discovery-Search (2025.10.20 - 2025.12.31), GeoData
EBernhardson claimed T411347: New CirrusSearch dumps are not properly formatted.

The code fix itself ended up being pretty straightforward. We might use this opportunity to re-run the most recent dump and learn a bit more about how replacing an already published dump would work.

Dec 2 2025, 8:19 PM · Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch

Dec 1 2025

EBernhardson moved T409218: Elastica\Exception\Connection\HttpException: Unknown error:52 from Needs Review to To be Deployed on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Dec 1 2025, 9:50 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.5; 2025-12-02), MediaWiki-extensions-Translate, Wikimedia-production-error
EBernhardson moved T408154: AB Test doubling near match field weights on commonswiki from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.

It seems worthwhile to let the test continue running for a second week

@EBernhardson it's now two weeks later. Just in case this dropped off the radar by accident.

Dec 1 2025, 8:01 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Nov 24 2025

EBernhardson added a comment to T409898: Set up OpenSearch instance supporting vector search.

A temporary 3-node cluster has been stood up in T410681. This is running opensearch 3.3.2 and is accessible from the analytics network (stat machines, hadoop, etc.).

Nov 24 2025, 5:09 PM · Data-Platform-SRE (2026-02-13 - 2026-03-06), Essential-Work, Discovery-Search, Research

Nov 21 2025

EBernhardson moved T410681: Setup opensearch 3 on relforge servers from Incoming to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.

The initial parts of this are complete. The three-node cluster has been stood up, and it is accessible from the analytics networks. As this is a test instance we didn't set up a DNS service or TLS, but I suspect that is acceptable. The cluster can be accessed on relforge1008, relforge1009, and relforge1010 on port 9200.

Nov 21 2025, 9:10 PM · Discovery-Search (2025.10.20 - 2025.12.31)
EBernhardson created P85447 relforge opensearch 3 knn test docker-compose.yml.
Nov 21 2025, 9:09 PM

Nov 20 2025

EBernhardson created P85427 validate articletopic terms.
Nov 20 2025, 10:32 PM
EBernhardson added a comment to T403212: Support \r, \n, \t, and \uNNNN in insource and intitle queries.

Hmm, that does seem likely. If we add a &cirrusDumpQuery to one of the searches we can see it has timeout: 15s, when indeed regex should get a longer timeout. Not sure yet what changed to cause that.

Nov 20 2025, 9:33 PM · User-notice-archive, Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch
EBernhardson added a comment to T403212: Support \r, \n, \t, and \uNNNN in insource and intitle queries.
Nov 20 2025, 8:09 PM · User-notice-archive, Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch
EBernhardson created T410681: Setup opensearch 3 on relforge servers.
Nov 20 2025, 6:02 PM · Discovery-Search (2025.10.20 - 2025.12.31)

Nov 19 2025

EBernhardson added a comment to T408133: [Spike] Explore Generalizing Enrollment Authorities.

Regarding CirrusSearch: Do you already have particular open questions/obstacles related to the A/B test scenarios covered in the backend?

  • Session starting by following a link to blank Special:Search
  • Session starting by following a link to Special:Search with a query
  • Session starting at Special:Search with a 'go'

Thanks @pfischer. A couple of questions spring to mind:

  1. Is Scenario 1 treated the same as the other scenarios? In my mental model, a user searching or interacting with the search autocomplete starts a session. Is that correct?
Nov 19 2025, 1:40 PM · Test Kitchen (Experiment Platform Sprint 16), MW-1.46-notes (1.46.0-wmf.4; 2025-11-25), OKR-Work

Nov 18 2025

EBernhardson added a comment to T410440: Deepcat search stops loading additional results.

This shouldn't have anything to do with the time between requests. "Deep category search SPARQL query failed" mostly means one of the backend services had an error and the frontend should retry. It looks like the frontend is instead treating an error as the end of results.
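The behaviour suggested above, retrying on a backend error rather than treating it as the last page, can be sketched roughly as follows. This is a language-neutral illustration in Python, not the frontend's actual code; `BackendError`, `load_all`, and the page shape are hypothetical names introduced for the example, and a real client would probably add a delay between retries.

```python
from typing import Callable, List, Optional, Tuple

# A page is (results, next offset), with None as the offset after the last page.
Page = Tuple[List[str], Optional[int]]


class BackendError(Exception):
    """Hypothetical error raised when a backend service fails."""


def load_all(fetch_page: Callable[[int], Page], max_retries: int = 3) -> List[str]:
    results: List[str] = []
    offset: Optional[int] = 0
    while offset is not None:
        for attempt in range(max_retries + 1):
            try:
                page, next_offset = fetch_page(offset)
                break  # success: stop retrying this offset
            except BackendError:
                if attempt == max_retries:
                    raise  # surface the failure instead of silently stopping
        results.extend(page)
        offset = next_offset
    return results


# Simulated backend that fails once on the second request.
calls = {"n": 0}


def flaky_fetch(offset: int) -> Page:
    calls["n"] += 1
    if calls["n"] == 2:
        raise BackendError("Deep category search SPARQL query failed")
    pages = {0: (["a", "b"], 2), 2: (["c"], None)}
    return pages[offset]


all_results = load_all(flaky_fetch)
```

The key design point is that only an explicit "no next offset" ends the loop; an exception either leads to another attempt or is reported to the caller.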

Nov 18 2025, 7:32 PM · Discovery-Search (2026.02.02 - 2026.02.27), MW-1.46-notes (1.46.0-wmf.13; 2026-01-27), CirrusSearch, Commons

Nov 17 2025

EBernhardson added a comment to T403212: Support \r, \n, \t, and \uNNNN in insource and intitle queries.

@EBernhardson apologies for bumping this, but do you think it might be worth me filing a follow-up feature request for this, given the similar usage/support in other regex flavours/libraries?

Nov 17 2025, 7:31 PM · User-notice-archive, Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch

Nov 14 2025

EBernhardson added a comment to T403212: Support \r, \n, \t, and \uNNNN in insource and intitle queries.

I noticed that surrogate pairs can't be used inside a character class. For example, both insource:/😂/ and insource:/[😂]/ work, but insource:/[\uD83D\uDE02]/ returns nothing. Using it in character classes can be handy when searching for a range of Unicode code points. Shall I file a new ticket for this?
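For reference, the relationship between 😂 and the escape sequence above can be checked with a short Python sketch. It only illustrates standard UTF-16 surrogate arithmetic and says nothing about how the CirrusSearch regex parser handles such escapes; `surrogate_pair` is a helper name made up for this example.

```python
def surrogate_pair(ch: str) -> tuple[int, int]:
    """Return the (high, low) UTF-16 surrogates for one non-BMP character."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("character is in the BMP and has no surrogate pair")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low


high, low = surrogate_pair("😂")
escaped = f"\\u{high:04X}\\u{low:04X}"
print(escaped)  # \uD83D\uDE02
```

Because one code point becomes two escape units, a character class containing them has to recognise the pair as a single member, which is plausibly where the behaviour described above diverges.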

Nov 14 2025, 2:40 PM · User-notice-archive, Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch

Nov 13 2025

EBernhardson added a comment to T408154: AB Test doubling near match field weights on commonswiki.

Test has been out for a week, ran the notebook but results are curious. In particular we are seeing a significant change in ZRR, even though the test treatment does not change the retrieval function. This suggests we could have some unbalanced effects in the bucketing. It seems worthwhile to let the test continue running for a second week and run the notebook against the second week to verify the results.

Nov 13 2025, 8:34 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Nov 12 2025

EBernhardson merged T317442: Sanity-check indices before promotion into T363521: Completion suggester can promote a bad build.
Nov 12 2025, 8:16 PM · Essential-Work, Discovery-Search (2025.08.15 - 2025.09.05), MW-1.45-notes (1.45.0-wmf.15; 2025-08-19), Sustainability (Incident Followup), CirrusSearch
EBernhardson merged task T317442: Sanity-check indices before promotion into T363521: Completion suggester can promote a bad build.
Nov 12 2025, 8:16 PM · Discovery-Search
EBernhardson added a comment to T409218: Elastica\Exception\Connection\HttpException: Unknown error:52.

Checking the last 2 weeks of logged Elastica HttpException there were 5 total exceptions, fairly low volume.

Nov 12 2025, 7:43 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.5; 2025-12-02), MediaWiki-extensions-Translate, Wikimedia-production-error

Nov 7 2025

EBernhardson updated the title for P85094 The CirrusSearch index dumps are moving from The cirrus index dumps are moving! to The CirrusSearch index dumps are moving.
Nov 7 2025, 4:37 PM
EBernhardson edited P85094 The CirrusSearch index dumps are moving.
Nov 7 2025, 4:36 PM
EBernhardson added a comment to T366248: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script.

In the communication we went with promising dumps through November, shutting off sometime in December:

Nov 7 2025, 4:36 PM · Data-Platform-SRE (2026.01.05 - 2026.01.23), Discovery-Search (2026.01.05 - 2026.01.30), MediaWiki-Page-derived-data, Essential-Work, Patch-For-Review, DPE-Mediawiki-Content, Data-Engineering, CirrusSearch
EBernhardson edited P85094 The CirrusSearch index dumps are moving.
Nov 7 2025, 4:27 PM
EBernhardson edited P85094 The CirrusSearch index dumps are moving.
Nov 7 2025, 4:23 PM
EBernhardson edited P85094 The CirrusSearch index dumps are moving.
Nov 7 2025, 4:18 PM
EBernhardson created P85094 The CirrusSearch index dumps are moving.
Nov 7 2025, 2:49 PM

Nov 6 2025

EBernhardson added a comment to T409218: Elastica\Exception\Connection\HttpException: Unknown error:52.

Curl error 52 is "Empty reply from server."

Nov 6 2025, 7:02 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.5; 2025-12-02), MediaWiki-extensions-Translate, Wikimedia-production-error
EBernhardson added a comment to T366248: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script.

@EBernhardson if you're happy with these new dumps, do you still want the "old" cirrussearch dumps to run on Airflow?

Nov 6 2025, 7:00 PM · Data-Platform-SRE (2026.01.05 - 2026.01.23), Discovery-Search (2026.01.05 - 2026.01.30), MediaWiki-Page-derived-data, Essential-Work, Patch-For-Review, DPE-Mediawiki-Content, Data-Engineering, CirrusSearch
EBernhardson added a comment to T409070: Latest CirrusSearch is incompatible with ES7.10 and the corresponding WMF extra plugin.

There is still a remaining problem with the query-time highlighter. The elasticsearch highlighter doesn't support the lucene_anchored flavor, so we would need to always request lucene on elasticsearch. We are really trying to avoid extra query-time round trips to the server to determine the version information though, and are still pondering an appropriate solution. It might be that the only reasonable way is to remove anchored trigram support from REL1_45.

Nov 6 2025, 5:38 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), CirrusSearch
EBernhardson moved T405466: Upgrade WebdriverIO to v9 in CirrusSearch from Needs Review to Done on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 6 2025, 5:28 PM · Essential-Work, MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch

Nov 4 2025

EBernhardson claimed T409070: Latest CirrusSearch is incompatible with ES7.10 and the corresponding WMF extra plugin.

Talked this over with @dcausse. We agreed we should continue to support Elasticsearch in REL1_45. We are adding a workaround for this bug with the regex support, and will add warnings that will be displayed when running scripts that manage search indexes whenever the indexes exist on an elasticsearch instance. The intention is to only support OpenSearch in REL1_46 and beyond.

Nov 4 2025, 10:04 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), CirrusSearch
EBernhardson moved T408678: `ForceSearchIndex` maintenance script falsly reports indexed pages when indexing jobs are skipped from Needs Review to Done on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 4 2025, 10:02 PM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
EBernhardson added a comment to T366248: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script.

With puppet deployed, we should expect to see these arrive at https://dumps.wikimedia.org/other/cirrus_search_index/ after 05:00 UTC tomorrow.

Nov 4 2025, 5:22 PM · Data-Platform-SRE (2026.01.05 - 2026.01.23), Discovery-Search (2026.01.05 - 2026.01.30), MediaWiki-Page-derived-data, Essential-Work, Patch-For-Review, DPE-Mediawiki-Content, Data-Engineering, CirrusSearch
EBernhardson moved T408909: The cirrus config dump API may produce unexpected json output from Needs Review to To be Deployed on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 4 2025, 4:46 PM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
EBernhardson added a comment to T366248: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script.

First run of the updated dag completed, dumps were formatted and moved to the exports path in hdfs. Reviewing the output it all looks reasonable and as expected. Next up is to enable the public sync via the puppet patch.

Nov 4 2025, 4:19 PM · Data-Platform-SRE (2026.01.05 - 2026.01.23), Discovery-Search (2026.01.05 - 2026.01.30), MediaWiki-Page-derived-data, Essential-Work, Patch-For-Review, DPE-Mediawiki-Content, Data-Engineering, CirrusSearch
EBernhardson moved T366248: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 4 2025, 4:18 PM · Data-Platform-SRE (2026.01.05 - 2026.01.23), Discovery-Search (2026.01.05 - 2026.01.30), MediaWiki-Page-derived-data, Essential-Work, Patch-For-Review, DPE-Mediawiki-Content, Data-Engineering, CirrusSearch