dcausse (David Causse)
User

User Details

User Since
Jun 9 2015, 9:03 AM (554 w, 19 h)
Availability
Busy until Jan 23.
IRC Nick
dcausse
LDAP User
DCausse
MediaWiki User
DCausse (WMF)

Recent Activity

Thu, Jan 15

dcausse added a comment to T413969: Make semantic search accessible through Action API.

This will probably be a couple parts:

Thu, Jan 15, 10:50 AM · Semantic Search, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
dcausse closed T414066: Download enterprise structured content snapshots in hdfs as Resolved.

Dumps are available as text files (raw ndjson) under /wmf/data/discovery/wikimedia_enterprise/structured_content_snapshots/snapshot=$YYYYMMDD/project=${WIKI}_namespace_0 and will be updated weekly on Sundays.
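
As a rough illustration of the layout described above, the sketch below builds the partition directory for one snapshot date and wiki, and parses raw ndjson lines. Only the path template comes from the comment; the helper names (`snapshot_dir`, `parse_ndjson`), the example date, and the sample record fields are hypothetical, and actually reading the files from HDFS is left out.

```python
import json
from datetime import date

# Base directory taken from the comment above.
BASE = "/wmf/data/discovery/wikimedia_enterprise/structured_content_snapshots"

def snapshot_dir(day: date, wiki: str) -> str:
    """Build the partition path for one snapshot of one wiki (namespace 0)."""
    return f"{BASE}/snapshot={day:%Y%m%d}/project={wiki}_namespace_0"

def parse_ndjson(lines):
    """Yield one dict per non-empty line of raw ndjson text."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Hypothetical example: an assumed snapshot date and made-up records.
path = snapshot_dir(date(2026, 1, 11), "enwiki")
records = list(parse_ndjson(['{"name": "Example"}', "", '{"name": "Other"}']))
```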

Thu, Jan 15, 8:20 AM · Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search

Wed, Jan 14

dcausse claimed T414070: Chunk, trim and generate passage embeddings from enterprise structured content snapshots.
Wed, Jan 14, 1:45 PM · Patch-For-Review, Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search

Tue, Jan 13

dcausse merged T139647: Search box at top right of pages should italicize redirects into T303013: Indicate when search results are from redirects (sometimes).
Tue, Jan 13, 6:18 PM · Readers Essential Work 2025, Reader Experience Team, Patch-For-Review, Codex, Vector 2022
dcausse merged task T139647: Search box at top right of pages should italicize redirects into T303013: Indicate when search results are from redirects (sometimes).
Tue, Jan 13, 6:18 PM · CirrusSearch, patch-welcome, good first task, Discovery-ARCHIVED
dcausse added a project to T413969: Make semantic search accessible through Action API: Semantic Search.
Tue, Jan 13, 3:36 PM · Semantic Search, Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
dcausse added a comment to T414426: Migrate airflow dags from the Search Platform instance to Wikidata Platform.

@gmodena the list of DAGs seems correct to me. Some parts of drop_old_data_daily.py might need to be moved over as well (cleanups of RDF data from import_ttl and query analytics).

Tue, Jan 13, 11:35 AM · Wikidata-Query-Service (Current Sprint), Discovery-Search, Wikidata

Mon, Jan 12

dcausse added a comment to T414066: Download enterprise structured content snapshots in hdfs.

The full HTML snapshots are "Updated twice-monthly (on the 2nd and 21st)" (1) so I'm curious to know whether Structured Contents follows that same cadence.

Mon, Jan 12, 9:29 AM · Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search

Thu, Jan 8

dcausse moved T414066: Download enterprise structured content snapshots in hdfs from In Progress to Needs Review on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Thu, Jan 8, 6:44 PM · Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search
dcausse claimed T414066: Download enterprise structured content snapshots in hdfs.
Thu, Jan 8, 4:01 PM · Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search
dcausse edited projects for T414066: Download enterprise structured content snapshots in hdfs, added: Discovery-Search (2026.01.05 - 2026.01.30); removed Discovery-Search.
Thu, Jan 8, 3:30 PM · Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search
dcausse edited projects for T414070: Chunk, trim and generate passage embeddings from enterprise structured content snapshots, added: Discovery-Search (2026.01.05 - 2026.01.30); removed Discovery-Search.
Thu, Jan 8, 3:30 PM · Patch-For-Review, Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search
dcausse updated the task description for T414099: Decouple ALIS/SLIS search weighted tags handling.
Thu, Jan 8, 3:19 PM · Data-Engineering, Image-Suggestions
dcausse updated the task description for T414099: Decouple ALIS/SLIS search weighted tags handling.
Thu, Jan 8, 3:16 PM · Data-Engineering, Image-Suggestions
dcausse updated the task description for T414099: Decouple ALIS/SLIS search weighted tags handling.
Thu, Jan 8, 3:15 PM · Data-Engineering, Image-Suggestions
dcausse created T414099: Decouple ALIS/SLIS search weighted tags handling.
Thu, Jan 8, 3:13 PM · Data-Engineering, Image-Suggestions
dcausse updated the task description for T414095: Configure opensearch ML connectors/models.
Thu, Jan 8, 2:26 PM · Semantic Search, Discovery-Search
dcausse created T414095: Configure opensearch ML connectors/models.
Thu, Jan 8, 2:22 PM · Semantic Search, Discovery-Search
dcausse added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

@OKarakaya-WMF awesome, thanks! p50 at 34ms is nice. If I'm reading the numbers right, the prompt does not seem to add much overhead.
Yes, the prompt will be sent on every request for now. If deemed necessary we could think about named prompt templates, but it's probably too early for that.

Thu, Jan 8, 2:07 PM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
dcausse created T414091: Import passage vectors into opensearch.
Thu, Jan 8, 1:43 PM · Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search
dcausse updated the task description for T414070: Chunk, trim and generate passage embeddings from enterprise structured content snapshots.
Thu, Jan 8, 11:30 AM · Patch-For-Review, Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search
dcausse created T414070: Chunk, trim and generate passage embeddings from enterprise structured content snapshots.
Thu, Jan 8, 11:26 AM · Patch-For-Review, Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search
dcausse updated the task description for T414066: Download enterprise structured content snapshots in hdfs.
Thu, Jan 8, 11:12 AM · Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search
dcausse created T414066: Download enterprise structured content snapshots in hdfs.
Thu, Jan 8, 10:38 AM · Discovery-Search (2026.01.05 - 2026.01.30), Semantic Search
dcausse added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

Hello @dcausse ,

Do we plan to query the api on prod with the following prompt?

Thu, Jan 8, 9:54 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team

Tue, Jan 6

dcausse closed T40403: Sortable search results as Resolved.

Support for the following sorting options:

  • incoming_links
  • last_edit
  • create_timestamp
  • title_natural (just recently)
Tue, Jan 6, 3:10 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Essential-Work, CirrusSearch
dcausse added a project to T403775: New search option: Sort results by page name: Advanced-Search.

We just deployed the required backend support to enable such sort options (title_natural_asc & title_natural_desc), re-adding the Advanced-Search tag.
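
A minimal sketch of how a client might request one of these sort orders, assuming they are passed through the `srsort` parameter of the Action API's `list=search` module. The endpoint, the query text, and the helper name `search_url` are illustrative assumptions; whether a given wiki accepts `title_natural_asc` depends on its deployment.

```python
from urllib.parse import urlencode

def search_url(api_endpoint: str, query: str, sort: str) -> str:
    """Build an Action API search URL with an explicit sort order."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srsort": sort,  # e.g. title_natural_asc or title_natural_desc
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"

# Hypothetical endpoint and query, for illustration only.
url = search_url("https://en.wikipedia.org/w/api.php", "solar eclipse", "title_natural_asc")
```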

Tue, Jan 6, 2:48 PM · User-notice, Patch-For-Review, Advanced-Search, Discovery-Search, Essential-Work, CirrusSearch, MediaWiki-Search, RoadToWiki
dcausse added a comment to T413794: [haswbstatement] wildcard for qualifiers doesn't do job.

Thanks for reporting this, I believe there are indeed a couple issues here:

  • the doc is clearly wrong when stating "A wildcard can also be used when specifying qualifiers - to find all items that depict a cat of any color use haswbstatement:P180=Q146|P462=*"; the | character is used for something else.
  • the syntax haswbstatement:P180=Q146[P462=*] would indeed be the one that makes the most sense, but it inhibits the prefix search because * is expected to be the last character
  • the syntax haswbstatement:P180=Q146[P462=* could possibly work, but we lower-case these terms internally and the prefix query does not go through the term analyzer (we'd have to use a normalizer here)
  • finally the syntax haswbstatement:p180=q146[p462=* (lowercasing manually) seems to work but it's definitely a poor workaround
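
The manual lowercasing workaround from the last bullet can be sketched as a small client-side helper. This is only an illustration of the workaround, not a fix for the underlying issue; the function name `lowercased_haswbstatement` is hypothetical.

```python
def lowercased_haswbstatement(statement: str) -> str:
    """Lowercase a statement by hand so a trailing-* prefix query can match
    the internally lowercased terms (workaround only)."""
    return f"haswbstatement:{statement.lower()}"

query = lowercased_haswbstatement("P180=Q146[P462=*")
```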
Tue, Jan 6, 10:36 AM · Discovery-Search, CirrusSearch
dcausse added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

A minor annoyance: I can't seem to propagate the host header from the opensearch connector config and had to hack the /etc/hosts file of the host running opensearch to make it work. I suspect that it won't be necessary once we move out of staging?

Tue, Jan 6, 8:59 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
dcausse added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

Please test this and let us know whether it's compatible with opensearch.

Tue, Jan 6, 8:36 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team

Mon, Jan 5

dcausse removed a project from T390293: Improve provenance tracking of CirrusSearch requests: Discovery-Search (2026.01.05 - 2026.01.30).
Mon, Jan 5, 4:38 PM · Essential-Work, CirrusSearch
dcausse moved T408431: Reindex all wikis from In Progress to Done on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Mon, Jan 5, 4:36 PM · Discovery-Search (2026.01.05 - 2026.01.30), Essential-Work, CirrusSearch
dcausse moved T408399: Truncate labels.*.near_match fields from To be Deployed to Done on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Mon, Jan 5, 4:34 PM · Discovery-Search (2026.01.05 - 2026.01.30), Essential-Work, CirrusSearch
dcausse moved T409218: Elastica\Exception\Connection\HttpException: Unknown error:52 from To be Deployed to Done on the Discovery-Search (2026.01.05 - 2026.01.30) board.
Mon, Jan 5, 4:34 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.5; 2025-12-02), MediaWiki-extensions-Translate, Wikimedia-production-error

Tue, Dec 23

dcausse added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
  1. Add a usage object that reports token counts. Is this field a hard requirement for your use case?

I don't think so, I believe that opensearch will simply ignore that part (I don't see anything in the codebase that suggests otherwise but I haven't tested to confirm), please feel free to ignore this requirement and we'll do some testing to confirm. thanks! :)

Tue, Dec 23, 9:32 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
dcausse added a comment to P86755 llama.cpp openai embedding output.

For the following input:

{"input": [ "text1", "text2" ]}
Tue, Dec 23, 9:23 AM
dcausse added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

@kevinbazira thanks, yes this seems like a format that opensearch would be able to work with (P86755 is what is working at the moment, we don't pass the model attribute because we currently use llama.cpp that does not support multi-model serving but we can pass it if required).

Tue, Dec 23, 9:22 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
dcausse created P86755 llama.cpp openai embedding output.
Tue, Dec 23, 9:12 AM

Mon, Dec 22

dcausse added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

@kevinbazira @OKarakaya-WMF thanks! Is there a way to call this API in a way that is compatible with the OpenAI embedding format?
Regarding query size, qwen3 suggests a prompt that looks like this:

Instruct: Given a web search query, retrieve relevant passages that answer the query
Query:$user_query_here

Is there a chance that this prompt gets cached after multiple requests?
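
To make the shape of such a request concrete, here is a minimal sketch that wraps a user query in the instruction prompt quoted above and places it in an OpenAI-style embeddings body with an `input` array. The helper names are hypothetical, and the exact fields the embeddings service expects (for instance whether a `model` attribute is needed) are assumptions.

```python
import json

# Task description taken from the prompt quoted above.
TASK = "Given a web search query, retrieve relevant passages that answer the query"

def build_query_text(user_query: str, task: str = TASK) -> str:
    """Prefix the user query with the instruction, following the quoted template."""
    return f"Instruct: {task}\nQuery:{user_query}"

def build_embedding_request(texts) -> str:
    """Serialize an OpenAI-style embeddings body carrying only an `input` array."""
    return json.dumps({"input": list(texts)})

body = build_embedding_request([build_query_text("who wrote hamlet")])
```

Because the instruction prefix is identical for every query, only the text after `Query:` varies between requests.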

Mon, Dec 22, 9:14 PM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team

Dec 15 2025

dcausse created T412673: Glent generate_query_similarity_candidates fails with NPE.
Dec 15 2025, 9:40 AM · Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch

Dec 5 2025

dcausse updated the task description for T411871: Improve cirrus reindex orchestrator to limit its impact on k8s API response times.
Dec 5 2025, 3:18 PM · serviceops-radar, Discovery-Search, CirrusSearch
dcausse created T411871: Improve cirrus reindex orchestrator to limit its impact on k8s API response times.
Dec 5 2025, 3:11 PM · serviceops-radar, Discovery-Search, CirrusSearch

Dec 2 2025

dcausse added a comment to T408431: Reindex all wikis.

forgot to mention that the reindex was started yesterday on the two other clusters (eqiad, cloudelastic)

Dec 2 2025, 9:07 AM · Discovery-Search (2026.01.05 - 2026.01.30), Essential-Work, CirrusSearch
dcausse closed T408737: Enable Georgian Transliteration Second Try mappings for autocomplete as Resolved.
Dec 2 2025, 8:20 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse closed T408737: Enable Georgian Transliteration Second Try mappings for autocomplete, a subtask of T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis, as Resolved.
Dec 2 2025, 8:20 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)

Dec 1 2025

dcausse moved T404858: A/B test using defaultsort with the completion suggester from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Dec 1 2025, 3:42 PM · Discovery-Search (2026.01.05 - 2026.01.30), Patch-For-Review, MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), Essential-Work, CirrusSearch
dcausse updated the task description for T404858: A/B test using defaultsort with the completion suggester.
Dec 1 2025, 3:42 PM · Discovery-Search (2026.01.05 - 2026.01.30), Patch-For-Review, MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), Essential-Work, CirrusSearch
dcausse added a comment to T404858: A/B test using defaultsort with the completion suggester.

A/B test results on other wikis: https://people.wikimedia.org/~dcausse/T404858-completion-default-sort-2.html

Dec 1 2025, 3:41 PM · Discovery-Search (2026.01.05 - 2026.01.30), Patch-For-Review, MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), Essential-Work, CirrusSearch
dcausse claimed T408431: Reindex all wikis.

Going to start the reindex today

Dec 1 2025, 10:46 AM · Discovery-Search (2026.01.05 - 2026.01.30), Essential-Work, CirrusSearch
dcausse created T411347: New CirrusSearch dumps are not properly formatted.
Dec 1 2025, 10:20 AM · Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch

Nov 27 2025

dcausse claimed T408737: Enable Georgian Transliteration Second Try mappings for autocomplete.

Went with the approach of enabling on the five Georgian wikis at once; please let me know if a more conservative approach (one wiki first) is preferable.

Nov 27 2025, 2:38 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse closed T410602: CirrusSearch metadata stores DEFAULTSORT overrides even after they've been removed as Resolved.

The update process has been fixed.
Existing stale data in the search index will get fixed when:

  • a new revision of the page is created
  • a template change propagates
  • when the continuous cleanup mechanism processes a page with stale data
Nov 27 2025, 2:07 PM · MW-1.46-notes (1.46.0-wmf.3; 2025-11-19), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse renamed T411169: Improve & better document cirrus debug & explainability APIs from Improve & better document cirrus debug & exaplainability APIs to Improve & better document cirrus debug & explainability APIs.
Nov 27 2025, 10:56 AM · Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
dcausse created T411169: Improve & better document cirrus debug & explainability APIs.
Nov 27 2025, 10:55 AM · Discovery-Search (2026.01.05 - 2026.01.30), CirrusSearch
dcausse closed T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete, a subtask of T402864: Integrate RU & HE DWIM-style mappings into autocomplete, as Resolved.
Nov 27 2025, 8:18 AM · CirrusSearch, Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.25; 2025-10-28), Essential-Work
dcausse closed T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete as Resolved.
Nov 27 2025, 8:18 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse updated the task description for T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.
Nov 27 2025, 8:12 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Nov 26 2025

dcausse added a comment to T410758: Timeouts searching for terms and regular expressions too low.

Unless you noticed that the regex got a lot slower recently and that more queries are timing out, I think it is safer to keep the 15s internal timeout.

But Wikipedia content keeps growing, and at some point 15s will no longer be safe. What then? Shall we keep decreasing the timeout? I guess we need a search engine that can handle all the large content properly if CirrusSearch cannot.

Nov 26 2025, 2:33 PM · Discovery-Search, CirrusSearch
dcausse merged T410965: Using the search field on mobile does not yield suggestions until after a space has been inserted into T393819: Codex TypeaheadSearch doesn't work with mobile keyboard and predictive text.
Nov 26 2025, 10:56 AM · Reader Experience Team (REx Sprint 12 [Q2 Dec 16 - Jan 26]), Readers Essential Work 2025 (Codex), Codex
dcausse merged task T410965: Using the search field on mobile does not yield suggestions until after a space has been inserted into T393819: Codex TypeaheadSearch doesn't work with mobile keyboard and predictive text.
Nov 26 2025, 10:56 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse added a comment to T410758: Timeouts searching for terms and regular expressions too low.

CirrusSearch has to be careful when specifying timeouts for a regex query.
Regex queries are particularly costly and may cause a lot of stress on the servers if not properly protected.
The 15s timeout has been set up for this: it ensures that the search backend returns before any other timeout is applied. Otherwise a costly query might continue to run outside of the concurrency protection (T152895).
Unless you noticed that the regex got a lot slower recently and that more queries are timing out, I think it is safer to keep the 15s internal timeout.

Nov 26 2025, 10:22 AM · Discovery-Search, CirrusSearch

Nov 24 2025

dcausse updated the task description for T410899: Improve CirrusSearch consistency checks.
Nov 24 2025, 4:25 PM · Discovery-Search, CirrusSearch
dcausse updated the task description for T410899: Improve CirrusSearch consistency checks.
Nov 24 2025, 3:31 PM · Discovery-Search, CirrusSearch
dcausse updated the task description for T410899: Improve CirrusSearch consistency checks.
Nov 24 2025, 3:28 PM · Discovery-Search, CirrusSearch
dcausse renamed T410899: Improve CirrusSearch consistency checks from Improve CirrusSearch consistancy checks to Improve CirrusSearch consistency checks.
Nov 24 2025, 3:22 PM · Discovery-Search, CirrusSearch
dcausse created T410899: Improve CirrusSearch consistency checks.
Nov 24 2025, 3:21 PM · Discovery-Search, CirrusSearch

Nov 20 2025

dcausse claimed T410602: CirrusSearch metadata stores DEFAULTSORT overrides even after they've been removed.
Nov 20 2025, 9:58 AM · MW-1.46-notes (1.46.0-wmf.3; 2025-11-19), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse added a comment to T410602: CirrusSearch metadata stores DEFAULTSORT overrides even after they've been removed.

Thanks for reporting this, I think there are two different issues that allowed such suggestions to appear:

  • defaultsort is indeed not properly removed from the search index when it's erased, a null value unfortunately tells the system to ignore it when updating it, this needs to be fixed for this field
  • defaultsort values are allowed to help completion only if they match a particular pattern, this pattern seems too permissive and should be corrected to limit the possibility of such vandalism to impact search suggestions in the future
Nov 20 2025, 8:46 AM · MW-1.46-notes (1.46.0-wmf.3; 2025-11-19), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch

Nov 19 2025

dcausse claimed T409218: Elastica\Exception\Connection\HttpException: Unknown error:52.
Nov 19 2025, 9:01 AM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.5; 2025-12-02), MediaWiki-extensions-Translate, Wikimedia-production-error
dcausse moved T410269: Wikibase CI broken: Unknown filter type [truncate_norm] for [truncate_keyword] from Incoming to Done on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 19 2025, 8:41 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, ci-test-error (WMF-deployed Build Failure), Wikidata
dcausse edited projects for T410269: Wikibase CI broken: Unknown filter type [truncate_norm] for [truncate_keyword], added: Discovery-Search (2025.10.20 - 2025.12.31); removed Discovery-Search.
Nov 19 2025, 8:41 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, ci-test-error (WMF-deployed Build Failure), Wikidata
dcausse added a comment to T408431: Reindex all wikis.

Should be ready once 1.46.0-wmf.3 is deployed. The earliest would be Thursday, Nov 20, but it is probably safer to wait until the following week in case we roll back.

Nov 19 2025, 8:22 AM · Discovery-Search (2026.01.05 - 2026.01.30), Essential-Work, CirrusSearch

Nov 18 2025

dcausse moved T404597: Eventutilities Flink: port SerDe tests from SUP from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 18 2025, 6:19 PM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering-Radar, Event-Platform, Data-Engineering, Essential-Work, CirrusSearch
dcausse claimed T404597: Eventutilities Flink: port SerDe tests from SUP.
Nov 18 2025, 1:47 PM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering-Radar, Event-Platform, Data-Engineering, Essential-Work, CirrusSearch
dcausse moved T406566: BadMethodCallException: MediaWiki\Session\SessionProvider::preventSessionsForUser must be implemented when canChangeUser() is false from Incoming to Blocked / Waiting on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 18 2025, 1:46 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.4; 2025-11-25), MediaWiki-Platform-Team (Radar), NetworkSession, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07), MediaWiki-Core-AuthManager, Wikimedia-production-error
dcausse added a comment to T408533: Initial task generation and ingestion to Cassandra and Search weight tags.

Hi @pfischer @dcausse, ML team wants to follow up on the initial ingestion process. As you mentioned before, the Search platform team has a manual script for this purpose. Can the ML team execute this on our end (e.g., in statbox)? Or can only the Search team execute it?

it is a bit cumbersome to run unfortunately and some adaptations have to be made (we only used it to backfill article countries). The script is in stat1009.eqiad.wmnet:~dcausse/articlecountry:

  • backfill_articlecountry.scala: the spark job that reads hdfs://analytics-hadoop/user/dcausse/topic_model/wiki-region-groundtruth/regions-cirrus-upload.tsv.gz and converts it to classification.prediction.articlecountry weighted tags; this one would have to be adapted based on your source data
  • wiki.lst: the list of wikis to filter on
  • backfill.sh: the shell script that orchestrates all this
Nov 18 2025, 1:28 PM · Discovery-Search (2025.10.20 - 2025.12.31), Machine-Learning-Team

Nov 17 2025

dcausse added a comment to T410269: Wikibase CI broken: Unknown filter type [truncate_norm] for [truncate_keyword].

Indeed, the new Debian package wmf-opensearch-search-plugins version 1.3.20+12 has to be installed to run the latest cirrus version. We generally maintain the cirrussearch-opensearch-image docker image that is used by MW developers and our cirrus integration test suite, but here I think that you install opensearch on the existing quibble image, so refreshing this image with the new version of the plugin is indeed what should be needed.

Nov 17 2025, 3:31 PM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, ci-test-error (WMF-deployed Build Failure), Wikidata
dcausse moved T40403: Sortable search results from Needs Review to To be Deployed on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 17 2025, 2:50 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Essential-Work, CirrusSearch
dcausse added a comment to T409898: Set up OpenSearch instance supporting vector search.

Do we need any specific plugins on this instance? At the moment, we're working on a minimal OpenSearch deployment, with no additional plugins, meant for the non-Search use cases.

Nov 17 2025, 2:02 PM · Data-Platform-SRE (2026.01.05 - 2026.01.23), Essential-Work, Discovery-Search, Research
dcausse reassigned T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete from dcausse to TJones.
Nov 17 2025, 1:46 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse closed T410007: upstream request timeout, http-status 504 in the API as Resolved.

This should be fixed, I can see the partial search response instead of the error.

Nov 17 2025, 1:30 PM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, MW-Interfaces-Team, MediaWiki-Action-API
dcausse updated the task description for T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.
Nov 17 2025, 10:03 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse added a comment to T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.

@TJones the change should be live on hewiki and ruwiki, could you draft a message for the tech news possibly by adding some text to https://meta.wikimedia.org/wiki/Tech/News/2025/48?

Nov 17 2025, 10:03 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse added a comment to T410007: upstream request timeout, http-status 504 in the API.

... there is now a component failing earlier than the allowed 50s.

How we can find out the component failing?

Nov 17 2025, 9:32 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, MW-Interfaces-Team, MediaWiki-Action-API
dcausse added a comment to T408223: Action API via rest-gateway production rollout.

We received notifications from users that the search API, which is configured to allow 50s timeouts to support costly search requests, is now failing at 15s with an upstream request timeout (T410007). The user reported that the behavior started to change around Nov 11th, which is apparently when we started to roll out this new route on group2 wikis. I'm not 100% sure that this change is the cause of this new behavior, but IIUC on all wikis except enwiki we now route api.php requests to the rest-gateway. If I'm not mistaken the rest-gateway has a default timeout of 15s, which might explain this new behavior? Are there ways to vary this timeout based on the target action API?
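
The reasoning here is that a request passing through several layers is bounded by the shortest timeout in the chain. A minimal sketch of that reasoning, with hypothetical layer names and using the 15s gateway default only as the suspected value from the comment, not a confirmed setting:

```python
def effective_timeout(layer_timeouts: dict[str, float]) -> tuple[str, float]:
    """Return the layer with the shortest timeout and that timeout in seconds."""
    name = min(layer_timeouts, key=layer_timeouts.get)
    return name, layer_timeouts[name]

# Hypothetical layer names; values are the ones discussed in the comment.
layers = {"search_api": 50.0, "rest_gateway": 15.0}
limiting_layer, seconds = effective_timeout(layers)
```

With these values the gateway, not the search API, decides when the client sees a timeout.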

Nov 17 2025, 9:29 AM · OKR-Work, [MWI] FY2025-26 Q2, MW-Interfaces-Team (MWI-Roadmap)
dcausse added a comment to T410007: upstream request timeout, http-status 504 in the API.

Indeed, the internal timeout should be 50s to allow the regex to run. It is possible that something changed in the request flow and that there is now a component failing earlier than the allowed 50s.

Nov 17 2025, 8:11 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, MW-Interfaces-Team, MediaWiki-Action-API

Nov 14 2025

dcausse added a comment to T409469: Enable ChangeProp to consume mediawiki.page_content_change.v1.

Would we also need to explicitly create the topics in main? Is auto topic creation enabled there?

Nov 14 2025, 5:35 PM · Data-Engineering, serviceops, Machine-Learning-Team
dcausse added a comment to T409469: Enable ChangeProp to consume mediawiki.page_content_change.v1.

If pushing to kafka-main you might need to increase the broker's message.max.bytes; see T344688.

Nov 14 2025, 3:19 PM · Data-Engineering, serviceops, Machine-Learning-Team
dcausse moved T40403: Sortable search results from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 14 2025, 2:49 PM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Essential-Work, CirrusSearch

Nov 13 2025

dcausse claimed T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.
Nov 13 2025, 2:04 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Nov 7 2025

dcausse added a comment to T409070: Latest CirrusSearch is incompatible with ES7.10 and the corresponding WMF extra plugin.

It might be that the only reasonable way is to remove anchored trigram support from REL1_45.

Nov 7 2025, 8:33 AM · Discovery-Search (2026.01.05 - 2026.01.30), MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), CirrusSearch

Nov 6 2025

dcausse created P84987 completion suggester events with second try hits.
Nov 6 2025, 10:30 AM
dcausse closed T405475: Search for L7 shows incomplete drop-down box, a subtask of T379740: When searching by LID only the LID is shown, as Resolved.
Nov 6 2025, 8:13 AM · Abstract Wikipedia team, Essential-Work, Design, WikiLambda Front-end
dcausse closed T405475: Search for L7 shows incomplete drop-down box as Resolved.

I think this is now fixed, the behavior of items and lexemes should be the same.
The API response looks like this now (on L7 when searching for L7):

{
Nov 6 2025, 8:13 AM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Wikidata-Omega (Radar/Epics/Stalled), Wikidata-Query-Service, CirrusSearch, Wikidata
dcausse updated the task description for T409397: Adapt EntityIdSearchHelper for Lexemes.
Nov 6 2025, 8:04 AM · Wikidata, Wikidata Lexicographical data
dcausse created T409397: Adapt EntityIdSearchHelper for Lexemes.
Nov 6 2025, 8:01 AM · Wikidata, Wikidata Lexicographical data

Nov 5 2025

dcausse moved T408431: Reindex all wikis from Incoming to Ready for Dev on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 5 2025, 2:36 PM · Discovery-Search (2026.01.05 - 2026.01.30), Essential-Work, CirrusSearch
dcausse moved T408154: AB Test doubling near match field weights on commonswiki from Incoming to In Progress on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 5 2025, 2:36 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
dcausse moved T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete from Incoming to Ready for Dev on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 5 2025, 2:36 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse moved T408737: Enable Georgian Transliteration Second Try mappings for autocomplete from Incoming to Ready for Dev on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 5 2025, 2:35 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work