
Make semantic search accessible through Action API
Open, Needs Triage, Public, 8 Estimated Story Points

Description

For the semantic search MVP on Android, we need an HTTP endpoint the app can consume to retrieve semantic search results. The Android app uses the Action API for both prefix and full-text search, so for convenience we would reuse the Action API. There are already multiple discriminators:

  • gps(search|limit|offset) = prefix search + generator=prefix
  • gsr(search|limit|offset) = full text search + generator=fulltext

TBD: Do we need another param family or can we expand the possible values, for example, generator=semantic?

Alternatively, this might be solved via a search profile; see fulltext query-dependent profile.

Corresponding Android app task: T412986

AC:

  • w/api.php? can be called to fetch semantic results
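One hedged sketch of what such a call could look like, assuming the semantic route ends up exposed through the existing generator=search parameters. The profile value "semantic" and the exact parameter name (gsrqiprofile here) are assumptions for illustration; the actual API shape is the open question above.

```python
from urllib.parse import urlencode

# Hypothetical request: semantic search exposed as a full-text search
# profile on the existing generator=search path. Parameter names other
# than the standard ones are assumptions, not a decided API.
params = {
    "action": "query",
    "format": "json",
    "generator": "search",
    "gsrsearch": "how do solar panels work",
    "gsrlimit": 10,
    "gsrqiprofile": "semantic",  # hypothetical profile name
    "prop": "info",
}
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
print(url)
```

The response would presumably carry the usual search generator page list, plus the snippet field mentioned in the comments below the description.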


Event Timeline

This is great - any kind of integration with the existing Action API would be equally simple for us to consume on the client side, whether it's adding something like generator=semantic, or something like generator=search with gsrwhat=semantic; whatever makes more sense architecturally on the backend.
And presumably this would also reuse the snippet field (returned by other types of searches) to provide the paragraph that's relevant to the semantic result.

pfischer set the point value for this task to 8.Jan 12 2026, 4:58 PM
EBernhardson subscribed.

This will probably be a couple of parts:

  • Network / Infrastructure / Mostly SRE
    • A proper production config would involve putting LVS (load balancer) in front of the relforge cluster. We can potentially make it work without LVS, but it's not a typical production deployment. This requires SRE to do most of the heavy lifting.
    • Relforge additionally has no TLS. Usually this is done as part of setting up LVS and giving all the instances a shared name they can respond to. We can configure CirrusSearch to use plain HTTP, but that is not a typical production deployment. This would land on SRE as well.
    • Typical production search traffic flows through envoy. If we set up LVS/TLS this is easy. If we skip LVS/TLS we need to look into what is possible there, and whether envoy can be used without the typical production setup. Gut feeling is we would have to skip envoy and put the host list directly in the CirrusSearch configuration.
    • Relforge today only allows connections from the analytics networks; a relatively simple Puppet patch can relax that to the rest of production.
    • Open up appropriate network rules so that MediaWiki can talk to relforge. Today neither bare metal hosts (such as deployment2002) nor mw-on-k8s (such as mwscript-k8s) can talk to relforge; all requests get filtered by firewalls. Plausibly fixing the relforge-side firewall will open it up to the bare metal hosts, but mw-on-k8s probably needs additional egress rules.
    • Our other clusters are named things like eqiad-chi, so we might need to assign a shortname like relforge-alpha to stay consistent with naming. We should not name it eqiad-relforge, as that would imply relforge is part of the unified "CirrusSearch cluster" when it's actually a test system with very limited data and no streaming updates.
    • Relforge does not seem to show up in our grafana dashboards. Look into why and ensure we have visibility into the system. While the dashboards all say elasticsearch, they were simply never renamed. Particularly interesting dashboards:
      • elasticsearch-percentiles is often the first stop when receiving latency alerts to understand what might be going on.
      • elasticsearch-per-node-percentiles would be convenient for understanding latency profiles of the semantic search shard requests.
    • Similarly, relforge has no alerting. As an internal test system it never needed any. Most alerting comes from the same data that shows up on the dashboards, hopefully this is mostly tweaking existing alerts to also include relforge.
  • CirrusSearch
    • Need to define the new cluster in operations/mediawiki-config production services, and reference it from the cirrus cluster configuration.
    • The idea is to set up a ftqdprofile (full-text query-dependent profile) that selects for semantic search using the existing search profiles system.
      • We might ponder whether we want to somehow limit access to this, so someone doesn't see the new search profile in the API allowed parameters and hammer the test service. We don't have anything today that fulfills that requirement and can distinguish between mobile apps and arbitrary API traffic; maybe a user-agent check and a warning response is sufficient. A determined user can always look at our source code/configuration unless we add some sort of secret key (unlikely, no precedent I'm aware of, and it just doesn't feel very wiki).
    • Need to implement a semantic search query builder that assembles the appropriate query. This likely skips all the AST building that supports special syntax such as keywords, and can start out as a relatively simple, un-abstracted query builder if we skip the AST and put the full query string directly into the k-nn query. This might be slightly awkward because users can still provide a ftqiprofile (full-text query-independent profile) in the API call, but we will ignore it.
      • This query builder should probably over-fetch results, since the update method allows deleted pages to remain in the index. They will be filtered at the MediaWiki rendering layer, but that may result in fewer results than desired.
    • Need to define a new PoolCounter (cluster-wide semaphore) limiting the number of parallel requests to this service and update CirrusSearch to use it when query_type is semantic.
      • If the whole cluster is falling over this is our primary lever to reduce traffic. Ideally we get enough information from capacity testing to set this at a value that doesn't need to be reduced once we go live.
      • If we set up LVS it should be able to gracefully handle one node failing over, but OpenSearch will also internally route requests to any node it knows about, bypassing that limit. Since this cluster will be receiving uniform search queries it might make better use of ARS (adaptive replica selection) and avoid the issue (this will also be interesting to learn from capacity testing).
    • May need to extend the profile system and/or query building and execution abstractions such that it can ignore ftqiprofile and skip the rescoring phase.
    • It seems likely the existing MediaWiki\Search\SearchResult can satisfy the requirements for this prototype, but we might find it needs changes.
    • The new profile will need to set the query_type to something indicating semantic search. That will flow into our cluster override system, which allows directing queries with a specific type or using specific features to a specified cluster (relforge).
    • Implementation is in-progress for transforming the enterprise structured content snapshots into passages and vectors. The intent is to import those to the relforge cluster once a week.
      • The import will use a production and a fallback alias for the imported indices, containing the last two completed imports. If the import "succeeds" but has problems, an operator will need to use the OpenSearch alias APIs to repoint the production alias at the fallback index. If the import fails, the production alias will not be updated and we will continue serving the previous data.
      • The fallback index, if it exists, will be deleted prior to importing the next index. If there was a problem with the production index we assume it would have been alias-migrated by then. This helps keep down the disk and memory usage, otherwise we would have to have capacity for 3x index (and 3x replicas, and maybe 3x ceph replication as well).
      • The staleness of the data has been determined to be acceptable. New articles will take some days to get into the dataset, but the rate of articles created (and not quickly deleted) is relatively low, and articles gain importance and quality over time. Pages that have been deleted but are still in the index will be filtered by the existing output layers.
      • We might want to have enough of this available, even just as a static file that can be used with GNU parallel / curl, to include a "QPS under indexing load" phase in capacity testing. We will likely time the import to run during the quietest part of the week (daytime over the Pacific Ocean on a weekend), but still.
  • The existing backend CirrusSearchRequestSet logging shouldn't need any changes; we will have a full log of queries, results, etc. for later analysis. As always this is a very "operational" dataset and is not the easiest to build analysis from, but it should have all relevant backend data. Mobile will need to manage their own testing and metrics as usual.
    • These logs all have a unique id per web request. That id is returned in the headers as X-Search-ID. If mobile wants to join their data collection against the backend logging they will need to record that value.
  • Open Questions
    • What do we do if relforge is getting hammered and rejecting requests? Can we lean on the existing tooling used for dealing with abusive traffic? That tooling is generally human-resource intensive in determining what to block.
    • Can relforge handle 5-10qps? Capacity evaluation to be done in T414623.
    • If relforge does fall over, what's the plan? Do we need a kill-switch, perhaps adding a "banned profiles" configuration parameter to CirrusSearch?
      • Wikimedia\FeatureManager is new and can do this without a deploy, but is not stable and says "Please don't bind to it unless you absolutely need to."
      • Based on the use case, I don't think falling back to standard search is appropriate; we likely should return an error. We should double-check with mobile/product.
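As a rough illustration of the query-builder and over-fetch points in the plan above, here is a sketch of the kind of k-NN request body a semantic query builder might send to OpenSearch. The field name, source fields, and over-fetch factor are assumptions for illustration only; the real builder would live in CirrusSearch's PHP code.

```python
# Sketch of an OpenSearch k-NN query body for the semantic route.
# We over-fetch (k > limit) because the weekly-imported index can
# still contain deleted pages; those are filtered later at the
# MediaWiki rendering layer, so fetching extra hits keeps the final
# result count close to what the caller asked for.
def build_semantic_query(query_vector, limit=10, overfetch_factor=2):
    k = limit * overfetch_factor
    return {
        "size": k,
        "query": {
            "knn": {
                "passage_embedding": {  # hypothetical vector field name
                    "vector": query_vector,
                    "k": k,
                }
            }
        },
        # Illustrative stored fields; the real mapping may differ.
        "_source": ["page_id", "title", "passage"],
    }

body = build_semantic_query([0.1, 0.2, 0.3], limit=10)
```

Note that the full query string goes through an embedding model first; the builder only ever sees the pre-computed vector, which is why it can skip the AST entirely.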

@dcausse @CDanis @bking @RKemper The above plan is my first draft of how we go from where we are today, to having semantic search available for testing in production. Please review.

I wasn't sure where exactly to put this, ideally anyone could edit it. I tried using a phabricator paste but it renders terribly. We can maybe move it into the ticket description?

This will probably be a couple of parts:

  • CirrusSearch
    • Need to define the new cluster in operations/mediawiki-config production services, and reference it from the cirrus cluster configuration.
    • The idea is to set up a ftqdprofile (full-text query-dependent profile) that selects for semantic search using the existing search profiles system.
      • We might ponder whether we want to somehow limit access to this, so someone doesn't see the new search profile in the API allowed parameters and hammer the test service. We don't have anything today that fulfills that requirement and can distinguish between mobile apps and arbitrary API traffic; maybe a user-agent check and a warning response is sufficient. A determined user can always look at our source code/configuration unless we add some sort of secret key (unlikely, no precedent I'm aware of, and it just doesn't feel very wiki).

We also discussed using an undocumented API param for now (cirrusEnableXYZ) but this is just buying a bit more time.
Cirrus has some facility to do the routing with DefaultSearchQueryDispatchService which could be helpful.

  • Need to implement a semantic search query builder that assembles the appropriate query. This likely skips all the AST building that supports special syntax such as keywords, and can start out as a relatively simple, un-abstracted query builder if we skip the AST and put the full query string directly into the k-nn query. This might be slightly awkward because users can still provide a ftqiprofile (full-text query-independent profile) in the API call, but we will ignore it.

Agreed, I doubt there's a need to interpret the AST; in the long run we might perhaps honor double quotes with an additional filter.

  • May need to extend the profile system and/or query building and execution abstractions such that it can ignore ftqiprofile and skip the rescoring phase.

If using the DefaultSearchQueryDispatchService, a new "semantic_search" profile context will have to be created; this one will dictate what rescore profile to use and could be forced to "empty".
If a user explicitly selects a rescore profile (not using engine_autoselect for ftqiprofile), I think we could re-route to classic fulltext (the SearchQuery class keeps the list of forced profiles, and this can be inspected to make the decision).
I haven't looked closely, but this could be an indication that using ftqbprofile to route to the knn query builder might not play well, as it gives the false impression that you can assemble query bits freely. So perhaps using another criterion (a custom cirrus param for now?) to trigger the knn search could be easier; the query router could cancel the knn query if some profiles have been selected explicitly (ftqiprofile/ftqbprofile != engine_autoselect).
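The routing rule described in this comment could look something like the following decision function. This is a hypothetical sketch, not CirrusSearch code; only the engine_autoselect value comes from the discussion, and the function name and return values are made up for illustration.

```python
ENGINE_AUTOSELECT = "engine_autoselect"

def route_query(semantic_requested, ftqiprofile=ENGINE_AUTOSELECT,
                ftqbprofile=ENGINE_AUTOSELECT):
    """Decide between the knn route and classic fulltext.

    If the caller explicitly forced a rescore or builder profile
    (i.e. anything other than engine_autoselect), the knn route is
    cancelled and the query falls back to classic fulltext, per the
    discussion above.
    """
    if not semantic_requested:
        return "fulltext"
    if ftqiprofile != ENGINE_AUTOSELECT or ftqbprofile != ENGINE_AUTOSELECT:
        return "fulltext"
    return "knn"
```

Keeping the check outside the profile system avoids the false impression that knn bits can be mixed freely with other builder profiles.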

  • Open Questions
    • What do we do if relforge is getting hammered and rejecting requests? Can we lean on the existing tooling used for dealing with abusive traffic? That tooling is generally human-resource intensive in determining what to block.

I think the android app will have to allow for graceful failures at this point.

@pfischer , @brouberol and myself have had a couple of discussions about this today. We believe the most efficient way forward is to run this on Kubernetes. We will create subtasks from this ticket and start dividing up the work.

We'll be doing many of the same tasks we already outlined in T414217 (alpha-testing the latest Kubernetes operator), but instead of deploying alpha software, we'll deploy the latest stable 2.8 tag of the operator, along with Balthazar's backported watchNamespace patch.

Per Erik's comment above:

elasticsearch-percentiles is often the first stop when receiving latency alerts to understand what might be going on.
elasticsearch-per-node-percentiles would be convenient for understanding latency profiles of the semantic search shard requests.

These important metrics are sourced from our custom exporter, which is not currently available in OpenSearch on K8s (see T414345 for further discussion). @EBernhardson / @pfischer, would you consider these metrics a hard requirement for the project? If so, we might have to:

A. (SRE) Modify the helm chart to run our custom exporter as a sidecar
B. (Search Platform) rewrite our custom exporter as an OpenSearch plugin.

Let us know if this is a hard requirement and if so, level of effort for implementing the exporter in Java as an OpenSearch plugin. SRE will also look into making a docker image for the exporter and how to integrate it into the chart if these metrics are indeed required.


I did a quick look through the metrics exposed by prometheus-elasticsearch-exporter and prometheus-wmf-elasticsearch-exporter in the current prod clusters. I think the upstream exporter, already available in k8s if I understand correctly, is a hard requirement. The wmf exporter feeds the per-node-percentiles graph and would be nice to have, but is not a requirement for deployment.

If using the DefaultSearchQueryDispatchService, a new "semantic_search" profile context will have to be created; this one will dictate what rescore profile to use and could be forced to "empty".
If a user explicitly selects a rescore profile (not using engine_autoselect for ftqiprofile), I think we could re-route to classic fulltext (the SearchQuery class keeps the list of forced profiles, and this can be inspected to make the decision).
I haven't looked closely, but this could be an indication that using ftqbprofile to route to the knn query builder might not play well, as it gives the false impression that you can assemble query bits freely. So perhaps using another criterion (a custom cirrus param for now?) to trigger the knn search could be easier; the query router could cancel the knn query if some profiles have been selected explicitly (ftqiprofile/ftqbprofile != engine_autoselect).

That all makes sense. Given the current constraints and timelines, a custom query param seems a reasonable way forward.

Change #1229643 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] [WIP] Introduce a Semantic Search query route

https://gerrit.wikimedia.org/r/1229643

Change #1229644 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Initial implementation of Semantic query builder

https://gerrit.wikimedia.org/r/1229644

Change #1229645 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] [WIP] Wire up semantic query building

https://gerrit.wikimedia.org/r/1229645