To run a k-NN query, OpenSearch needs to be configured to access the embedding services.
This is currently done manually, but we should find a place to store this configuration, possibly with some scripts so that it can be replicated.
The current approach is to use an external embedding service powered by liftwing (T412338).
**Allow remote models**
`curl -XPUT -d @- -Hcontent-type:application/json /_cluster/settings`
```lang=json
{
  "persistent": {
    "plugins.ml_commons.trusted_connector_endpoints_regex": [
      "^https://.*\\.wmnet:.*$",
      "^https://.*\\.wikimedia\\.org:.*$"
    ],
    "plugins.ml_commons.connector.private_ip_enabled": false
  }
}
```
NOTE: can/should this be set in `opensearch.yml`?
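Before creating the connector, it may be worth checking that the trusted endpoint patterns actually accept the URL the connector will call. The sketch below is only an illustration: it assumes Python's `re` module behaves like the Java regex engine for these simple patterns, `is_trusted` is a hypothetical helper, and the `wikimedia.org` pattern is written here with both dots escaped so that each only matches a literal dot.

```python
import re

# Trusted endpoint patterns from the cluster settings, after JSON unescaping.
# Assumption: the dot before "org" is escaped here to match a literal dot only.
TRUSTED_ENDPOINTS = [
    r"^https://.*\.wmnet:.*$",
    r"^https://.*\.wikimedia\.org:.*$",
]

# URL used by the connector once ${parameters.endpoint} has been expanded.
CONNECTOR_URL = (
    "https://inference.svc.eqiad.wmnet:30443/v1/models/qwen3-embedding:predict"
)


def is_trusted(url: str) -> bool:
    """Return True if the URL matches at least one trusted endpoint pattern."""
    return any(re.match(pattern, url) for pattern in TRUSTED_ENDPOINTS)


print(is_trusted(CONNECTOR_URL))
```

Both patterns require a `:` right after the domain, so a connector URL written without an explicit port (for example `https://inference.svc.eqiad.wmnet/...`) would not match them.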
**Connector config**
`curl -XPOST -d @- -Hcontent-type:application/json /_plugins/_ml/connectors/_create`
```lang=json
{
  "name": "liftwing_qwen3",
  "description": "Qwen3 embeddings via liftwing",
  "version": 1,
  "protocol": "http",
  "parameters": {
    "endpoint": "inference.svc.eqiad.wmnet:30443"
  },
  "credential": {
    "key": "unused"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "headers": {
        "content-type": "application/json",
        "host": "embeddings.llm.wikimedia.org"
      },
      "url": "https://${parameters.endpoint}/v1/models/qwen3-embedding:predict",
      "request_body": "{ \"input\": ${parameters.input} }",
      "pre_process_function": "connector.pre_process.openai.embedding",
      "post_process_function": "connector.post_process.openai.embedding"
    }
  ]
}
```
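To make the connector templates easier to read, the following sketch expands the `${parameters.*}` placeholders by hand and shows the URL and body that the `predict` action would be expected to send. It is not how ml-commons implements substitution: `expand` is a hypothetical helper, and the assumption that the OpenAI-style pre-process function supplies the input texts as a JSON array in `parameters.input` is mine.

```python
import json
import re

# Templates copied from the connector definition above.
URL_TEMPLATE = "https://${parameters.endpoint}/v1/models/qwen3-embedding:predict"
BODY_TEMPLATE = '{ "input": ${parameters.input} }'


def expand(template: str, parameters: dict) -> str:
    """Replace every ${parameters.NAME} placeholder with parameters[NAME]."""
    return re.sub(
        r"\$\{parameters\.(\w+)\}",
        lambda match: parameters[match.group(1)],
        template,
    )


parameters = {
    "endpoint": "inference.svc.eqiad.wmnet:30443",
    # Assumption: the pre-process function provides the texts as a JSON array.
    "input": json.dumps(["first passage", "second passage"]),
}

url = expand(URL_TEMPLATE, parameters)
body = json.loads(expand(BODY_TEMPLATE, parameters))
print(url)
print(body)
```

The request would go to the `endpoint` host and port, while the `host` header set in the connector selects `embeddings.llm.wikimedia.org` as the target service.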
**Model group**
`curl -XPOST -d @- -Hcontent-type:application/json /_plugins/_ml/model_groups/_register`
```lang=json
{
  "name": "liftwing",
  "description": "A model group for liftwing powered models"
}
```
**Model**
`curl -XPOST -d @- -Hcontent-type:application/json /_plugins/_ml/models/_register`
```lang=json
{
  "name": "liftwing_qwen3",
  "function_name": "remote",
  "model_group_id": "PLACEHOLDER",
  "description": "Qwen3 1024d embedding model via liftwing",
  "connector_id": "PLACEHOLDER"
}
```
**Model deploy**
`curl -XPOST /_plugins/_ml/models/PLACEHOLDER/_deploy`
Here `PLACEHOLDER` is the ID of the model registered in the previous step.
Where such tooling should live is up for discussion, but it could go to https://gitlab.wikimedia.org/repos/search-platform/cirrus-toolbox if no better place is available.
AC:
* OpenSearch can be set up to use external embedding services via liftwing
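One possible way to check this criterion is to call the predict API on the deployed model and verify that the returned embeddings have the expected 1024 dimensions. The sketch below only builds the request and inspects a response; the helper names are hypothetical, and the response layout (`inference_results` containing `output` entries with a `data` array) is an assumption about what the OpenAI-style post-process function returns.

```python
# From the model description: "Qwen3 1024d embedding model via liftwing".
EXPECTED_DIMENSIONS = 1024


def predict_request(model_id: str, texts: list) -> tuple:
    """Build (method, path, body) for a predict call on a deployed model."""
    path = f"/_plugins/_ml/models/{model_id}/_predict"
    return ("POST", path, {"parameters": {"input": texts}})


def embedding_dimensions(response: dict) -> list:
    """Return the length of every embedding found in a predict response."""
    return [
        len(output["data"])
        for result in response["inference_results"]
        for output in result["output"]
    ]


def check_embeddings(response: dict, expected: int = EXPECTED_DIMENSIONS) -> bool:
    """True if at least one embedding came back and all have the expected size."""
    dimensions = embedding_dimensions(response)
    return bool(dimensions) and all(d == expected for d in dimensions)
```

The request built by `predict_request` can be sent with any HTTP client, and the decoded JSON response passed to `check_embeddings`.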