The output of T414070 should be imported into OpenSearch so that it can be queried with k-NN search.
The mapping is not finalized yet, but it should include at least the fields Cirrus needs to work:
- page_id
- namespace
- title
It should also include the nested field that holds the passage vectors:
  "passage_chunk_embedding": {
    "type": "nested",
    "properties": {
      "text": { "type": "text", "index": false },
      "passage_index": { "type": "short" },
      "parent_sections": { "type": "text", "index": false },
      "section_name": { "type": "text", "index": false },
      "knn": {
        "type": "knn_vector",
        "dimension": 1024,
        "space_type": "l2",
        "mode": "on_disk",
        "method": { "name": "hnsw" }
      }
    }
  }
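As a minimal sketch, assuming the import is driven from Python, the hypothetical helper below assembles a complete create-index body: the Cirrus fields listed above plus the nested vector field. The types chosen for `page_id`, `namespace` and `title` are assumptions, since the mapping is not final. The `index.knn` setting is what the OpenSearch k-NN plugin expects on indices that hold `knn_vector` fields.

```python
# Dimension taken from the mapping above (1024-dimension embeddings).
EMBEDDING_DIM = 1024


def build_index_body(dimension: int = EMBEDDING_DIM) -> dict:
    """Hypothetical helper: minimal Cirrus fields plus the nested vector field.

    The types for page_id, namespace and title are assumptions; the real
    mapping is not defined yet.
    """
    return {
        "settings": {
            # Lets the k-NN plugin build native vector indices for this index.
            "index": {"knn": True}
        },
        "mappings": {
            "properties": {
                "page_id": {"type": "long"},
                "namespace": {"type": "integer"},
                "title": {"type": "text"},
                "passage_chunk_embedding": {
                    "type": "nested",
                    "properties": {
                        "text": {"type": "text", "index": False},
                        "passage_index": {"type": "short"},
                        "parent_sections": {"type": "text", "index": False},
                        "section_name": {"type": "text", "index": False},
                        "knn": {
                            "type": "knn_vector",
                            "dimension": dimension,
                            "space_type": "l2",
                            "mode": "on_disk",
                            "method": {"name": "hnsw"},
                        },
                    },
                },
            }
        },
    }
```

With opensearch-py, such a body could be passed as `client.indices.create(index=..., body=build_index_body())`.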
This mapping may change as requirements evolve; more fields could be added to filter passages on extra criteria (depth, passage size, passage type, presence of citations/references, ...).
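A k-NN search over these passages would target the nested `passage_chunk_embedding.knn` field through a `nested` query. The sketch below only builds the request body; `build_passage_knn_query` is a hypothetical helper, and the query vector is assumed to come from the same 1024-dimension embedding model used at indexing time. Filters on the extra criteria mentioned above could be added once those fields exist.

```python
def build_passage_knn_query(vector: list[float], k: int = 10, size: int = 10) -> dict:
    """Hypothetical helper: k-NN search body over the nested passage vectors."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {
        "size": size,
        # Only return the page-level fields Cirrus relies on.
        "_source": ["page_id", "namespace", "title"],
        "query": {
            "nested": {
                "path": "passage_chunk_embedding",
                "query": {
                    "knn": {
                        "passage_chunk_embedding.knn": {
                            "vector": vector,
                            "k": k,
                        }
                    }
                },
            }
        },
    }
```

With opensearch-py, the body could be sent as `client.search(index=..., body=build_passage_knn_query(query_vector))`.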
AC:
- indices with a vector field are created and populated weekly (Sundays) on relforge
- index aliases are updated after the import process is done
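The weekly publish step from the acceptance criteria could look roughly like the sketch below: derive a dated index name for the Sunday run, then repoint the alias with a single `_aliases` request so that searches never hit a half-populated index. The `<base>_<YYYYMMDD>` naming scheme and all helper names are hypothetical.

```python
from datetime import date, timedelta


def last_sunday(today: date) -> date:
    """Return `today` if it is a Sunday, otherwise the most recent Sunday before it."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    return today - timedelta(days=(today.weekday() + 1) % 7)


def weekly_index_name(base: str, today: date) -> str:
    """Hypothetical naming scheme: <base>_<YYYYMMDD of the weekly run>."""
    return f"{base}_{last_sunday(today):%Y%m%d}"


def alias_swap_actions(alias: str, new_index: str, old_indices: list[str]) -> dict:
    """Build an _aliases request body that moves `alias` from the old indices to `new_index`."""
    actions = [
        {"remove": {"index": old, "alias": alias}}
        for old in old_indices
        if old != new_index
    ]
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}
```

With opensearch-py, the returned body could be sent with `client.indices.update_aliases(body=...)`; the remove and add actions of a single `_aliases` request are applied atomically, so the alias always points at a fully imported index.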