Import passage vectors into opensearch
Open, Needs Triage | Public | 5 Estimated Story Points

Description

The output of T414070 should be imported into OpenSearch so that k-NN search can be run against it.

The mapping is not yet defined, but it should include the minimum fields that CirrusSearch needs to work:

  • page_id
  • namespace
  • title

It should also include a nested field that holds the passage vector:

"passage_chunk_embedding": {
  "type": "nested",
  "properties": {
    "text": {
      "type": "text",
      "index": false
    },
    "passage_index": { "type": "short" },
    "parent_sections": { "type": "text", "index": false },
    "section_name": { "type": "text", "index": false },
    "knn": {
      "type": "knn_vector",
      "dimension": 1024,
      "space_type": "l2",
      "mode": "on_disk",
      "method": {
        "name": "hnsw"
      }
    }
  }
}
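
As a rough illustration of how the fragment above could fit into a full index body, here is a minimal Python sketch. The field types chosen for page_id, namespace and title are assumptions (the task does not define them), as are the helper names `build_index_body` and `build_knn_query`. The `index.knn` setting and the nested k-NN query shape follow the OpenSearch k-NN plugin documentation; the final mapping may well differ.

```python
import json

VECTOR_DIM = 1024  # dimension taken from the mapping fragment above
NESTED_FIELD = "passage_chunk_embedding"


def build_index_body() -> dict:
    """Build an index body: assumed base fields plus the nested passage field."""
    passage_field = {
        "type": "nested",
        "properties": {
            "text": {"type": "text", "index": False},
            "passage_index": {"type": "short"},
            "parent_sections": {"type": "text", "index": False},
            "section_name": {"type": "text", "index": False},
            "knn": {
                "type": "knn_vector",
                "dimension": VECTOR_DIM,
                "space_type": "l2",
                "mode": "on_disk",
                "method": {"name": "hnsw"},
            },
        },
    }
    return {
        # k-NN has to be enabled on the index for knn_vector fields to be searchable.
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "page_id": {"type": "long"},    # assumption: type not given in the task
                "namespace": {"type": "long"},  # assumption: type not given in the task
                "title": {"type": "text"},      # assumption: type not given in the task
                NESTED_FIELD: passage_field,
            }
        },
    }


def build_knn_query(vector: list, k: int = 10) -> dict:
    """Build a nested k-NN query against the passage vectors."""
    if len(vector) != VECTOR_DIM:
        raise ValueError(f"expected {VECTOR_DIM} dimensions, got {len(vector)}")
    return {
        "size": k,
        "query": {
            "nested": {
                "path": NESTED_FIELD,
                "query": {
                    "knn": {
                        f"{NESTED_FIELD}.knn": {"vector": list(vector), "k": k}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

The two dictionaries are plain JSON bodies, so they can be sent with any HTTP client or OpenSearch client: the first as the body of the index creation request, the second as the body of a search request.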

This mapping is subject to change as requirements evolve. More fields could be added to filter passages on extra criteria (depth, passage size, passage type, presence of citations/references...).

AC:

  • indices with a vector field are created and populated weekly (Sundays) on relforge
  • index aliases are updated after the import process is done
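
The alias update in the last criterion could be sketched as follows. This is only an illustration under assumptions: the date-suffixed naming scheme, the example alias name and the helper names are hypothetical and not part of this task. The request body uses the `actions` format of the OpenSearch `_aliases` API, which applies all remove and add actions in a single atomic call.

```python
from datetime import date


def weekly_index_name(alias: str, run_date: date) -> str:
    """Hypothetical naming scheme: alias plus the date of the weekly run."""
    return f"{alias}_{run_date:%Y%m%d}"


def build_alias_swap(alias: str, new_index: str, old_indices: list) -> dict:
    """Build an _aliases request body that moves the alias to the new index."""
    # Detach the alias from every previous index, skipping the new one if listed.
    actions = [
        {"remove": {"index": old, "alias": alias}}
        for old in old_indices
        if old != new_index
    ]
    # Attach the alias to the freshly populated index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    alias = "enwiki_passages"  # hypothetical alias name
    new_index = weekly_index_name(alias, date.today())
    print(build_alias_swap(alias, new_index, ["enwiki_passages_previous"]))
```

Sending the swap only after the import has finished means readers of the alias never see a partially populated index.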