
The preparation job should discover what index to write to
Closed, Resolved | Public | 3 Estimated Story Points

Description

CirrusSearch partitions its documents across multiple indices. This information is stored in the mediawiki-config and can be requested using the cirrus-config-dump API.

The preparation job should provide a component that reads this API endpoint and decorates the update document with the name of the index the ingestion job will have to write to.

The information is rarely modified, so it can be cached for a long time.
The API request should only need to read the mediawiki-config, so it should be quick. It might therefore be acceptable to skip the AsyncIO operator and make a blocking request here.
The schema defined at https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/856507 might be adapted to include a new field to store this information.
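
A minimal sketch of the idea described above, under stated assumptions: a resolver that performs a blocking lookup, caches the result for a long time, and decorates each update with the target index. The class name `CachedIndexResolver`, the `namespace_id` field, the one-hour TTL, and the example index names are all hypothetical. The layout of the config API response is not described in this task, so the fetch and parsing step is left as an injected callable rather than guessed at. The `index_name` field name follows the patches linked in the timeline.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical: a callable that performs the blocking config request and
# returns {namespace id: index name}. Parsing the real API response is
# deliberately left to the caller.
FetchNamespaceMap = Callable[[], Dict[int, str]]


class CachedIndexResolver:
    """Resolves the index for an update, refreshing the map after a TTL."""

    def __init__(
        self,
        fetch: FetchNamespaceMap,
        ttl_seconds: float = 3600.0,  # assumption: "a long time" is about an hour
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[Dict[int, str]] = None
        self._expires_at = 0.0

    def _namespace_map(self) -> Dict[int, str]:
        now = self._clock()
        if self._cached is None or now >= self._expires_at:
            # Blocking call: acceptable only if the request stays fast.
            self._cached = self._fetch()
            self._expires_at = now + self._ttl
        return self._cached

    def decorate(self, update: dict) -> dict:
        """Return a copy of the update with the index name attached."""
        index_name = self._namespace_map()[update["namespace_id"]]
        return {**update, "index_name": index_name}


def stub_fetch() -> Dict[int, str]:
    # Stand-in for the real HTTP request; the index names are illustrative.
    return {0: "examplewiki_content", 1: "examplewiki_general"}


resolver = CachedIndexResolver(stub_fetch)
print(resolver.decorate({"page_id": 42, "namespace_id": 0}))
```

Injecting the fetch function and the clock keeps the caching logic testable without network access. A namespace missing from the map raises `KeyError` here; whether the real job should fail, retry, or fall back to a default index is a separate decision that this sketch does not address.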

Event Timeline

Gehel triaged this task as High priority. Nov 21 2022, 4:25 PM
Gehel moved this task from needs triage to Current work on the Discovery-Search board.
Gehel edited projects, added Discovery-Search (Current work); removed Discovery-Search.
Gehel set the point value for this task to 3. Nov 21 2022, 4:58 PM

Change 860607 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Add CirrusSearchConcreteReplicaGroup to the config-dump API

https://gerrit.wikimedia.org/r/860607

Change 860927 had a related patch set uploaded (by DCausse; author: DCausse):

[search/cirrus-streaming-updater@master] [WIP] Add CirrusNamespaceIndexMap

https://gerrit.wikimedia.org/r/860927

Change 860607 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Add CirrusSearchConcreteReplicaGroup to the config-dump API

https://gerrit.wikimedia.org/r/860607

Change 867542 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Add index_name in the metadata of the cirrus build doc API

https://gerrit.wikimedia.org/r/867542

Change 868039 had a related patch set uploaded (by DCausse; author: DCausse):

[search/cirrus-streaming-updater@master] Propagate the index_name from UpdateRowEncoder

https://gerrit.wikimedia.org/r/868039

Change 860927 merged by jenkins-bot:

[search/cirrus-streaming-updater@master] Add CirrusNamespaceIndexMap

https://gerrit.wikimedia.org/r/860927

Change 868039 merged by jenkins-bot:

[search/cirrus-streaming-updater@master] Propagate the index_name from UpdateRowEncoder

https://gerrit.wikimedia.org/r/868039

Change 867542 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Add index_name in the metadata of the cirrus build doc API

https://gerrit.wikimedia.org/r/867542