Steps to replicate the issue (include links if applicable):
On https://memory-prime.wikibase.cloud/ type 'gashtor' in the search bar.
The search bar does not provide suggestions, even though 'Gashtor' item exists.
The search bar does provide 'Gashtor' as a suggestion when searching with a capital 'G'.
The same case-sensitive behavior occurs when creating a statement: for example, entering 'wiki' in the property search bar doesn't suggest 'Wikidata ID', but 'Wiki' does.
On https://anton12.wikibase.cloud/, entering 'my city' into the search bar only offers items where 'my city' appears in the English label; the mul labels are ignored.
How search currently works:
Here are some notes on what the team remembers about how Elasticsearch works on Wikibase Cloud.
- The number of indexes used massively affects the performance of ES
- There seems to be an exponential increase: tens of indexes were okay, but around 200 it became really challenging
- ES took ages to start up when using lots of indexes, even with lots of resources, which caused cascading issues because nodes in our multi-node cluster thought other nodes were down
- We introduced patches to WikibaseCirrusSearch that enable falling back to SQL search in case ES is not working
- To reduce the number of ES indexes, we add each wiki as an alias on the two shared indexes - https://github.com/wbstack/api/pull/773
- there are two indexes that we share between all wikis (_content and _general); the prefix before these is usually different per wiki, but with our shared indexes we use the same prefix for all wikis
- MediaWiki defines some ES field mapping (a bit like a DB schema)
- We don't really care about the ES field mapping, but we have to because of our hack of using shared indexes
- All wikis use the same field mapping, which means we can't update the field mapping one wiki at a time
- CirrusSearch creates the first field mapping that all wikis alias to
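The shared-index aliasing described above can be sketched with the Elasticsearch aliases API. This is illustrative only: the host, index, and alias names here are assumptions, not the exact production names.

```shell
# Sketch only: point a new wiki's per-wiki index names at the two shared
# indexes as aliases. Host, index, and alias names are illustrative.
curl -s -X POST "http://elasticsearch-2.default.svc.cluster.local:9200/_aliases" \
  -H 'Content-Type: application/json' -d '{
  "actions": [
    { "add": { "index": "shared_prefix_content", "alias": "mywiki_content" } },
    { "add": { "index": "shared_prefix_general", "alias": "mywiki_general" } }
  ]
}'
```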
Dev Notes:
tl;dr: Elasticsearch doesn't work for mul labels and aliases
- The ES field mapping needs to be replaced with a newer version for searching on mul labels and aliases to work.
- CirrusSearch (or Elastica) knows what this field mapping should be via extension points that extensions like WikibaseCirrusSearch can hook into to define what can be searched.
- There are jobs to create this field mapping that we can manually trigger (see https://github.com/wmde/wbaas-deploy/blob/main/doc/search.md).
- Once we have updated the field mapping we will need to re-index all the Wikis.
- We could version the indexes and write to both new and old indexes until we have backfilled all data for all wikis into the new indexes.
- This will require us modifying our WBC aliasing logic
- forceSearchIndexFromTo.yaml is a k8s job that runs the CirrusSearch/maintenance/ForceSearchIndex.php MediaWiki job.
- elasticSearchInitJob.yaml is a k8s job that runs the CirrusSearch/maintenance/UpdateSearchIndexConfig.php MediaWiki job.
- this is possibly outdated
- doing this in a k8s job is "nicer" in that it won't get terminated if the mediawiki deployment is, and the logs are easier to access
- ApiWbStackElasticSearchInit.php is an API module that runs CirrusSearch/maintenance/UpdateSearchIndexConfig.php
- this was last modified more recently than the elasticSearchInitJob.yaml k8s job
- this runs in a mediawiki pod rather than as a k8s job (which means it will also work in the docker env where k8s jobs can't run)
- Q: how long would it take to "just" re-index everything?
- We tried searching for any info from the last time we did an ES re-index but couldn't find any useful durations.
- Tom's guess is a week at most; depends on amount of data and speed of machines etc.
- The engineers would like to avoid creating any extra pressure on ourselves by "just" re-indexing. We would prefer to version the indexes.
- Q: should we spin up a separate ES cluster with the new index so that we don't need to update our existing MediaWiki jobs for (re-)indexing?
- we wouldn't be able to benefit from a partial re-index
- we think we will have to do a re-index regardless
- we haven't thought about having different prefixes in the same cluster before
- Q: if we spin up a separate ES cluster should we also move to OpenSearch?
- if it doesn't add too much complexity (i.e. don't do it if it requires major effort)
- Next steps
- Confirm steps to reproduce bug
- T415664: Test mul search locally in MediaWiki docker. If it works there, that indicates it is the field mapping that needs updating.
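One quick way to check whether the current field mapping already knows about mul is to inspect the mapping of the shared content index. A sketch, assuming per-language label subfields in the mapping; the host and index name are illustrative:

```shell
# Fetch the mapping of the shared content index and check whether "mul"
# appears as a field key. Host and index name are assumptions.
curl -s "http://elasticsearch-2.default.svc.cluster.local:9200/shared_prefix_content/_mapping" \
  | grep -q '"mul"' \
  && echo "mapping mentions mul" \
  || echo "mapping has no mul fields -> needs updating"
```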
Task Breakdown
(Most of these steps need to be done in order; Step 3 can be done before steps 1 and 2)
- T416155: 🗣️Create new ES cluster in all k8s environments with the name elasticsearch-3 using existing helm chart and container images (updating to new version of Elasticsearch or OpenSearch is out of scope)
- we might need to temporarily add more resources to staging and/or production to fit this new ES cluster
- our existing cluster requests 3 instances of master nodes @ 15m CPU and 8Gi RAM and 2 instances of data nodes (replicas) @ 100m CPU and 18Gi RAM (see production/elasticsearch-2.values.yaml.gotmpl); suggest we use the same for the new cluster
- we are on the borderline of needing more resources; we decided to spin up another two nodes for this migration
- two PRs per change, one for staging+local and one for production
- T416156: 🗣️Create shared indexes on new cluster using elasticSearchInitJob.sh following the instructions from search.md#shared-index-creation
- Make sure that the MW_WRITE_ONLY_ELASTICSEARCH_HOST is the correct ES host
- Make sure that the CLUSTER_NAME is set to write-only
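A sketch of how the two settings above might be exported before running the init script. The variable names come from the notes above; the exact host value is an assumption:

```shell
# Assumed invocation; the hostname is illustrative and should be checked
# against the actual service name of the new cluster.
export MW_WRITE_ONLY_ELASTICSEARCH_HOST="elasticsearch-3.default.svc.cluster.local:9200"
export CLUSTER_NAME="write-only"
./elasticSearchInitJob.sh
```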
- T416157: 🗣️Update the Platform API so it also creates aliases for the shared indexes on new cluster for newly created Wikis. This functionality doesn't yet exist and will need to be added.
- When we move to OpenSearch we will also want this functionality.
- the ElasticSearchAliasInit Laravel job needs to take a Job Parameter (see ElasticSearchAliasInit.php#L20-L21) to specify the ES cluster host (domain name and the port but not the scheme e.g. elasticsearch-2.default.svc.cluster.local:9200)
- the new Job Parameter should be the 2nd required parameter
- existing calls to this Job will need to be updated to specify this new job parameter
- In the WikiController we need to call ElasticSearchAliasInit twice
- Use the elasticsearch_hosts variable and remove elasticsearch_cluster_without_shared_index and elasticsearch_shared_index_host https://github.com/wbstack/api/blob/b790ee100e11d78984a9b6d1f02b8377f0ce8a54/config/wbstack.php#L22
- T416158: 🗣️Create aliases for the shared indexes on new cluster for all existing wikis using the ElasticSearchAliasInit Laravel job - see search.md#manually-b
- run WikiController.php#L178-L180 in a loop for each ES host we have
- T416177: 🗣️Set the new cluster to be written to by MediaWiki by setting writeOnlyElasticsearch.host - see chart value
- T416178: 🗣️Re-index all the existing data into the new Elasticsearch cluster using the forceSearchIndexFromTo.sh script for each wiki. Note: the script defaults to running against all clusters - we should specify ONLY the new cluster.
- Place the domains in a file and iterate over them in a shell script loop.
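The loop described above could look like this. The domains are hypothetical, and the commented-out invocation of forceSearchIndexFromTo.sh is a guess at its arguments; check the real script's usage before running it:

```shell
#!/bin/sh
# Example domains file (hypothetical wiki domains, one per line).
printf 'wiki-a.wikibase.cloud\nwiki-b.wikibase.cloud\n' > domains.txt

# Iterate over the domains, re-indexing each wiki against ONLY the new cluster.
while IFS= read -r domain; do
  echo "Re-indexing ${domain} into elasticsearch-3"
  # ./forceSearchIndexFromTo.sh "${domain}" elasticsearch-3  # hypothetical arguments
done < domains.txt
```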
- T416181: 🗣️Set the new cluster to be read/writeable in MediaWiki and the old cluster to write-only. Swap the settings so that writeOnlyElasticsearch.host points to elasticsearch-2 and elasticsearch.host points to elasticsearch-3.
- T416182: 🗣️Decommission `elasticsearch-2`. Test everything works, remove any settings referencing elasticsearch-2, and finally remove the elasticsearch-2 cluster.
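Before decommissioning, one way to sanity-check that every wiki alias resolves onto the new cluster's shared indexes (the hostname is an assumption):

```shell
# List every alias on the new cluster together with the index it points at;
# all wiki aliases should resolve to elasticsearch-3's shared indexes before
# elasticsearch-2 is removed.
curl -s "http://elasticsearch-3.default.svc.cluster.local:9200/_cat/aliases?v"
```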
Level of Effort (t-shirt size): Large
Once we have resolved the bug, we should make sure we have documented how we have configured Elasticsearch and how to update indexes while it is still fresh in our heads.