
DataHub is throwing errors on search
Closed, Resolved
Public

Assigned To
Authored By
BTullis
Feb 6 2024, 3:41 PM
Referenced Files
F41794245: image.png
Feb 6 2024, 3:59 PM
F41794105: image.png
Feb 6 2024, 3:41 PM
F41794098: image.png
Feb 6 2024, 3:41 PM

Description

We have observed errors from the DataHub user interface which seem to be related to the search functionality.

Some browsing works, but searching and filtering return 500 errors.

image.png (447×1 px, 36 KB)

image.png (448×1 px, 52 KB)

Tailing the logs of the datahub-gms component with kubectl logs -f datahub-gms-main-cbf4c689d-dx8z4 datahub-gms-main shows errors like this:

2024-02-06 15:38:22,914 [Thread-11403] ERROR c.l.m.s.e.query.ESSearchDAO:98 - Search query failed
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
	at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
	at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1888)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1645)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572)
	at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1088)
	at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:92)
	at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:202)
	at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:123)
	at com.linkedin.metadata.search.client.CachingEntitySearchService.getRawSearchResults(CachingEntitySearchService.java:283)
	at com.linkedin.metadata.search.client.CachingEntitySearchService.lambda$getCachedSearchResults$0(CachingEntitySearchService.java:155)
	at com.linkedin.metadata.search.cache.CacheableSearcher.getBatch(CacheableSearcher.java:114)
	at com.linkedin.metadata.search.cache.CacheableSearcher.getSearchResults(CacheableSearcher.java:58)
	at com.linkedin.metadata.search.client.CachingEntitySearchService.getCachedSearchResults(CachingEntitySearchService.java:158)
	at com.linkedin.metadata.search.client.CachingEntitySearchService.search(CachingEntitySearchService.java:64)
	at com.linkedin.metadata.search.aggregator.AllEntitiesSearchAggregator.lambda$getSearchResultsForEachEntity$2(AllEntitiesSearchAggregator.java:174)
	at com.linkedin.metadata.utils.ConcurrencyUtils.lambda$transformAndCollectAsync$0(ConcurrencyUtils.java:31)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	at java.base/java.lang.Thread.run(Thread.java:829)
	Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://datahubsearch.svc.eqiad.wmnet:9200], URI [/domainindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"query_shard_exception","reason":"[simple_query_string] analyzer [query_word_delimited] not found","index":"domainindex_v2_1707215385293","index_uuid":"QMkTItcUQKSlg1mQqJ7ZyA"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"domainindex_v2_1707215385293","node":"VUTqryoKRwmYBb2iRlKQ4g","reason":{"type":"query_shard_exception","reason":"[simple_query_string] analyzer [query_word_delimited] not found","index":"domainindex_v2_1707215385293","index_uuid":"QMkTItcUQKSlg1mQqJ7ZyA"}}]},"status":400}
		at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:326)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:296)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:270)
		at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1632)
		... 16 common frames omitted
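
The useful part of this trace is the JSON body attached to the suppressed ResponseException, which names the failing index and the missing analyzer. As a minimal sketch (not part of DataHub), the root cause can be pulled out of such a body as follows. The function name summarize_root_causes is hypothetical, and ERROR_BODY is the response above with the failed_shards list emptied for brevity.

```python
import json


def summarize_root_causes(error_body: str) -> list[tuple[str, str, str]]:
    """Return (index, type, reason) for each root cause in an Elasticsearch error body."""
    payload = json.loads(error_body)
    causes = payload.get("error", {}).get("root_cause", [])
    return [
        (c.get("index", "?"), c.get("type", "?"), c.get("reason", "?"))
        for c in causes
    ]


# Error body copied from the log above; failed_shards emptied for brevity.
ERROR_BODY = (
    '{"error":{"root_cause":[{"type":"query_shard_exception",'
    '"reason":"[simple_query_string] analyzer [query_word_delimited] not found",'
    '"index":"domainindex_v2_1707215385293","index_uuid":"QMkTItcUQKSlg1mQqJ7ZyA"}],'
    '"type":"search_phase_execution_exception","reason":"all shards failed",'
    '"phase":"query","grouped":true,"failed_shards":[]},"status":400}'
)

for index, kind, reason in summarize_root_causes(ERROR_BODY):
    print(f"{index}: {kind}: {reason}")
```

Here the search fails because the index domainindex_v2_1707215385293 does not define the query_word_delimited analyzer that the simple_query_string query asks for.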

Event Timeline

BTullis triaged this task as Unbreak Now! priority. Feb 6 2024, 3:42 PM

I'm trying a rolling restart of the pods in codfw with the following command:

btullis@deploy2002:/srv/deployment-charts/helmfile.d/services/datahub$ helmfile -e codfw --state-values-set roll_restart=1 sync
BTullis lowered the priority of this task from Unbreak Now! to High. Feb 6 2024, 3:59 PM

The restart has worked and the search results now seem ok, but I'm not confident that this won't happen again.

image.png (887×1 px, 77 KB)

I think we need to keep investigating.
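
One possible way to continue the investigation is to check whether the index settings actually define the analyzer named in the error. The sketch below rests on assumptions: it uses the standard Elasticsearch GET /<index>/_settings endpoint and expects custom analyzers under settings.index.analysis.analyzer. The helper names are hypothetical, and the host and index alias in the usage comment are the ones that appear in the log above.

```python
import json
import urllib.request


def indices_missing_analyzer(settings: dict, analyzer: str) -> list[str]:
    """Return the names of indices whose settings do not define the given analyzer."""
    missing = []
    for index_name, body in settings.items():
        analyzers = (
            body.get("settings", {})
            .get("index", {})
            .get("analysis", {})
            .get("analyzer", {})
        )
        if analyzer not in analyzers:
            missing.append(index_name)
    return sorted(missing)


def fetch_settings(base_url: str, index_pattern: str) -> dict:
    """Fetch index settings over HTTP; assumes the cluster is reachable without auth."""
    url = f"{base_url}/{index_pattern}/_settings"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Example usage (requires network access to the search cluster):
#   settings = fetch_settings("http://datahubsearch.svc.eqiad.wmnet:9200", "domainindex_v2")
#   print(indices_missing_analyzer(settings, "query_word_delimited"))
```

A non-empty result would point to an index that was created without the expected analysis settings, which would match the error in the log. It would not by itself explain why the restart cleared the problem.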

BTullis renamed this task from DataHub search is throwing errors on search to DataHub is throwing errors on search. Feb 8 2024, 5:42 PM
BTullis closed this task as Resolved.