Search gives an error message instead of search results
Closed, Resolved · Public

Description

Search went down on all wikis; we are working on bringing it up again. Search functionality has been disabled to avoid complications.

Original report:

When performing a search on NL wiki or Commons, instead of results (or a message that there are no results) I get an error message stating that the search function is too busy: "Er is een fout opgetreden tijdens het zoeken: De zoekfunctie heeft het op dit moment heel druk. Probeer het later opnieuw." (An error occurred during searching: The search function is very busy at the moment. Try again later.) Other users on Dutch Wikipedia are also complaining about the same problem. The problem occurs when you use a search term that isn't the name of an existing article. (Wereldelectricteitsgebruik gives the error message; Duitsland gives the Dutch article about Germany.)

Event Timeline

Mbch331 raised the priority of this task from to Needs Triage.
Mbch331 updated the task description.
Mbch331 added a project: CirrusSearch.
Mbch331 subscribed.
Restricted Application added a subscriber: Aklapper.
Mbch331 set Security to None.

The search function is currently unavailable on all WMF wikis. SRE is working on bringing Elasticsearch back up.

jcrespo triaged this task as Unbreak Now! priority. Jun 15 2015, 10:47 AM
jcrespo subscribed.

Update: We have identified the problem, and the backend servers seem to be returning to normal, but search has to stay disabled until we can guarantee it won't go down again.

We've stopped the bleeding, but we don't actually know what caused it in the first place or why it required a full cluster restart to recover. The search cluster is getting itself back into shape and should be usable soon.

A small status update:

The cluster seems to be recovering fine, but we're waiting to have multiple enwiki replicas online before we let searches reach the cluster again, to avoid another outage.

In the meantime, I've raised the replication bandwidth to speed things up, but this takes time, alas.
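For illustration only, the "wait for multiple replicas before reopening search" rule described above could be sketched as a small check in Python. This is not the procedure used during the incident. The input dict is assumed to follow the per-shard layout that the Elasticsearch cluster health API reports for one index (`shards`, `primary_active`, `active_shards`); the function names and the `min_replicas` threshold are hypothetical.

```python
from typing import Any, Dict, List


def shards_below_copy_target(index_health: Dict[str, Any], min_replicas: int) -> List[str]:
    """Return ids of shards lacking a live primary plus `min_replicas` active replicas.

    `index_health` is assumed to be the per-index section of a shard-level
    cluster health response (an assumption, not taken from this task).
    """
    lagging = []
    for shard_id, shard in index_health.get("shards", {}).items():
        has_primary = shard.get("primary_active", False)
        # active_shards counts the primary and every started replica copy.
        enough_copies = shard.get("active_shards", 0) >= 1 + min_replicas
        if not (has_primary and enough_copies):
            lagging.append(shard_id)
    return sorted(lagging, key=int)


def safe_to_reopen_search(index_health: Dict[str, Any], min_replicas: int = 2) -> bool:
    """Hypothetical gate: reopen only when every shard meets the copy target."""
    # An index with no shard data at all is treated as not ready.
    has_shards = bool(index_health.get("shards"))
    return has_shards and not shards_below_copy_target(index_health, min_replicas)
```

A gate like this would be polled while recovery proceeds, and search traffic would be let through only once it returns True for the largest index.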

Change 218338 had a related patch set uploaded (by Giuseppe Lavagetto):
poolcounter: re-enable mildly search

https://gerrit.wikimedia.org/r/218338

Change 218339 had a related patch set uploaded (by Giuseppe Lavagetto):
poolcounter: re-open search fully

https://gerrit.wikimedia.org/r/218339

Change 218338 merged by jenkins-bot:
poolcounter: re-enable mildly search

https://gerrit.wikimedia.org/r/218338

Change 218339 merged by jenkins-bot:
poolcounter: re-open search fully

https://gerrit.wikimedia.org/r/218339

Search should be fully back. I won't close the ticket until the cluster is in green status, but that won't happen for a while.

Also downgrading severity as the user-facing issues should be over.

Joe lowered the priority of this task from Unbreak Now! to Medium.
Deskana raised the priority of this task from Medium to Unbreak Now!. Jun 16 2015, 5:02 AM
Deskana subscribed.

The problem seems to be recurring, so I've raised this back to "Unbreak now!".

@Manybubbles has been roused from a good sleep to wrangle ES. Work is in progress here.

Change 218589 had a related patch set uploaded (by Filippo Giunchedi):
disable search temporarily

https://gerrit.wikimedia.org/r/218589

Change 218589 merged by jenkins-bot:
disable search temporarily

https://gerrit.wikimedia.org/r/218589

fgiunchedi lowered the priority of this task from Unbreak Now! to High. (Edited) Jun 16 2015, 6:24 AM
fgiunchedi subscribed.

Recurrence today, very likely with the same root cause; the cluster was fully restarted and is green again.

Any information on what the root cause of this is? I'd love to get the Search Team to spend some time trying to prevent a recurrence.

@Deskana there is a private ticket dealing with it, as it has some implications.

@Deskana I think you can ask another staff member to add you to WMF-NDA as an employee. The root cause should be (I'm not 100% sure) in the task in "Blocked By" above.