Search backend error during sending 1 documents to the commonswiki_content_1617495209 index(s): primary shard is not active
Closed, Resolved (Public)

Description

More than 25,000 instances of an error message like this have occurred over the last hour.
https://logstash.wikimedia.org/app/dashboards#/view/mediawiki-errors
Note that they're logged under the jsonTruncated channel. I filed T297219 for that issue.

Event Timeline

I don't want to roll the train forward under these conditions, so I added this task as a train blocker.

These errors are unrelated; this is an expected outage (user traffic is on another cluster). The errors come from a snapshot restore process that's currently running for T296897 and T295705. Ideally these would be a bit quieter, but we didn't have a process already set up for snapshot/restore and are just kind of making it work.

The process itself is on the last index and should finish in the next hour or two.

EBernhardson claimed this task.

These errors are no longer being emitted; the cluster now has primaries available for the given indices.

Fantastic. Thanks for looking into it.