
Reindex all wikis for the job queue outage on Mar 10th
Closed, Resolved (Public)

Description

There is an ongoing problem with the job queue right now, causing some Elasticsearch jobs to get dropped on the floor. Once the job queue has settled, we need to reindex all wikis over the affected time period to make sure we don't miss any updates. The failure started around March 9th, 22:00 UTC.

Graph of failures: https://grafana.wikimedia.org/dashboard/db/elasticsearch?panelId=24&fullscreen

As of writing this ticket the job queue issue is still being triaged, so we cannot do this yet.

Event Timeline


Marking high priority, although we cannot do this just yet.

The problem is not resolved yet :(
Investigation ongoing in T129517

The outage looks to be over as of midnight UTC on 3/12, although the root problem doesn't appear to have been solved. We might be able to start the reindex process now, but I'm not 100% sure.

I've started the reindexing from terbium, using the time period 2016-03-09T20:00:00Z to 2016-03-12T08:00:00Z.
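
For reference, a minimal sketch of how a window like this can be derived from the outage times reported in this ticket (start around 2016-03-09 22:00 UTC, end around 2016-03-12 00:00 UTC). The padding values (2 hours before, 8 hours after) are an assumption inferred from the timestamps above, not a documented rule, and `reindex_window` is a hypothetical helper, not part of any existing tooling.

```python
from datetime import datetime, timedelta, timezone


def reindex_window(outage_start, outage_end, pad_before, pad_after):
    """Return (from, to) ISO 8601 UTC strings covering the outage plus padding.

    Hypothetical helper: the padding guards against imprecise outage
    boundaries so that no dropped update falls outside the window.
    """
    if outage_end < outage_start:
        raise ValueError("outage_end precedes outage_start")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = (outage_start - pad_before).astimezone(timezone.utc)
    end = (outage_end + pad_after).astimezone(timezone.utc)
    return start.strftime(fmt), end.strftime(fmt)


# Outage boundaries as reported in this ticket (approximate).
outage_start = datetime(2016, 3, 9, 22, 0, tzinfo=timezone.utc)
outage_end = datetime(2016, 3, 12, 0, 0, tzinfo=timezone.utc)

# Assumed padding: 2h before the reported start, 8h after the reported end.
window = reindex_window(
    outage_start, outage_end, timedelta(hours=2), timedelta(hours=8)
)
print(window)
```

Padding more generously on the trailing side seems reasonable here, since jobs may have kept failing for a while after the outage appeared to end.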