
CirrusSearch: Support pausing writes to Elasticsearch
Closed, Resolved · Public

Description

Cirrus should support pausing updates - both for an individual index and for all indexes. There are two use cases:

  1. Create a utility that just pauses all writes so we can integrate that into the rolling restart process. Right now it won't be used but it'll be super useful with https://github.com/elastic/elasticsearch/issues/10032 .
  2. Pause writes to an index while it's being reindexed with --reindexOk during a mapping update. This should prevent any data from being lost.

I _think_ the right way to go about this is to use part of the solution I proposed for T86781 - namely wrapping the write operations in jobs. If writes are paused then always queue those jobs. Maybe just requeue them if they are popped. I'm not sure about that.
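The "always queue when paused, maybe requeue on pop" idea can be sketched as below. This is an illustrative Python model, not the actual CirrusSearch job code; the names `WriteJob` and `pop_and_run` are invented for the sketch.

```python
from collections import deque

class WriteJob:
    """Hypothetical wrapper around one Elasticsearch write operation."""
    def __init__(self, doc_id, payload):
        self.doc_id = doc_id
        self.payload = payload

def pop_and_run(jobs, writes_paused, apply_write):
    """Pop one job; if writes are paused, requeue it instead of running it."""
    job = jobs.popleft()
    if writes_paused():
        jobs.append(job)  # defer until writes resume
        return False
    apply_write(job)
    return True
```

A paused queue simply cycles jobs back onto itself; once the pause flag clears, the same pop path applies the write.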

Stakeholder: Cirrus operators (the Cirrus engineers are the operators for Cirrus but that might be a useful distinction one day)
Benefit: See use cases above
Estimate: 2 weeks

Details

Related Gerrit Patches:
mediawiki/extensions/CirrusSearch (master): Support pausing writes to elasticsearch

Event Timeline

Manybubbles raised the priority of this task from to Normal.
Manybubbles updated the task description. (Show Details)
Manybubbles added a project: Discovery.
Manybubbles moved this task to On Sprint Board on the Discovery board.
Restricted Application added a subscriber: Aklapper. · View Herald Transcript · May 15 2015, 2:40 PM
Manybubbles set Security to None.

I'm dropping this in the sprint because I believe it should be worked on sooner rather than later. It's purely technical but still quite useful.

I'm thinking we will have to requeue them when popped; otherwise, if I understand correctly, the index will have to be rebuilt to get up to speed. Perhaps an ever-increasing job delay timeout that has a reasonable maximum (1 hr?). To prevent a queue that just builds up, I'm fairly certain we need to set a maximum time limit, after which the job will just be dropped. Based on our use case, is a day long enough? This will be configurable anyway, but we should choose a reasonable maximum.

Requeueing is pretty annoying. A while ago I tried to put together a patch that just never popped them, but Aaron didn't like it because it added a database lookup to fetch the do-not-pop list too frequently. But if it's the only thing we can do, it'll have to do.

One thing we have to be careful of is that the jobs are fast or that deduplication works. My proposal around wrapping write operations really just makes them fast and relies on Elasticsearch to deduplicate - it'd require some version checking that we don't already have, though. We would never want to write an older version than we have.

If we go with a delay then we probably want some kind of exponential backoff so short read-only events don't end up with jobs queued for an hour. As for the maximum delay - a day seems pretty sane. OTOH, we don't measure the delay for transitive (from templates) updates, so it might be that a day isn't uncommon for those.
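The backoff-with-cap and drop-after-a-day policy discussed above can be sketched in a few lines. This is an illustrative model; the function names and the base of 32 seconds (the minimum wait mentioned later in the thread) are assumptions, and all the numbers are meant to be configurable.

```python
def next_delay(attempt, base=32, max_delay=3600):
    """Exponentially growing requeue delay, capped at max_delay seconds
    so short read-only events don't leave jobs queued for an hour."""
    return min(base * (2 ** attempt), max_delay)

def should_drop(queued_at, now, max_age=86400):
    """Drop jobs older than max_age (one day by default) so a long pause
    can't grow the queue forever."""
    return now - queued_at > max_age
```

With base=32, the first few delays are 32 s, 64 s, 128 s, ... until the 1 hr cap; any job that has been cycling for more than a day is discarded.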

Noting that sealing indexes has been merged to the Elasticsearch master branch now: https://github.com/elastic/elasticsearch/pull/11179

Still a month-ish from the next release I believe.

Manybubbles renamed this task from CirrusSearch: Support pausing writes to elasticsearch to CirrusSearch: Support pausing writes to Elasticsearch. · Jun 3 2015, 2:52 PM
Manybubbles updated the task description. (Show Details)

One thing I'm not sure about here yet is how to orchestrate the cluster-wide pausing of writes. The simplest way would be to have a global variable set from the mediawiki-config repository, and simply pause writes for the entire cluster while doing a rolling restart. As long as restarts are kept to an hour, this is probably acceptable.

Otherwise we would need some form of cluster-wide coordination. Currently Apache ZooKeeper is deployed to the WMF cluster and can handle this task, but communicating with it from PHP will take some setup work. There does not seem to be any pure-PHP library for this yet, but there is a PHP extension that utilizes libzookeeper. It is plausible (although untested) that this extension could be compiled for HHVM with the Zend compatibility layer tstarling wrote. I can certainly try this out, but I'm uncertain how long it would take to test that it is stable and production-ready.

Another option for cluster-wide coordination is etcd. This is not currently deployed, but Operations is in the process of setting it up. Based on the existing Phabricator tickets, it looks like they intend to use etcd for cluster-wide configuration files and such. etcd will also be much easier to integrate with from PHP: it is a much simpler daemon than ZooKeeper and offers a very simple REST interface.
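To illustrate how simple the etcd REST interface is, here is a minimal sketch of checking a pause flag via etcd's v2 keys API. The key path, port, and the convention that a value of "1" means "paused" are all assumptions for the sketch, not anything agreed on in this task.

```python
import json
from urllib import request, error

# Hypothetical key; etcd v2 exposes keys under /v2/keys/<path>.
ETCD_KEY = "http://127.0.0.1:2379/v2/keys/cirrussearch/writes-paused"

def writes_paused(url=ETCD_KEY, opener=request.urlopen):
    """Return True if the pause flag is set in etcd.
    A missing key (HTTP 404) means writes are not paused."""
    try:
        with opener(url) as resp:
            return json.load(resp)["node"].get("value") == "1"
    except error.HTTPError as e:
        if e.code == 404:
            return False
        raise
```

Pausing all writes would then just be a single PUT to the same key, which any host in the cluster can observe.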

Overall, I think we should initially write this using a global variable in mediawiki-config to handle pausing writes to all elasticsearch servers. This slightly adjusts the task: it means we cannot have a utility that pauses writes; instead we will have to merge patches in Gerrit and deploy them. Similarly, we will not be able to pause writes while doing reindexing unless we again merge and deploy patches. Once etcd is deployed to production and available, we can work up these last few parts.

Opinions?

> One thing i'm not sure about here yet, is how to orchestrate the cluster-wide pausing of writes. The simplest way would be to have a global variable set from the mediawiki-config repository, and simply pause writes for the entire cluster while doing a rolling restart. As long as restarts are kept to an hour this is probably acceptable.

We already have a distributed, replicated data store - we could just pitch the "pause all the writes" flag into Elasticsearch.
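Pitching the flag into Elasticsearch itself could look like the sketch below: a marker document in a small metadata index, written and checked over the normal REST API. The index name `cirrus_meta` and document id `freeze-everything` are invented for illustration.

```python
import json
from urllib import request, error

def pause_all_writes(es="http://localhost:9200", opener=request.urlopen):
    """Set a cluster-wide pause flag by indexing a marker document.
    Index and document names are illustrative, not the real ones."""
    req = request.Request(
        f"{es}/cirrus_meta/_doc/freeze-everything",
        data=json.dumps({"frozen": True}).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with opener(req) as resp:
        return resp.status < 300

def writes_paused(es="http://localhost:9200", opener=request.urlopen):
    """Writes are paused iff the marker document exists (404 = not paused)."""
    req = request.Request(f"{es}/cirrus_meta/_doc/freeze-everything", method="HEAD")
    try:
        with opener(req):
            return True
    except error.HTTPError as e:
        if e.code == 404:
            return False
        raise
```

The appeal is that the flag is replicated by the same cluster whose writes it controls, so no extra daemon is needed.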

Change 218993 had a related patch set uploaded (by EBernhardson):
[WIP] Support pausing writes to elasticsearch

https://gerrit.wikimedia.org/r/218993

The most basic part of this, shifting the code that does writes into Jobs, is complete. We first try to run the Job in-process; if the indexes are frozen, the job inserts itself into the queue. Any future attempts to run the job will do the same thing.

Freezing/unfreezing the indexes is also written now. A single cluster-wide index is created to track which indexes are currently frozen. We can freeze either individual indexes or writes to the entire cluster.
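The run-in-process-or-requeue behavior described above can be modeled as follows. Class and argument names here are illustrative stand-ins, not the actual CirrusSearch classes from the patch.

```python
class ElasticaWrite:
    """Sketch of a write job: try to run in-process; if the target index
    (or the whole cluster) is frozen, insert itself back into the queue."""
    ALL = "all"  # sentinel meaning "everything is frozen"

    def __init__(self, index, ops):
        self.index = index
        self.ops = ops

    def run(self, frozen_indexes, job_queue, apply_writes):
        if self.index in frozen_indexes or self.ALL in frozen_indexes:
            job_queue.append(self)  # a later pop repeats this same check
            return False
        apply_writes(self.index, self.ops)
        return True
```

Because the frozen check is repeated on every attempt, the job loops through the queue until the freeze is lifted, then applies its writes normally.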

For version conflicts, I'm thinking of adding the revisionId to the document and extending super_detect_noop with the ability to compare a version number (aka revisionId) in the document to the version number that is currently indexed, noop'ing anything that is from a prior revision. super_detect_noop doesn't yet have document-level noop'ing, but it seems to fit with the principle, and it should be a simple enough way to get my feet wet in the Java end of this stack. Does this make sense?
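The comparison being proposed is simple; a sketch of the intended semantics (in Python rather than the Java plugin code, with an illustrative field name `version` standing in for the revisionId):

```python
def noop_stale_update(incoming, indexed):
    """Document-level noop: discard an update whose version is older than
    the one already indexed, so out-of-order writes can't clobber newer data.
    Returns the document to keep, or None to noop the update entirely."""
    if indexed is not None and incoming.get("version", 0) < indexed.get("version", 0):
        return None  # stale: keep the currently indexed document
    return incoming
```

Equal or newer versions win; only strictly older updates are dropped, so re-delivered duplicates of the current revision still behave as harmless noops.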

Change 219118 had a related patch set uploaded (by EBernhardson):
[WIP] Document level noop detection with a version field

https://gerrit.wikimedia.org/r/219118

Change 219118 abandoned by EBernhardson:
[WIP] Document level noop detection with a version field

Reason:
unnecessary as pointed out by nik

https://gerrit.wikimedia.org/r/219118

This is mostly written now and seems to work in my isolated tests. The only major blocker left is out-of-order updates to 'OtherIndex' (local_sites_with_dupe). I've been running this through my head over the weekend, but I can't think of anything that will work as we need.

This is now ready for review/merge and has a couple of basic tests to go with it. The tests are slow, but that is because they have to wait for intentionally delayed jobs to work their way through the job queue. I switched the minimum wait time from 64s to 32s to speed them up a bit, but it's still a waiting game.

There are no tests for the Reindexer, but I've manually checked that it is doing what we want.

Change 218993 merged by jenkins-bot:
Support pausing writes to elasticsearch

https://gerrit.wikimedia.org/r/218993

Now merged; I'll be testing this functionality on the beta cluster with a rolling restart to install the statsd plugin.

Change 219118 restored by EBernhardson:
[WIP] Document level noop detection with a version field

Reason:
It turns out Elasticsearch doesn't allow versioning with the update API, which is required to use the super_detect_noop script. As such, we need to bring this back.

https://gerrit.wikimedia.org/r/219118

debt reopened this task as Open.Sep 2 2016, 2:10 PM

Closing this in favor of re-attaching the patch to the new task, T144039.

EBernhardson closed this task as Resolved.Sep 6 2016, 10:26 PM