Sometimes you have lots of servers and have to put them in different datacenters. Cirrus should be multi-DC aware, especially on writes.
A multi-write approach via the jobqueue (with the dest. cluster in the params) would probably work
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | • | Deskana | T105703 Set up a CirrusSearch cluster in codfw (Dallas, Texas)
Resolved | | EBernhardson | T86781 Support multiple datacenters in CirrusSearch
I _think_ the right way to do this is to wrap all of our write operations in jobs. Then we can attach a target cluster to those jobs. Then we can either call them in process with the jobs that build the parameters to the write operations _or_ we can pitch them into the job queue and let them happen. We can do the main DC in process and the secondary through the job queue. Or both on the queue.
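A minimal sketch of that idea, in Python rather than CirrusSearch's actual PHP, with all names (`ElasticWriteJob`, `dispatch`, the DC names) invented for illustration: every write becomes a job carrying its target cluster, so the same job can run in-process for the primary DC or be pushed onto a queue for a secondary DC.

```python
from dataclasses import dataclass

@dataclass
class ElasticWriteJob:
    cluster: str   # target DC, e.g. "eqiad" or "codfw" (assumed names)
    action: str    # e.g. "index" or "delete"
    params: dict

    def run(self, connections):
        # Look up the connection for this job's target cluster and write.
        write = connections[self.cluster]
        return write(self.action, self.params)

class JobQueue:
    """Minimal stand-in for the MediaWiki job queue."""
    def __init__(self):
        self.pending = []

    def push(self, job):
        self.pending.append(job)

def dispatch(action, params, primary, secondaries, connections, queue):
    # Primary DC: execute the write in-process.
    ElasticWriteJob(primary, action, params).run(connections)
    # Secondary DCs: defer the same write through the job queue.
    for dc in secondaries:
        queue.push(ElasticWriteJob(dc, action, params))
```

The point of the structure is that the job is the unit of replication: the parameters are computed once, and only the `cluster` field differs between the in-process copy and the queued copies.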
This seems sane; the part I'm not clear on, though, is which update jobs are triggered through the secondary datacenter. Perhaps you're just looking further forward than WMF's current plan, but as I understand it the secondary datacenter will not be performing any write operations, only reads. If one datacenter falls over, the secondary will become the primary, but at that point the writes are again only happening in the primary datacenter, and no jobs that trigger Elasticsearch write operations should be triggered in the secondary datacenter.
After briefly talking to manybubbles yesterday it sounds like the idea here is:
A) There will be independent elasticsearch clusters in each datacenter.
B) Whichever datacenter does the writes will push jobs into the queues of remote datacenters to update their indexes
That's only if we have more than one queue. If the queue is shared across both DCs then it's fine - all that matters is that we:
My idea for that was to wrap all write operations in their own job and each job would have the target elasticsearch cluster as a parameter. We could just execute both jobs immediately in process and catch failures - if any fail we can queue them. Or we could queue the secondary DC's writes always. Something like that.
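A hedged sketch of the fallback described above, with invented names: run each DC's write job in-process, and if one fails (e.g. the remote cluster is unreachable), queue it for later instead of failing the whole update.

```python
class WriteJob:
    """Hypothetical write job bound to one Elasticsearch cluster."""
    def __init__(self, cluster, writer):
        self.cluster = cluster
        self.writer = writer   # callable that performs the actual write

    def run(self):
        self.writer(self.cluster)

def execute_or_queue(jobs, queue):
    """Try every write immediately; queue only the ones that fail."""
    for job in jobs:
        try:
            job.run()
        except Exception:
            # The write failed; defer it to the queue for retry rather
            # than blocking or losing the update.
            queue.append(job)
```

The alternative mentioned (always queueing the secondary DC's writes) is the same loop with the secondary jobs pushed straight to `queue` without attempting `run()` first.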
Talked briefly to Aaron; it sounds like decisions about multi-DC are still in the ideation stage, but it's likely we will have a job queue per DC.
How chatty are the Cirrus updates? It doesn't look too chatty, so writes over the WAN should be OK, but I'm not entirely sure. I'm more concerned with the current implementation of ElasticaConnection. For the sake of simplicity it might be clearer and easier if the configuration for a DC only knows how to talk to the ES cluster in its own DC.
I'll work on pausing writes first, though, and then come back to this.
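A sketch of that configuration shape (hostnames invented): each DC's config lists only its local Elasticsearch cluster, so in-process code never talks over the WAN, and cross-DC index updates travel via the job queue instead.

```python
# Hypothetical per-DC cluster map; real deployments would source this
# from the wiki's configuration, not a hardcoded dict.
CLUSTERS = {
    "eqiad": ["elastic-eqiad.example.org:9200"],
    "codfw": ["elastic-codfw.example.org:9200"],
}

def local_hosts(current_dc):
    """Return only the Elasticsearch hosts in the DC this process runs in."""
    return CLUSTERS[current_dc]
```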
Change 235149 had a related patch set uploaded (by Deskana):
refactor out connection singleton
Change 235175 had a related patch set uploaded (by Deskana):
Remove connection singleton
Change 237264 had a related patch set uploaded (by EBernhardson):
Enable communication with multiple datacenters
Deployed a patch to mediawiki-config; testwiki is now writing to both the standard eqiad cluster and the labsearch (single-node) cluster. I'll turn on codfw tomorrow.
Change 255934 had a related patch set uploaded (by Reedy):
refactor out connection singleton