Decide order of operations for elastic 6 upgrade
Closed, ResolvedPublic

Description

Our deployment plan for elastic 6 was:

  • Deploy split clusters with multi-cluster search
  • Deploy archive indices
  • Deploy elastic 6

It turns out that our current version of elastic has a bug that blocks deployment of multi-cluster search. The fix is shipped in elastic 5.6.3. We have a few options:

Deploy elastic 5.6.x before everything else

Deploying elastic 5.6.x first will likely add 2 to 4 weeks to the deployment plan. This comes with the benefit of additional deprecation notices relevant to the elastic 6 upgrade.

Build a custom elastic 5.5.2 with the bugfix

The bug fix is a relatively small, isolated patch. We could temporarily run a custom 5.5.2 with this bugfix. This seems iffy: we've never built custom elastic versions before.

Re-order deployment

We can swap around the deployment order:

  • Merge the archive index patch, don't create new archive indices yet
  • Perform elastic 6 upgrade
  • Turn on cross-cluster search
  • Create archive indices

This avoids the cross cluster search bug by keeping queries on the main cluster until we've upgraded to a fixed version of elastic.

Hack around the bug

  • Force cross-cluster searches to query multiple indices to avoid the bug

This lets us continue as is, with minor inconvenience.
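
As a rough illustration of this workaround (a sketch only, not the actual CirrusSearch code): the idea is that a search bound for a remote cluster drops the per-index suffix, so the request targets an alias spanning several indices instead of a single index. The helper name `build_search_target`, the `<wiki>_<suffix>` naming scheme, and the cluster alias `remote` are assumptions made for the example; the `cluster:index` form is the standard Elasticsearch cross-cluster search syntax.

```python
from typing import Optional


def build_search_target(
    wiki: str,
    index_suffix: Optional[str],
    remote_cluster: Optional[str] = None,
) -> str:
    """Return the index expression a search request should target.

    Hypothetical helper: the names and the "<wiki>_<suffix>" scheme are
    assumptions for illustration, not taken from CirrusSearch.
    """
    if remote_cluster is not None:
        # Workaround: ignore the requested suffix so the search goes to the
        # wiki-wide alias, which is assumed to cover multiple indices.
        index_suffix = None

    name = wiki if index_suffix is None else f"{wiki}_{index_suffix}"

    if remote_cluster is not None:
        # Cross-cluster search addresses remote indices as "<cluster>:<index>".
        return f"{remote_cluster}:{name}"
    return name


# Local search keeps the specific index; cross-cluster search is widened.
print(build_search_target("frwikiquote", "content"))
print(build_search_target("frwikiquote", "content", remote_cluster="remote"))
```

The cost is that cross-cluster requests search more indices than strictly needed, which is the "minor inconvenience" mentioned above.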

Event Timeline

EBernhardson added a subscriber: TJones.

Looking to solicit opinions on the best way forward for our cross-cluster search bug. Feel free to add additional options to the description.

I agree that a custom version of Elastic 5.5.2 seems iffy, so unless someone has a strong argument for it, we should skip that option. The proposed "re-order deployment" plan seems reasonable.

Reordering deployment is only possible if we get rid of the archive type by creating the indices.

Another iffy solution would be to work around the bug by always querying 2 shards (e.g. force IW searches to run on the frwikiquote content+general alias). If I understood the problem correctly, this should prevent the bug.
Deploying 5.6.13 is certainly the cleanest option, but also the one that requires the most work (port & release all the plugins, rolling restarts).

I don't think we need to actually create the archive indices ahead of time. I will double check with a test, but I'm fairly certain I've read in the docs that elastic 6 can open and use a multi-type index from elastic 5; it simply can't create new indices with multiple types. By merging the archive code, our create operations will be in line with elastic 6, and it will continue reading old indices as necessary.

Another iffy solution would be to work around the bug by always querying 2 shards (e.g. force IW searches to run on the frwikiquote content+general alias). If I understood the problem correctly, this should prevent the bug.

With a quick test on mwdebug1002, forcing SearchRequestBuilder::getPageType() to send false as the $indexType causes it to always search multiple indices, and cross-cluster search works. It's a bit of a stupid hack, but I'm ok with forcing $indexType here when doing a cross-cluster search, and remembering to remove that hack in the es6 branch. That would mostly be a two-line hack that will avoid this entire ticket.

Change 484816 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Hack around cross cluster search bug

https://gerrit.wikimedia.org/r/484816

I don't think we need to actually create the archive indices ahead of time. I will double check with a test, but I'm fairly certain I've read in the docs that elastic 6 can open and use a multi-type index from elastic 5; it simply can't create new indices with multiple types. By merging the archive code, our create operations will be in line with elastic 6, and it will continue reading old indices as necessary.

You are correct (not sure why I had in mind that this type thing must absolutely be ready before moving to 6). Because we don't recreate indices too often, we should not pollute the main cluster with too many shards.

Thanks for the hack, I'll try to ship this today and see how it goes.

Change 484816 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Hack around cross cluster search bug

https://gerrit.wikimedia.org/r/484816

Change 484857 had a related patch set uploaded (by DCausse; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@wmf/1.33.0-wmf.12] Hack around cross cluster search bug

https://gerrit.wikimedia.org/r/484857

Change 484858 had a related patch set uploaded (by DCausse; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@wmf/1.33.0-wmf.13] Hack around cross cluster search bug

https://gerrit.wikimedia.org/r/484858

Change 484857 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@wmf/1.33.0-wmf.12] Hack around cross cluster search bug

https://gerrit.wikimedia.org/r/484857

Change 484858 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@wmf/1.33.0-wmf.13] Hack around cross cluster search bug

https://gerrit.wikimedia.org/r/484858

dcausse triaged this task as Medium priority.

Thanks Erik, it worked!