
Update Elastica library to 5.0.0 and get CirrusSearch working with it
Closed, Resolved, Public

Description

As part of the 5.x upgrade we should update our client library to the latest release, currently 5.0.0, as well. Work through the various problems and get cindy passing the browser test suite with this new library version.

Event Timeline

Change 333128 had a related patch set uploaded (by EBernhardson):
Mapping updates for ES 5.x

https://gerrit.wikimedia.org/r/333128

Change 333129 had a related patch set uploaded (by EBernhardson):
Update search mappings for elasticsearch 5.x

https://gerrit.wikimedia.org/r/333129

It took a bit of reconfiguring (a new instance for es5) to get my local vagrant happy, but it now seems that only one test not tagged @expect_failure is failing. It's a relevancy test, which was also pretty flaky the last time we upgraded es versions. To get an equal comparison I dumped cirrustestwiki_content from cirrus-browser-bot and imported it locally (otherwise docFreqs and such would differ). Basically, compare:

es2: http://cirrustest-cirrus-browser-bot.wmflabs.org/w/api.php?action=query&format=json&list=search&srsearch=Relevancyclosetest+Foo&srqiprofile=classic_noboostlinks&cirrusDumpResult&cirrusExplain=pretty
es5: http://cirrustest.wiki.local.wmftest.net:8080/w/api.php?action=query&format=json&list=search&srsearch=Relevancyclosetest+Foo&srqiprofile=classic_noboostlinks&cirrusDumpResult&cirrusExplain=pretty

From es2 the pages are ordered: 'Relevancytest foo' (691.8987), 'Relevancyclosetest Foô' (254.56299), 'Foo Relevancyclosetest' (233.97305)
For es5 the last two flip: 'Relevancytest foo' (853.5483), 'Foo Relevancyclosetest' (509.2177), 'Relevancyclosetest Foô' (502.59995)

Note that all of these scores include a *10 language boost, so the difference between the last two is fairly small in both cases. I'm not too sure about the large change in absolute scores, but it's likely because the 0.5 coord factor disappeared, combined with differences in the content of the two DBs.
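To put a number on "fairly small", here is a quick sketch that takes the three scores from each cluster exactly as reported above and computes the gap between ranks 2 and 3, both with the *10 language boost divided out and as a fraction of the 2nd-place score. The helper names (`ranking`, `gap_2nd_3rd`) are just for illustration.

```python
from __future__ import annotations

# Scores as reported by the two clusters (the *10 language boost is already applied).
es2 = {
    "Relevancytest foo": 691.8987,
    "Relevancyclosetest Foô": 254.56299,
    "Foo Relevancyclosetest": 233.97305,
}
es5 = {
    "Relevancytest foo": 853.5483,
    "Foo Relevancyclosetest": 509.2177,
    "Relevancyclosetest Foô": 502.59995,
}

LANG_BOOST = 10.0


def ranking(scores: dict[str, float]) -> list[str]:
    """Page titles ordered from highest to lowest score."""
    return sorted(scores, key=lambda title: scores[title], reverse=True)


def gap_2nd_3rd(scores: dict[str, float]) -> tuple[float, float]:
    """Gap between ranks 2 and 3: (absolute gap without the lang boost, gap relative to rank 2)."""
    second, third = ranking(scores)[1:3]
    a, b = scores[second], scores[third]
    return (a - b) / LANG_BOOST, (a - b) / a


for label, scores in (("es2", es2), ("es5", es5)):
    abs_gap, rel_gap = gap_2nd_3rd(scores)
    print(f"{label}: {ranking(scores)}; 2nd-3rd gap {abs_gap:.2f} before lang boost ({rel_gap:.1%})")
```

The gap is roughly 8% of the 2nd-place score on es2 and roughly 1.3% on es5, so a fairly small shift in clause weighting is enough to flip the last two.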

Looking over the two explains, the scores break down as roughly:

'Relevancyclosetest Foô'
es2: (0.5 (coord) * (24.794167 (title) + 4.7421784 (suggest) + 4.4098306 (text)) + 8.483211 (phrase)) * 10 (lang) = 25.456299 * 10 = 254.56299
es5: (25.698328 (title) + 10.148193 (suggest) + 4.9270577 (text) + 9.486414 (phrase)) * 10 (lang) ≈ 50.26 * 10 ≈ 502.60
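As a sanity check on the breakdown, this sketch recomputes both final scores from the per-clause values in the explains. It assumes the simplified shape described above, (coord * (title + suggest + text) + phrase) * lang boost, which glosses over the nested structure of the real query; `final_score` is a made-up helper, not anything in CirrusSearch.

```python
import math

LANG_BOOST = 10.0  # language boost applied to every result in this test

# Per-clause scores for 'Relevancyclosetest Foô', read off the two explain outputs.
es2_parts = {"title": 24.794167, "suggest": 4.7421784, "text": 4.4098306}
es2_phrase = 8.483211
es5_parts = {"title": 25.698328, "suggest": 10.148193, "text": 4.9270577}
es5_phrase = 9.486414


def final_score(parts, phrase, coord=1.0, boost=LANG_BOOST):
    """(coord * sum of main-query clauses + phrase rescore) * language boost."""
    return (coord * sum(parts.values()) + phrase) * boost


es2_score = final_score(es2_parts, es2_phrase, coord=0.5)  # outer 0.5 coord on es2
es5_score = final_score(es5_parts, es5_phrase)             # no coord factor on es5

for label, computed, reported in (("es2", es2_score, 254.56299), ("es5", es5_score, 502.59995)):
    ok = math.isclose(computed, reported, rel_tol=1e-6)
    print(f"{label}: computed {computed:.5f}, reported {reported} -> {'match' if ok else 'MISMATCH'}")
```

Both totals match the reported scores, which supports the reading that the phrase rescore sits outside the coord factor on es2.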

On es2 the coord factor was applied to suggest twice: the original suggest score was 9.484357, which was cut in half to 4.74, and then the summed parts were cut in half again. Getting exactly the same scoring will be difficult, because the coordination factor isn't a static value we can just multiply our weights by. We should probably consider that:

  • Suggest weight should probably be lowered, to account for it no longer having the coordination factor applied.
  • Phrase rescore may need a higher weight: previously the other portion of the query had a coordination factor applied, and it no longer does.
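To illustrate why the coordination factor can't be folded into a static weight, here is a minimal sketch assuming the classic Lucene TF-IDF definition, coord = overlap / maxOverlap (matched optional clauses over total optional clauses). It flattens the query into a single boolean of optional clauses, and the clause scores are made-up numbers, not values from the explains.

```python
from __future__ import annotations


def coord(overlap: int, max_overlap: int) -> float:
    """Classic Lucene TF-IDF coordination factor: fraction of optional clauses that matched."""
    return overlap / max_overlap


def bool_score(clause_scores: list[float | None], use_coord: bool) -> float:
    """Sum the matched clause scores (None = clause did not match), optionally scaled by coord."""
    matched = [s for s in clause_scores if s is not None]
    total = sum(matched)
    if use_coord:
        total *= coord(len(matched), len(clause_scores))
    return total


# Hypothetical two-clause query: doc_a matches both clauses, doc_b matches only the first.
doc_a = [4.0, 2.0]
doc_b = [4.0, None]

for name, clauses in (("doc_a", doc_a), ("doc_b", doc_b)):
    with_coord = bool_score(clauses, use_coord=True)
    without = bool_score(clauses, use_coord=False)
    print(f"{name}: with coord {with_coord}, without {without}, effective multiplier {with_coord / without}")
```

The same clause ends up multiplied by 1.0 for one document and 0.5 for the other, so no single reweighting of suggest or phrase rescore reproduces the old behaviour exactly; at best we can tune weights so the typical case comes out close.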

I'm not certain of the exact values, though; they will probably need some relforge testing (which means getting es 5.x onto the relforge cluster as well).

Change 333128 merged by DCausse:
Mapping updates for ES 5.x

https://gerrit.wikimedia.org/r/333128

Change 333988 had a related patch set uploaded (by EBernhardson):
Update browsertest search profile for es5

https://gerrit.wikimedia.org/r/333988

Change 333988 merged by DCausse:
Update browsertest search profile for es5

https://gerrit.wikimedia.org/r/333988

One last patch and we can move this to done: https://gerrit.wikimedia.org/r/#/c/333129/
It is now happy, since the other patches to the es5 branches have merged.

Change 333129 merged by jenkins-bot:
Update search mappings for elasticsearch 5.x

https://gerrit.wikimedia.org/r/333129