
Run tests to measure the expected zero results change of adjusting phrase slop
Closed, Resolved, Public

Description

Before running our tests it would be great to have a prediction of what the change will do.

Erik has some Ansible scripts that export indexes from prod and import them into a labs instance, and we recently added runSearch.php, which runs searches in bulk from the CLI by piping in a file full of queries. Almost everything should be ready to go; we just have to throw a bunch of queries at it.

If lucky, the following command should run suggestions from a file on your local machine:

echo foo | ssh elasticsearch-tests.eqiad.wmflabs cd /srv/mediawiki-vagrant/ '&&' mwvagrant ssh -- mwscript extensions/CirrusSearch/maintenance/runSearch.php --type=suggest

Event Timeline

EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.

This will be a little easier with https://gerrit.wikimedia.org/r/#/c/231615/, which will allow specifying the basename of the index rather than having to manually alias the imported index to wiki_*.

TJones set Security to None.

In order to pass params like --options '{"wgCirrusSearchPhraseSlop":{"precise":1, "default":0, "boost":0}}', you have to recursively escape. Also, sometimes the JSON gets split up into parts, so you have to escape all the curly braces, too, and then escape the escapes.

Here's a search with the default slop params (should return 0 rows from ptwiki):

echo \"canta Freewheelin Bob Dylan\" | ssh elasticsearch-tests.eqiad.wmflabs cd /srv/mediawiki-vagrant/ '&&' mwvagrant ssh --  mwscript extensions/CirrusSearch/maintenance/runSearch.php --baseName ptwiki --options \$$'\047'\\047$'\047'\$$'\047'\\173$'\047'\$$'\047'\\042$'\047'wgCirrusSearchPhraseSlop\$$'\047'\\042$'\047':\$$'\047'\\173$'\047'\$$'\047'\\042$'\047'precise\$$'\047'\\042$'\047':0, \$$'\047'\\042$'\047'default\$$'\047'\\042$'\047':0, \$$'\047'\\042$'\047'boost\$$'\047'\\042$'\047':1\$$'\047'\\175$'\047'\$$'\047'\\175$'\047'\$$'\047'\\047$'\047'

To increase the slop to 1 for quoted strings, change the :0 a little way after precise to :1. This should return 1 row from ptwiki:

echo \"canta Freewheelin Bob Dylan\" | ssh elasticsearch-tests.eqiad.wmflabs cd /srv/mediawiki-vagrant/ '&&' mwvagrant ssh --  mwscript extensions/CirrusSearch/maintenance/runSearch.php --baseName ptwiki --options \$$'\047'\\047$'\047'\$$'\047'\\173$'\047'\$$'\047'\\042$'\047'wgCirrusSearchPhraseSlop\$$'\047'\\042$'\047':\$$'\047'\\173$'\047'\$$'\047'\\042$'\047'precise\$$'\047'\\042$'\047':1, \$$'\047'\\042$'\047'default\$$'\047'\\042$'\047':0, \$$'\047'\\042$'\047'boost\$$'\047'\\042$'\047':1\$$'\047'\\175$'\047'\$$'\047'\\175$'\047'\$$'\047'\\047$'\047'

And to get more rows, raise precise to 10000 (3 rows):

echo \"canta Freewheelin Bob Dylan\" | ssh elasticsearch-tests.eqiad.wmflabs cd /srv/mediawiki-vagrant/ '&&' mwvagrant ssh --  mwscript extensions/CirrusSearch/maintenance/runSearch.php --baseName ptwiki --options \$$'\047'\\047$'\047'\$$'\047'\\173$'\047'\$$'\047'\\042$'\047'wgCirrusSearchPhraseSlop\$$'\047'\\042$'\047':\$$'\047'\\173$'\047'\$$'\047'\\042$'\047'precise\$$'\047'\\042$'\047':10000, \$$'\047'\\042$'\047'default\$$'\047'\\042$'\047':0, \$$'\047'\\042$'\047'boost\$$'\047'\\042$'\047':1\$$'\047'\\175$'\047'\$$'\047'\\175$'\047'\$$'\047'\\047$'\047'
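The layering of escapes can be made a bit more mechanical. What follows is only a sketch, not the method used for the commands above: shquote is a hypothetical helper that wraps a string in single quotes, and the sketch assumes each hop (ssh, then mwvagrant ssh) re-parses the command line once with a POSIX-style shell. The real chain may need more or fewer rounds of quoting.

```shell
#!/bin/sh
# shquote STRING: hypothetical helper (not part of CirrusSearch or vagrant).
# Prints STRING wrapped in single quotes, with each embedded single quote
# rewritten as '\'' so the result survives one round of shell parsing.
shquote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# The JSON we ultimately want runSearch.php to receive.
opts='{"wgCirrusSearchPhraseSlop":{"precise":1, "default":0, "boost":0}}'

# Quote once per shell that will re-parse the command line.
# Two rounds is an assumption about the ssh -> mwvagrant ssh chain.
once=$(shquote "$opts")
twice=$(shquote "$once")

printf 'one level:  %s\n' "$once"
printf 'two levels: %s\n' "$twice"
```

The twice-quoted value could then be placed after --options in the ssh command. Whether two rounds is the right number for this particular setup is untested here.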

OK, here is a better way to run the tests. I've shut down elasticsearch-tests along with some unused servers and booted two of the current XL instances into a cluster. These instances are estest1001 and estest1002. They use the production puppet roles and only have ES, no MediaWiki. The cirrus-browser-bot instance is pointed at this cluster and can be used for all actions.

That should increase memory and disk availability. But it still takes some funky escaping to connect. Instead, let's set up some port forwarding and do all the PHP processing locally:

Set up a port forward from port 9201 on your local machine to port 9200 on estest1001. This won't log in and will just sit in the background.

ssh estest1001.eqiad.wmflabs -L 9201:localhost:9200 -N &

On your local machine, shut down Elasticsearch inside vagrant:

vagrant ssh sudo service elasticsearch stop

Then forward port 9200 inside vagrant to port 9201 on your local machine, which also opens a shell:

vagrant ssh -- -R 9200:localhost:9201

Once inside the vagrant session, this will run three queries against the estest cluster:

echo '
foo
bar
baz ' | mwscript extensions/CirrusSearch/maintenance/runSearch.php --baseName=ptwiki
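Before sending a batch of queries, it may help to confirm that both forwards are actually up. A minimal sketch, assuming curl is installed wherever it is run; es_health is a hypothetical helper name, while _cluster/health is the standard Elasticsearch cluster health endpoint.

```shell
#!/bin/sh
# es_health [PORT]: hypothetical helper, not part of any existing tooling.
# Asks whatever is listening on localhost:PORT (default 9200) for the
# Elasticsearch cluster health document and prints the JSON response.
es_health() {
    port="${1:-9200}"
    # -s hides the progress meter; --max-time keeps a dead tunnel
    # from hanging the shell for long.
    curl -s --max-time 5 "http://localhost:${port}/_cluster/health?pretty"
}
```

On the local machine, es_health 9201 would exercise the first forward; inside the vagrant session, es_health with no argument would exercise the whole chain. An empty response suggests one of the forwards is not running.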

For reference, I had to run this to stop Elasticsearch (note the --):

$ vagrant ssh -- sudo service elasticsearch stop

And single quotes still have to be escaped ridiculously (though much less than before!) ' --> '"'"':

$ echo '"Freewheelin'"'"' Bob Dylan"' | mwscript extensions/CirrusSearch/maintenance/runSearch.php --baseName=ptwiki
{"query":"\"Freewheelin' Bob Dylan\"","rows":10,"hits":76}

i.e., close the open single quote, start a new double quote, put the single quote you want in there, close the double, and open a new single to continue what you were doing before... perfectly logical!

But most of all, it is blazing fast in comparison. Thanks!!!
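A quoted here-document is one possible way to sidestep the single-quote dance described above. This is a sketch rather than something tried in this task: with the delimiter written as 'EOF', the shell leaves quotes, dollar signs, and backslashes in the body untouched. The temporary file and the variable name queries_file are only for illustration.

```shell
#!/bin/sh
# Write the queries to a temporary file. Because the delimiter is quoted
# ('EOF'), the body is taken literally and the apostrophe needs no escaping.
queries_file=$(mktemp)
cat > "$queries_file" <<'EOF'
"Freewheelin' Bob Dylan"
"canta Freewheelin Bob Dylan"
EOF

# Show what would be fed to runSearch.php, one query per line.
cat "$queries_file"
```

Inside the vagrant session the file could then be redirected into mwscript extensions/CirrusSearch/maintenance/runSearch.php --baseName=ptwiki.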

You can avoid the echo by just redirecting a file in:

mwscript extensions/CirrusSearch/maintenance/runSearch.php --baseName=ptwiki < /path/to/list/of/searches
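To turn the output of a run like that into the zero-results number this task is after, something along these lines might work. It is a rough sketch resting on two assumptions: that runSearch.php prints one JSON object per line with "hits" as the last field (as in the sample output earlier in this thread), and that a query with no results is reported as "hits":0. zero_rate is a hypothetical helper name.

```shell
#!/bin/sh
# zero_rate FILE: hypothetical helper, not part of CirrusSearch.
# FILE holds saved runSearch.php output, one JSON object per line.
# Counts lines ending in "hits":0} and reports them against all
# non-blank lines.
zero_rate() {
    awk '
        /"hits":0[}]$/ { zero++ }
        NF             { total++ }
        END {
            printf "%d of %d queries returned zero hits\n", zero, total
        }
    ' "$1"
}
```

Saving the output of one run per slop setting and calling zero_rate on each file would give the before and after counts to compare. A JSON-aware tool would be more robust than a line-end match if the output format differs from the assumption.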

Glad this is working better. Probably not even worth it to try the 4x larger enwiki_content though

Dang it... I tried that early on and it didn't seem to work... but it does now, so some other wrinkle must've been ironed out. Thanks!

Probably not even worth it to try the 4x larger enwiki_content though

Aww... but it would be really, really swell.