
[epic] Setup elasticsearch cluster in labs and build out necessary tooling to use it for hypothesis testing against production data
Closed, Resolved · Public

Description

Measuring what users want, and whether they like the results they got, is of course very hard, but we can still measure the magnitude of a change's effect before we ever deploy it.

For example, we could take a random sub-sample of a day's query traffic (1K, 10K, 100K, or 1M samples, depending on the test), submit those queries to two (or a hundred!) variations of the relevant indexes, and then measure the change in several metrics:

  • # queries with zero results
  • # queries with changes in order in the top-N (5?, 10?, 20?) results
  • # queries with new results in the top-N results
  • # queries with changes in total results (very pretty 2-D graphs await!)
  • etc.
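
The counts above could be computed along these lines. This is only a sketch: `QueryResult`, `compare_runs`, and the key names in the returned dict are hypothetical and not part of CirrusSearch, and it assumes each run has already been collected into a mapping from query string to total hit count plus ranked result ids. "Changes in order" is interpreted here as the results shared by both top-N lists appearing in a different relative order, which is one reading among several.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueryResult:
    """Hypothetical container for one query's results against one index."""
    total_hits: int
    top_ids: List[str]  # ranked result ids, best first


def compare_runs(baseline: Dict[str, QueryResult],
                 variant: Dict[str, QueryResult],
                 top_n: int = 10) -> Dict[str, int]:
    """Count how many queries differ between two runs, per metric."""
    counts = {
        "queries": 0,
        "zero_results_baseline": 0,
        "zero_results_variant": 0,
        "top_n_reordered": 0,
        "top_n_new_results": 0,
        "total_hits_changed": 0,
    }
    for query, base in baseline.items():
        var = variant.get(query)
        if var is None:
            continue  # only compare queries present in both runs
        counts["queries"] += 1
        counts["zero_results_baseline"] += base.total_hits == 0
        counts["zero_results_variant"] += var.total_hits == 0

        base_top = base.top_ids[:top_n]
        var_top = var.top_ids[:top_n]
        common = set(base_top) & set(var_top)
        # Reordered: the shared results appear in a different relative order.
        if [r for r in base_top if r in common] != [r for r in var_top if r in common]:
            counts["top_n_reordered"] += 1
        # New results: the variant's top-N has ids the baseline's top-N lacks.
        if set(var_top) - set(base_top):
            counts["top_n_new_results"] += 1
        if base.total_hits != var.total_hits:
            counts["total_hits_changed"] += 1
    return counts
```

Keeping the per-query `total_hits` pairs instead of only counting changes would give the data for the 2-D graphs mentioned above.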

This will let us test very quickly whether a change does anything at all. For example, a change that has no effect on the top 20 results for any of 100K queries obviously isn't going to be a game changer.
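
The random sub-sample of a day's query traffic could be drawn in a single pass without loading the whole log into memory. The sketch below uses reservoir sampling (Algorithm R); `sample_queries` is a hypothetical helper, and it assumes the queries arrive as an iterable of strings, for instance one per line of an exported log.

```python
import random
from typing import Iterable, List, Optional


def sample_queries(queries: Iterable[str], k: int,
                   seed: Optional[int] = None) -> List[str]:
    """Return up to k queries chosen uniformly at random in one pass."""
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    reservoir: List[str] = []
    for i, query in enumerate(queries):
        if i < k:
            # Fill the reservoir with the first k queries.
            reservoir.append(query)
        else:
            # Keep the new query with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = query
    return reservoir
```

Passing the same seed for each run means every index variation is tested against the same sample, so differences in the metrics come from the indexes and not from the sampling.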

To do this, we need:

  • a cluster in labs to send the data to and run the analysis on
  • automation to clear out old test indices and/or bring in new indices from prod
  • sets of queries to test against a few different wikis
  • machinery to export config from prod and import it while running search tests

Details

Related Gerrit Patches:
mediawiki/extensions/CirrusSearch : master · allow specifying index baseName to runSearch.php

Event Timeline

EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.
EBernhardson added a project: CirrusSearch.
EBernhardson added a subscriber: EBernhardson.
Restricted Application added a project: Discovery. · Aug 11 2015, 11:27 PM
Restricted Application added a subscriber: Aklapper.
EBernhardson renamed this task from "Setup elasticsearch cluster in labs for running tests against" to "[epic] Setup elasticsearch cluster in labs and build out necessary tooling to use it for hypothesis testing against production data". · Aug 11 2015, 11:28 PM
EBernhardson set Security to None.
TJones added a subscriber: TJones. · Aug 12 2015, 3:32 PM
EBernhardson updated the task description. · Aug 13 2015, 3:42 AM
EBernhardson added a subscriber: Deskana.

Change 231615 had a related patch set uploaded (by EBernhardson):
allow specifying index baseName to runSearch.php

https://gerrit.wikimedia.org/r/231615

Change 231615 merged by jenkins-bot:
allow specifying index baseName to runSearch.php

https://gerrit.wikimedia.org/r/231615

Deskana triaged this task as Medium priority. · Aug 31 2015, 3:22 AM
Deskana closed this task as Resolved. · Aug 31 2015, 3:25 AM
Deskana claimed this task.

https://suggesty.wmflabs.org has the enwiki production index in it without any of the content, and we've used it to test hypotheses (see T109729), so I'm calling this task complete.