
Metrics to evaluate new search for item suggestor
Open, Needs Triage, Public

Description

Story: As a PM/UX Designer, we want to know whether a new feature performs better, worse, or differently than an old one.

Context: We need metrics to evaluate the search algorithm that suggests items in the item suggestor.

TODO: For both the old and the new item suggester, implement the following:

  • Which algorithm the person used (newSearch vs. oldSearch)
  • Track the time between rendering the list on the user's screen and the first click/selection on an item in the list ("Time-to-First-Click")
  • Track the position in the item list of the first click/selection: if the first entry is clicked, the value is 1; if the 2nd entry is clicked, it is 2.

The generated data should look like this:

| AB-Group | Entry Point | Time to first click | Entry Clicked | Number of Letters typed before selecting |
| -------- | ----------- | ------------------- | ------------- | ----------------------------------------- |
| [string ID] | [string ID] | [float as milliseconds] | [integer, 1 up to the length of the list] | [integer] |
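
As an illustration only, a single logged event matching this table could look like the sketch below; the field names (abGroup, entryPoint, timeToFirstClickMs, entryClicked, lettersTyped) and the TypeScript shape are assumptions, not an agreed-upon schema:

```typescript
// Sketch of one tracking event matching the table above.
// Field names are illustrative, not an agreed-upon schema.
interface ItemSuggestorEvent {
    abGroup: 'newSearch' | 'oldSearch';   // which algorithm the person used
    entryPoint: string;                   // e.g. 'quickSearch' or 'statementEditor'
    timeToFirstClickMs: number;           // float, milliseconds
    entryClicked: number;                 // 1 up to the length of the suggestion list
    lettersTyped: number;                 // number of letters typed before selecting
}

// Example: a user in the new-search group picked the 3rd suggestion
// after typing 4 letters, about 1.8 seconds after focusing the box.
const example: ItemSuggestorEvent = {
    abGroup: 'newSearch',
    entryPoint: 'statementEditor',
    timeToFirstClickMs: 1830.5,
    entryClicked: 3,
    lettersTyped: 4,
};
```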

Please link the parent issue!

Metrics are taken from a list of possible metrics provided via mail by the WMF's Discovery team.

Event Timeline

A few comments from a discussion with Jan:

  • Track the entry point: quick search or statement editor
  • The clock should start ticking as soon as the search box gets the focus (a rough instrumentation sketch follows this list)
  • We may also want to track the number of letters entered (prefix length) before selecting
  • "Reactivity" could be measured in terms of API response latency
  • How often are the "more" and the "contains" options used?
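
A rough sketch of how that timing and prefix tracking could be wired up in the suggester widget, assuming browser-side TypeScript; the handler names and the sendEvent() transport are hypothetical, not the actual Wikibase entity selector hooks:

```typescript
// Rough instrumentation sketch. The handler names and sendEvent() are
// assumptions for illustration, not the actual entity selector API.

// sendEvent() stands in for whatever event-logging transport ends up being used.
declare function sendEvent(data: object): void;

let focusTime: number | null = null;
let lettersTyped = 0;

function onSearchBoxFocus(): void {
    // The clock starts ticking as soon as the search box gets the focus.
    focusTime = performance.now();
    lettersTyped = 0;
}

function onSearchBoxInput(value: string): void {
    // Track the prefix length typed so far.
    lettersTyped = value.length;
}

function onSuggestionClick(position: number): void {
    // position is 1-based: 1 for the first entry in the list, 2 for the second, ...
    if (focusTime === null) {
        return;
    }
    const timeToFirstClickMs = performance.now() - focusTime;
    sendEvent({ timeToFirstClickMs, entryClicked: position, lettersTyped });
    // Only the first click/selection is recorded.
    focusTime = null;
}
```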

The scope of this ticket is a bit unclear - does it ask for the tracking mechanism to be implemented, or for the data to be available and evaluated? I'm asking because implementing the tracking is a precondition to A/B testing, but A/B testing is a precondition to evaluating the data.

> I'm asking because implementing the tracking is a precondition to A/B testing, but A/B testing is a precondition to evaluating the data.

I suppose for me it was not about evaluating the data, but about getting the right data needed for the right evaluation.

> I suppose for me it was not about evaluating the data, but about getting the right data needed for the right evaluation.

In that case, the dependency between the tasks is going the wrong way: we need the tracking code before we can do A/B testing. This means that creating the tracking code blocks A/B testing, so A/B testing should be the parent task.

I thought that one can collect metrics even without A/B testing, but one can't A/B test without metrics. But as long as both collecting the metrics and setting up the A/B test are covered, I don't mind which one is the parent…

> I thought that one can collect metrics even without A/B testing, but one can't A/B test without metrics.

Yes, exactly. That means metrics blocks A/B testing, because A/B testing can't be done without metrics. Since the child blocks the parent, A/B testing must be the parent, and metrics the child.

I think the child/parent terminology is confusing. It's better to think in terms of blocks/needs, or before/after: metrics have to be done before testing, so metrics come first, so metrics must be the "child" in Phabricator terms.

Not sure what the plan is for this one - are we still going to do it?