Evaluate DYM metrics available in current search satisfaction logging
Closed, Resolved, Public

Description

The search satisfaction schema records various information about DYM ("did you mean") usage, but we've never used it for anything, so we don't have any metrics. Work with a sample of the data and see whether it yields useful metrics.

The result of this ticket will be a Jupyter notebook committed to the relforge repository to calculate and display metrics, and potentially minor patches to data collection as necessary.

Potential metrics. In each, "% of X" can be computed either per-search or per-session:

  • % of X shown a dym suggestion
  • % of X shown the search results of a dym suggestion
  • ^ but excluding 'autorewrite'?
  • % of X shown a dym suggestion that clicked through to dym results
  • % of X shown dym results that clicked a result
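
To illustrate the per-search versus per-session distinction, here is a minimal sketch. It is not the notebook's code: the `(session_id, flag)` tuples and both function names are made up for the example, and it assumes a session counts as soon as any one of its searches meets the condition.

```python
from collections import defaultdict


def pct_per_search(rows):
    """Percent of searches where the condition holds."""
    if not rows:
        return 0.0
    return 100.0 * sum(1 for _, flag in rows if flag) / len(rows)


def pct_per_session(rows):
    """Percent of sessions with at least one search where the condition holds."""
    by_session = defaultdict(bool)
    for session_id, flag in rows:
        by_session[session_id] = by_session[session_id] or flag
    if not by_session:
        return 0.0
    return 100.0 * sum(by_session.values()) / len(by_session)


# Hypothetical sample: (session_id, shown_dym) for each search.
rows = [("a", True), ("a", False), ("a", False), ("b", False)]
print(pct_per_search(rows))   # 1 of 4 searches
print(pct_per_session(rows))  # 1 of 2 sessions
```

The two views can diverge a lot when a few sessions issue many searches, which is one reason to report both.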

Event Timeline

@dcausse @TJones Any other suggestions for metrics? For reference, what I've put together so far is a transformation of the search satisfaction events into a simplified "dym search event" table. This table has one row per search performed, with one boolean for each of the following conditions. We can probably track more, but this might be sufficient.

  • Is this query a suggested query?
  • Is DYM shown on top of SERP?
  • Was DYM clicked on top of SERP?
  • Were any results clicked?
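
A rough sketch of what one row of such a table, and a conditional rate over it, might look like. The class, the field names, and the `rate` helper are hypothetical and chosen only to mirror the four questions above; the actual column names in the notebook may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DymSearchEvent:
    # Hypothetical field names; one row per search performed.
    is_suggested_query: bool  # Is this query a suggested query?
    dym_shown: bool           # Is DYM shown on top of SERP?
    dym_clicked: bool         # Was DYM clicked on top of SERP?
    result_clicked: bool      # Were any results clicked?


def rate(events, condition, among=lambda e: True):
    """Percent of the events matching `among` that also match `condition`."""
    base = [e for e in events if among(e)]
    if not base:
        return None  # undefined when the denominator is empty
    return 100.0 * sum(1 for e in base if condition(e)) / len(base)


# Made-up sample rows, not real data.
events = [
    DymSearchEvent(False, True, True, False),
    DymSearchEvent(True, False, False, True),
    DymSearchEvent(False, True, False, True),
    DymSearchEvent(False, False, False, False),
]

# % of searches shown a dym suggestion
shown = rate(events, lambda e: e.dym_shown)
# % of searches shown a dym suggestion that clicked through to dym results
clicked_through = rate(events, lambda e: e.dym_clicked, among=lambda e: e.dym_shown)
print(shown, clicked_through)
```

Keeping the denominator explicit (`among`) makes it harder to mix up "% of all searches" with "% of searches that were shown a suggestion".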

Looks good.

% of X shown dym results that interacted with the result list

Not 100% sure I get this—do you mean the user clicked on a DYM result?

^ but excluding 'autorewrite'?

It would definitely be nice to do one of three things: ignore autorewrite status, only include autorewrites, or exclude all autorewrites. (I'm curious to see how autorewrites compare to simple suggestions in terms of frequency, click through rates, etc.)

Looks good.

% of X shown dym results that interacted with the result list

Not 100% sure I get this—do you mean the user clicked on a DYM result

I mean that the user clicked on any result on the page, essentially a "success"

^ but excluding 'autorewrite'?

It would definitely be nice to do one of three things: ignore autorewrite status, only include autorewrites, or exclude all autorewrites. (I'm curious to see how autorewrites compare to simple suggestions in terms of frequency, click through rates, etc.)

ok i'll slice the metrics on that dimension as well

Change 524814 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[wikimedia/discovery/relevanceForge@master] Calculate DYM metrics for full text search

https://gerrit.wikimedia.org/r/524814

Change 524814 merged by jenkins-bot:
[wikimedia/discovery/relevanceForge@master] Calculate DYM metrics for full text search

https://gerrit.wikimedia.org/r/524814

Follow-up will be in T216058: test-import the backing data into Druid and evaluate whether one of the Druid interfaces can visualize our metrics.

Metrics we should use moving forward

  • % of searches shown an [auto / non-auto] dym
    • Target: Increase % without significantly reducing the other metrics
  • % of people shown non-auto dym that click through to dym results
    • Target: Increase % of clickthrough
  • % of searches shown [auto / non-auto] dym results that clicked a result
    • Target: Increase % of clickthrough
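
As a sketch only, the three metrics above could be computed from per-search rows along these lines. The row layout is an assumption for illustration (`dym` is the kind of suggestion shown, `dym_results` is the kind of dym that produced the results being viewed, both `None` when not applicable), and the clickthrough metric is computed per-search here rather than per-person.

```python
def pct(rows, numerator, denominator):
    """Percent of rows matching `denominator` that also match `numerator`."""
    base = [r for r in rows if denominator(r)]
    if not base:
        return None  # undefined when nothing matches the denominator
    return 100.0 * sum(1 for r in base if numerator(r)) / len(base)


def dym_metrics(rows):
    out = {}
    for kind in ("auto", "non_auto"):
        # % of searches shown an [auto / non-auto] dym
        out["shown_" + kind] = pct(
            rows, lambda r: r["dym"] == kind, lambda r: True)
        # % of searches shown [auto / non-auto] dym results that clicked a result
        out["result_click_" + kind] = pct(
            rows, lambda r: r["clicked_result"], lambda r: r["dym_results"] == kind)
    # % shown a non-auto dym that clicked through to dym results
    out["clickthrough_non_auto"] = pct(
        rows, lambda r: r["clicked_dym"], lambda r: r["dym"] == "non_auto")
    return out


# Made-up sample rows, not real data.
rows = [
    {"dym": "auto", "clicked_dym": False, "dym_results": "auto", "clicked_result": True},
    {"dym": "non_auto", "clicked_dym": True, "dym_results": None, "clicked_result": False},
    {"dym": None, "clicked_dym": False, "dym_results": "non_auto", "clicked_result": True},
    {"dym": "non_auto", "clicked_dym": False, "dym_results": None, "clicked_result": True},
    {"dym": None, "clicked_dym": False, "dym_results": None, "clicked_result": False},
]
metrics = dym_metrics(rows)
print(metrics)
```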