Create/revive Search Platform team metrics dashboard
Open, High, Public

Description

Create or revive a dashboard for the Search Platform team (formerly Discovery) that includes the metrics below, each listed with the question we are trying to answer with it:

  1. Be able to filter by the following dimensions: platform (Desktop, Mobile Web, Android, iOS); bot vs. non-bot; language
    • What do included metrics look like when we segment users/queries?
  2. Search Engagement (= number and percentage of queries with a dwell time > 10 seconds) [note to self: Search Engagement formerly known as Search Satisfaction/User Engagement].
    • Are users finding full text search results relevant/useful?
  3. Number and percentage of full-text searches, "Go" box, morelike, autocomplete searches
    • What type of searches are being used?
  4. Number of WDQS time outs
    • How often is Wikidata Query Service failing to return results for a user's query?
  5. Number and percentage of queries with "did you mean" suggestions.
    • How well are we accommodating imprecise search queries?
  6. Number and percentage of "did you mean" suggestions clicked on.
    • How relevant are our results for imprecise search queries?
  7. Number and percentage of abandoned sessions.
    • How many users are so unsatisfied with search results that they leave?
  8. Number of requests to WDQS and Linked Data Fragments, dumps, mediawiki APIs
    • What services are people using to get data from Wikidata?
  9. Number and percentage of zero results
    • What happened (recently) that may have drastically affected how many queries are returning zero results?
  10. Top queries and top keywords
    • Are there any common query patterns worth doing anything about?
  11. Top returned documents (articles) and top clicked through documents
    • Are there patterns in specific search results that are worth doing anything about?
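
To illustrate what "number and percentage" segmented by a dimension could look like for one of the items above (zero results), here is a minimal Python sketch. The record layout and the field names `platform`, `is_bot`, and `hits` are assumptions made for illustration; they are not taken from CirrusSearch or any existing dataset.

```python
from collections import defaultdict


def zero_result_rate(queries, dimension):
    """Group queries by one dimension and return, per segment,
    (zero-result count, total count, zero-result percentage)."""
    totals = defaultdict(int)
    zeros = defaultdict(int)
    for query in queries:
        segment = query[dimension]
        totals[segment] += 1
        if query["hits"] == 0:
            zeros[segment] += 1
    return {
        segment: (zeros[segment], totals[segment],
                  100.0 * zeros[segment] / totals[segment])
        for segment in totals
    }


# Hypothetical sample records; field names are illustrative only.
sample_queries = [
    {"platform": "Desktop", "is_bot": False, "hits": 12},
    {"platform": "Desktop", "is_bot": False, "hits": 0},
    {"platform": "Android", "is_bot": False, "hits": 3},
    {"platform": "Android", "is_bot": True, "hits": 0},
]

by_platform = zero_result_rate(sample_queries, "platform")
by_agent = zero_result_rate(sample_queries, "is_bot")
```

The same grouping works for any of the listed dimensions, which is roughly what a dashboard filter would do on the aggregated data.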

We have excluded any metrics that would require building something new, e.g. a "smiley face" search satisfaction survey.

Meta-requests
Search Platform currently lacks analyst support in two major ways:

  1. Technical. Legacy bespoke dashboards were built in Shiny and R, and the team currently lacks the resources and technical expertise to maintain this code ourselves.
  2. Data expertise. The team is able to do rudimentary data analysis, but lacks the expertise to really validate statistical significance, whether we are capturing the right signals to test our hypotheses, etc.

Even if a dashboard were to be built for us, it runs the risk of growing stale and obsolete without the ability to actively maintain/tune it, which we are unable to do on our own. To ensure long-term value and avoid wasted effort in constantly (re)building dashboards that grow stale, it would be ideal to have access to resources to help us maintain our metrics in the long term.


Include relevant timelines/deadlines, OKRs
We would like to have this in time for the next quarterly planning cycle.

How will you use this data product?
To understand what Search performance/features/functionality looks like in production, and discover and prioritize future Search Platform team work.

Is this request urgent or time sensitive?
No

Details

Other Assignee
cchen

Event Timeline

Hi @MPhamWMF! My team will triage and prioritize this on Tuesday, April 6th. To help with that, can you please provide some additional details to the prompts I added in the description? Thanks!

Thanks @mpopov! I filled out the additional requested details. Please let me know if you need more description!

kzimmerman added a subscriber: kzimmerman.

Most of this will need to wait until we can hire an analyst to support Search, but we'll look into what we can provide in the near term

MPhamWMF updated the task description.

oops, didn't mean to remove you @kzimmerman -- i had an edit sitting around that preceded your triaging of the ticket.
Thanks for the update regarding the timing and priority of this.

@MPhamWMF - @JKatzWMF and I spoke about your request, and Carol flagged that having basic search metrics is a high priority.

Can you pare this down to your highest priority needs (or questions you're currently addressing), so we can work on those in the near term?

I still think it's critical that we get budget next year for analytics support for Search.

Thanks, Kate. I'll reorder the requests in this ticket by (descending) order of priority by end of day Mon April 19.

@kzimmerman, items are now ordered in descending order of priority for us, so you should be able to draw the line wherever you need to based on what you're able to handle.
I know some of these items will also probably need more discussion as we dig in, so just keep me posted when that is necessary. I also know some of these items may already exist somewhere, so in the best case some of them might be freebies.

kzimmerman added a subscriber: nettrom_WMF.

Assigning to @nettrom_WMF; @mpopov will support

@MPhamWMF : I've been working on putting together a notebook to aggregate user engagement metrics (full text query sessions with a dwell time of > 10 seconds), so we have a working example of using data from SearchSatisfaction. From digging into data from that schema, it appears to only be instrumented on desktop. Could you check in with the engineers on the team about whether that's the case? I want to make sure I'm not looking for non-existent data.

I'll be picking this up again next week and aim to have a prototype dashboard based on that metric as well as one metric using CirrusSearch as the dataset (e.g. number of searches) ready as soon as possible.

This is the expected case; search is only instrumented on the desktop skin.

I've added @cchen as the second assignee to this task. She'll be picking up this task as we transition my work from Search and Structured Data. So far that means she'll start working on item number 3.

I've completed data gathering for item number 2 and have drafted a dashboard in Superset. I'll be back tomorrow with a link, as the dashboard needs a little more documentation on it to make it clear how it works and why it works in particular ways.

@MPhamWMF : We now have a test dashboard for user satisfaction (item number 2), it's available here: https://superset.wikimedia.org/r/628

The dashboard is based on data from June 1 to July 12. It's limited to desktop sessions per the previous comments on this task.

I've got it set up to show, on a daily basis, the number of sessions, sessions with a click on a result, and sessions with a checkin. That chart can be filtered by platform, wiki, language code of the wiki (e.g. all English-language wikis), agent type (user, automated, bot), and time range.

There's also a chart showing the proportion of sessions with a dwell time of > 10s. In this test setup we can only calculate proportions on a per-wiki basis, so that chart works best when looking at one or a handful of specific wikis. This chart can also be limited by time range.
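
As a rough illustration of that per-wiki proportion (not the query behind the dashboard), here is a minimal Python sketch. It assumes each session has already been reduced to a `(wiki, dwell_seconds)` pair, with `None` for sessions that never checked in; that layout is an assumption, not the SearchSatisfaction schema itself.

```python
def engagement_by_wiki(sessions, threshold=10):
    """Return {wiki: proportion of sessions with dwell time > threshold}.

    sessions: iterable of (wiki, dwell_seconds or None) pairs.
    """
    counts = {}  # wiki -> (total sessions, engaged sessions)
    for wiki, dwell in sessions:
        total, engaged = counts.get(wiki, (0, 0))
        is_engaged = dwell is not None and dwell > threshold
        counts[wiki] = (total + 1, engaged + int(is_engaged))
    return {wiki: engaged / total for wiki, (total, engaged) in counts.items()}


# Hypothetical sample sessions for illustration only.
sample_sessions = [
    ("enwiki", 30),
    ("enwiki", None),
    ("enwiki", 10),
    ("dewiki", 20),
]

proportions = engagement_by_wiki(sample_sessions)
```

Because the proportion is computed within each wiki, combining several wikis would need the underlying counts rather than the per-wiki ratios, which is why the chart works best for one or a few wikis.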

Lastly, I added a chart showing the decay in counts of sessions based on a checkin time greater than or equal to a specific value. This is also limited to filtering by specific wikis.
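
A minimal Python sketch of how such a decay curve could be computed, assuming each session is represented by its largest checkin value in seconds (the input layout and the sample values are assumptions for illustration, not the dashboard's actual data):

```python
def checkin_decay(max_checkins, thresholds):
    """For each threshold, count sessions whose largest checkin
    is greater than or equal to that threshold."""
    return {
        threshold: sum(1 for checkin in max_checkins if checkin >= threshold)
        for threshold in thresholds
    }


# Hypothetical largest-checkin values (seconds) for four sessions.
decay = checkin_decay([10, 20, 20, 60], thresholds=[10, 20, 30, 60])
```

The counts can only stay flat or fall as the threshold grows, which gives the decaying shape of the chart.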

I wanted to get this to you so you can start exploring it and see whether it helps answer questions and spur ideas. The filtering limitation can be fixed by importing the data into Druid. For example, the Session length dashboard has proportions, percentiles, and bucketing because of that.

@nettrom_WMF , thanks! I'm currently getting a Presto error that says access denied and that I should contact chart owners for assistance. Have tried reloading a few times but didn't seem to help. Any advice?

That's probably because I hadn't updated the permissions to the underlying data, sorry! Could you try again?

If it still doesn't work, could you let me know the specific error message you get? You should have access to the data based on the permissions discussed in the comments in T270438, and hopefully it was just me missing the permissions that were needed.

Thanks, the charts load now! I'll take a look and get back to you in the next week with feedback.