[M] Instrumentation for SDAW search improvements
Closed, ResolvedPublic

Description

Create instrumentation to measure the items listed in the parent ticket.

Questions and metrics

The high-level questions and related metrics are in T307571.

Next steps in data process (bolding indicates step we are currently on):

  • PM and Data Analyst coordinate to identify research questions, hypotheses, and guardrails for the new feature and associated metrics. This can be done in a measurement plan or on a Phabricator ticket, depending on the scale of the project.
  • Data analyst creates a list of events that need to be tracked to calculate each metric.
  • **Engineers review the event list and comment on whether we’re already tracking with existing instruments and if new instrumentation is required. New tickets created for any new instrumentation needed.**
  • Data analyst works with engineers to figure out what the new instrumentation should look like and where to store the events.
  • Data analyst documents all of the above in the instrumentation spec
  • PM signs off on the plan

Documentation

SearchPreview Instrumentation (WIP)

Event Timeline

CBogen updated the task description.
CBogen updated the task description.

Product Analytics will review and prioritize this request during our next board review meeting on 10 May 2022.

mpopov triaged this task as Medium priority.
mpopov moved this task from Triage to Current Quarter on the Product-Analytics board.

@cchen, if it is not already possible with existing instrumentation, can we add instrumentation for the result position of quickviews? I.e., what is the average and range of result positions that users are using quickview for?
Currently, I think our mean clickthrough position is just over 2 (the distribution of result clicks has an average around the second result); I'm not sure what the range is. It would be good to see, though, that the range of quickviews is larger than that of clickthroughs, indicating users are using it to preview more results than they might otherwise click through.

Is there anything new that we'll need to instrument? If so, any guidance would be appreciated - what kind of data would need capturing?

Yes, here's a list of metrics and events we want to track https://docs.google.com/spreadsheets/d/1drMnPt8mJa8rQRNCCpgkOo_0eQmdDlHDQnLIbjg8syU/edit#gid=0

@cchen should I move this to our code review column for SD engineers to review? Are we now in the step "Engineers review the event list and comment on whether we’re already tracking with existing instruments and if new instrumentation is required. New tickets created for any new instrumentation needed" listed in the description of this ticket?

@cchen should I move this to our code review column for SD engineers to review? Are we now in the step "Engineers review the event list and comment on whether we’re already tracking with existing instruments and if new instrumentation is required. New tickets created for any new instrumentation needed" listed in the description of this ticket?

Yes! I updated the status in the description.

CBogen renamed this task from Instrumentation for SDAW search improvements to [M] Instrumentation for SDAW search improvements. Oct 5 2022, 4:39 PM

@cchen Please find below our initial findings/assumptions and questions that we have regarding the proposed instrumentation document.

Schema
It appears that we’d only need one schema with all this data; I think we should be able to track all of this data for all these events, and it should provide everything analytics needs in order to answer the questions listed:

  1. action name (open-quickview|close-quickview|click-image|click-section|click-article-section|click-interwiki-commons|click-interwiki-links)
  2. timestamp (will come back to this)
  3. wiki id (mw.config.get( 'wgDBname' ))
  4. platform (will come back to this)
  5. bot (will come back to this)
  6. anon (mw.user.isAnon())
  7. session id (will come back to this)
  8. result position (not yet available, but we could expose the position via wgSpecialSearchTextMatches; will come back to this)
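As a rough illustration of the single-schema idea, the sketch below assembles one event payload from the fields listed above. The function name `buildSearchPreviewEvent`, the payload field names, and the `context` argument are all hypothetical and not taken from any existing schema. `timestamp` and `bot` are left out on the assumption that they can be obtained elsewhere, and platform, session id, and result position are passed in by the caller because how to obtain them is still an open question.

```javascript
// Hypothetical sketch: function and field names are illustrative only.
const ACTIONS = [
  'open-quickview',
  'close-quickview',
  'click-image',
  'click-section',
  'click-article-section',
  'click-interwiki-commons',
  'click-interwiki-links'
];

function buildSearchPreviewEvent( action, context ) {
  // Reject anything that is not one of the action names listed above.
  if ( !ACTIONS.includes( action ) ) {
    throw new Error( 'Unknown action: ' + action );
  }
  return {
    action: action,
    wiki_id: context.wikiId,
    platform: context.platform,
    is_anon: context.isAnon,
    session_id: context.sessionId,
    result_position: context.resultPosition
  };
}

// In the browser, wikiId would come from mw.config.get( 'wgDBname' ) and
// isAnon from mw.user.isAnon(); literal values are used here so the
// sketch runs outside MediaWiki.
const exampleEvent = buildSearchPreviewEvent( 'open-quickview', {
  wikiId: 'commonswiki',
  platform: 'desktop',
  isAnon: true,
  sessionId: 'example-session',
  resultPosition: 2
} );
console.log( exampleEvent );
```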

Assumptions - Our understanding of the requested data

  1. Number of quickviews open: total number of events with action = ‘open-quickview’
  2. Number of search sessions using quickviews: total of events with action = ‘open-quickview’ with unique session id
  3. Number of clicks on images in quickview: total of events with action = ‘click-image’
  4. …same for snippets, sections, article sections, interwiki-commons & interwiki links…
  5. Click through position of quickviews: I’m not yet sure what exactly “click through” means, but available actions + result position ought to suffice
  6. Quickview open timestamp; Quickview close timestamp & dwell time - available via action=open-quickview|close-quickview + timestamp
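The counting assumptions above could be sketched roughly as follows. This is only an illustration over an in-memory array; `summarizeEvents` and the `action` / `session_id` field names are hypothetical, and real analysis would presumably run as queries against the stored events instead.

```javascript
// Hypothetical sketch of the counts described in the assumptions list.
function summarizeEvents( events ) {
  const opens = events.filter( ( e ) => e.action === 'open-quickview' );
  // Sessions using quickview: distinct session ids among open events.
  const sessions = new Set( opens.map( ( e ) => e.session_id ) );
  // Clicks per click-* action (image, section, interwiki links, ...).
  const clicksByAction = {};
  for ( const e of events ) {
    if ( e.action.startsWith( 'click-' ) ) {
      clicksByAction[ e.action ] = ( clicksByAction[ e.action ] || 0 ) + 1;
    }
  }
  return {
    quickviewsOpened: opens.length,
    sessionsUsingQuickview: sessions.size,
    clicksByAction: clicksByAction
  };
}

const sampleEvents = [
  { action: 'open-quickview', session_id: 's1' },
  { action: 'click-section', session_id: 's1' },
  { action: 'open-quickview', session_id: 's1' },
  { action: 'open-quickview', session_id: 's2' },
  { action: 'click-interwiki-links', session_id: 's2' }
];
const summary = summarizeEvents( sampleEvents );
console.log( summary.quickviewsOpened ); // 3
console.log( summary.sessionsUsingQuickview ); // 2
```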

Questions

  1. click-image: in the current design the image is not clickable. We either have to change the design or remove this from the implementation, as it may not be possible. (Copying @Sneha to confirm here.)

  2. click-article-section: I don’t know what this one means exactly; does this refer to the “+ x more sections” links? Just to let you know, clicking "+ more sections" currently just redirects you to the next available sections that were hidden from the list; it is not a different page/endpoint.
  3. timestamp: I don’t think we need to explicitly include the timestamp in the event payload; the common schema (which I assume we also include; e.g. this one) includes a “dt” field that I think is auto-populated based on the time the events are received. Is this assumption correct?
  4. platform: “mobile vs desktop” is a bit blurry with MediaWiki; I assume we mean mw.config.get('skin') === 'minerva' ? 'mobile' : 'desktop' here, rather than the actual device. Also, I just wanted to clarify that mobile is not going to be available in the first delivery and its implementation may differ from desktop (so we may have different actions depending on the platform, but this is TBC).
  5. bot: is there a simple way to do this currently on the client side? Also note that bots will not interact with QuickView (and bot accounts that do are user-operated when they do); we should probably check why this needs to be tracked, if at all.
  6. Session ID: Are we correct to assume that this is the SearchSatisfaction schema ID, so that the two can be cross-referenced? If so, this is not currently available and we will need to do some work to expose it on the client side.
  7. result position: one more where we need to find out exactly what kind of data they want in order to know what to provide:
    • is this the position on the page (so, with limit=20 and on page3, the 5th result would be 5), or
    • the position in the dataset (so, with limit=20 and on page3, the 5th result would be 65)
  8. When referring to QuickView open/close, is this:
    • Open refers to the time the QuickView first became visible and close to the time it is no longer visible, or
    • Open is the time a specific QuickView became visible (for example, when snippet "cat" is clicked) and close is when that specific QuickView is gone (so this applies not only if the QuickView is closed, but also if another result is clicked)
  9. Number of users: Is this all the users for whom the extension is actually loaded on the client side (even if not used, but the JS is loaded), or all the users for whom the PHP extension was loaded, even if it was not served to the client, either because they have NO in the preference or because they are on mobile (as we currently do not serve it on mobile)?

cc: @matthiasmullie

Thanks for the summary of the schema and assumptions here; it covers everything. Just one thing I want to check is, do we have opt in/out options in preference for Quickview?

Questions

  1. click-image: in the current design the image is not clickable. We either have to change the design or remove this from the implementation, as it may not be possible.
  2. click-article-section: I don’t know what this one means exactly; does this refer to the “+ x more sections” links? Just to let you know, clicking "+ more sections" currently just redirects you to the next available sections that were hidden from the list; it is not a different page/endpoint.

Yes, this means any clicks in the "Section in the articles" section in Quickview, also including the click in "+ x more sections".

  1. timestamp: I don’t think we need to explicitly include the timestamp in the event payload; the common schema (which I assume we also include; e.g. this one) includes a “dt” field that I think is auto-populated based on the time the events are received. Is this assumption correct?

Yes, the "dt" field is populated when events are received.

  1. platform: “mobile vs desktop” is a bit blurry with MediaWiki; I assume we mean mw.config.get('skin') === 'minerva' ? 'mobile' : 'desktop' here, rather than the actual device. Also, I just wanted to clarify that mobile is not going to be available in the first delivery and its implementation may differ from desktop (so we may have different actions depending on the platform, but this is TBC).

Yes, for mobile the skin = "minerva". We can keep the platform as "desktop" only at first. I will check if there are any differences in events on mobile vs. desktop.
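For reference, the skin-based platform check discussed here can be wrapped in a small helper. `platformFromSkin` is a hypothetical name; the mapping itself is the one proposed in the question, and it describes the skin in use rather than the actual device.

```javascript
// Hypothetical helper: maps a skin name to the "platform" value.
function platformFromSkin( skin ) {
  return skin === 'minerva' ? 'mobile' : 'desktop';
}

// In the browser this would be called as
// platformFromSkin( mw.config.get( 'skin' ) ).
console.log( platformFromSkin( 'minerva' ) ); // mobile
console.log( platformFromSkin( 'vector' ) ); // desktop
```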

  1. bot: is there a simple way to do this currently on the client side? Also note that bots will not interact with QuickView (and bot accounts that do are user-operated when they do); we should probably check why this needs to be tracked, if at all.
  2. Session ID: Are we correct to assume that this is the SearchSatisfaction schema ID, so that the two can be cross-referenced? If so, this is not currently available and we will need to do some work to expose it on the client side.
  3. Number of users: Is this all the users for whom the extension is actually loaded on the client side (even if not used, but the JS is loaded), or all the users for whom the PHP extension was loaded, even if it was not served to the client, either because they have NO in the preference or because they are on mobile (as we currently do not serve it on mobile)?

The session id is the event.searchsessionid in the SearchSatisfaction schema. If we are able to connect events in Quickview with search session ids, we should be able to get information on user agent, IPs and isBot from SearchSatisfaction.
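A minimal sketch of that cross-reference, assuming both event sets are available as arrays: Quickview events are matched to SearchSatisfaction rows by session id, and the bot flag is copied over. Apart from `searchsessionid`, every name here (`attachSessionInfo`, `session_id`, `is_bot`) is hypothetical, and the actual join would presumably be done in a query rather than in JavaScript.

```javascript
// Hypothetical sketch: enrich Quickview events with data that lives in
// SearchSatisfaction, joined on the shared search session id.
function attachSessionInfo( previewEvents, satisfactionRows ) {
  const bySession = new Map();
  for ( const row of satisfactionRows ) {
    // Keep the first row seen for each session.
    if ( !bySession.has( row.searchsessionid ) ) {
      bySession.set( row.searchsessionid, row );
    }
  }
  return previewEvents.map( ( e ) => {
    const match = bySession.get( e.session_id );
    // null marks events whose session was not found.
    return Object.assign( {}, e, { is_bot: match ? match.is_bot : null } );
  } );
}

const enriched = attachSessionInfo(
  [
    { action: 'open-quickview', session_id: 's1' },
    { action: 'open-quickview', session_id: 's9' }
  ],
  [ { searchsessionid: 's1', is_bot: false } ]
);
console.log( enriched );
```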

  1. result position: one more where we need to find out exactly what kind of data they want in order to know what to provide:
    • is this the position on the page (so, with limit=20 and on page3, the 5th result would be 5), or
    • the position in the dataset (so, with limit=20 and on page3, the 5th result would be 65)

Result position is the position of the result on the page when clicking open Quickview of this result.
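The two candidate definitions differ only by the number of results skipped before the current page. The sketch below converts between them; the function names are hypothetical, and it works from an `offset` (results skipped) rather than a page number so that it does not depend on how pages are counted.

```javascript
// Hypothetical helpers relating the two definitions of "result position".
// Positions are 1-based; offset is the number of results skipped before
// the current page.
function positionInDataset( offset, pagePosition ) {
  return offset + pagePosition;
}

function positionOnPage( datasetPosition, limit ) {
  return ( ( datasetPosition - 1 ) % limit ) + 1;
}

// With limit=20 and 60 results skipped, the 5th result on the page is
// the 65th in the dataset, and vice versa.
console.log( positionInDataset( 60, 5 ) ); // 65
console.log( positionOnPage( 65, 20 ) ); // 5
```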

  1. When referring to QuickView open/close, is this:
    • Open refers to the time the QuickView first became visible and close to the time it is no longer visible, or
    • Open is the time a specific QuickView became visible (for example, when snippet "cat" is clicked) and close is when that specific QuickView is gone (so this applies not only if the QuickView is closed, but also if another result is clicked)

cc: @matthiasmullie

Since this is for calculating dwell time, it looks like the second one makes more sense:

  • Open is the time a specific QuickView became visible (for example, when snippet "cat" is clicked) and close is when that specific QuickView is gone (so this applies not only if the QuickView is closed, but also if another result is clicked)
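Under this definition, dwell time per QuickView could be derived roughly as in the sketch below: within a session, an open QuickView ends at the next close event or at the next open event, whichever comes first. The function name `dwellTimes` and the field names `session_id`, `result_position`, and `dt` (taken here to be an ISO 8601 string) are assumptions for illustration only.

```javascript
// Hypothetical sketch: derive dwell time per QuickView from open/close
// events, where opening another result also ends the previous QuickView.
function dwellTimes( events ) {
  const sorted = events.slice().sort(
    ( a, b ) => Date.parse( a.dt ) - Date.parse( b.dt )
  );
  const openBySession = new Map();
  const result = [];
  for ( const e of sorted ) {
    if ( e.action !== 'open-quickview' && e.action !== 'close-quickview' ) {
      continue;
    }
    // Either event type ends whatever QuickView is open in this session.
    const open = openBySession.get( e.session_id );
    if ( open ) {
      result.push( {
        session_id: e.session_id,
        result_position: open.result_position,
        dwell_ms: Date.parse( e.dt ) - Date.parse( open.dt )
      } );
      openBySession.delete( e.session_id );
    }
    if ( e.action === 'open-quickview' ) {
      openBySession.set( e.session_id, e );
    }
  }
  return result;
}

const dwell = dwellTimes( [
  { action: 'open-quickview', session_id: 's1', result_position: 1, dt: '2022-10-05T16:00:00Z' },
  { action: 'open-quickview', session_id: 's1', result_position: 2, dt: '2022-10-05T16:00:05Z' },
  { action: 'close-quickview', session_id: 's1', result_position: 2, dt: '2022-10-05T16:00:08Z' }
] );
console.log( dwell.map( ( d ) => d.dwell_ms ) ); // [ 5000, 3000 ]
```

A QuickView that is still open when the session's events end produces no row in this sketch; how to treat that case would need a separate decision.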

Just one thing I want to check is, do we have opt in/out options in preference for Quickview?

Yes, the default is OPT IN, but there is a configuration to change this via a per-user preference.

Yes, this means any clicks in the "Section in the articles" section in Quickview, also including the click in "+ x more sections".

Given your answer, it seems that "Number of clicks on sections in quickview" and "Number of clicks on article sections in quickview" are the same. Are you referring to something different when you mention "sections" in the first metric?

I am not going to comment on the session Id reply as I do not have enough context, so I will leave that for @matthiasmullie

Yes, the image in the preview is currently not clickable. But it may still be good to know if people are clicking on it?

I am also hoping we are tracking how many people click on thumbnails in the search results as that is currently clickable.

Just one thing I want to check is, do we have opt in/out options in preference for Quickview?

Yes, the default is OPT IN, but there is a configuration to change this via a per-user preference.

And this preference has already been wired up to track: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/838120

I am not going to comment on the session Id reply as I do not have enough context, so I will leave that for @matthiasmullie

I gather that we will not need to track bot & anon as part of this schema, assuming we use the same session id as searchSatisfaction where this data should already be available. Right?
Note: searchSatisfaction doesn't currently expose that session id, so we'll need to go implement a way to fetch that same session id.

I have now created the ticket for this instrumentation work to take place, and as I was doing so I found the following outstanding questions:

  • https://phabricator.wikimedia.org/T321074 - It seems that "Number of clicks on sections in quickview" and "Number of clicks on article sections in quickview" are the same. Are you referring to something different when you mention "sections" in the first metric? Otherwise, I think they are actually referring to the same action and should not have different values.
  • Most of the actions refer to "quickView", but we are calling this extension "SearchPreview", so I am not sure if we should change this now to avoid naming mismatches in the future.

cc: @CBogen @cchen

  • https://phabricator.wikimedia.org/T321074 - It seems that "Number of clicks on sections in quickview" and "Number of clicks on article sections in quickview" are the same. Are you referring to something different when you mention "sections" in the first metric? Otherwise, I think they are actually referring to the same action and should not have different values.

@cchen is out of office this week, but I think we can move forward under the assumption that they are the same action and should not have different values. If I misunderstood, we can update when she's back.

  • Most of the actions refer to "quickView", but we are calling this extension "SearchPreview", so I am not sure if we should change this now to avoid naming mismatches in the future.

Let's be consistent and change it to SearchPreview everywhere.

@CBogen is this ticket now done? Or are we going to link all the other tickets I have done to this and use it as an epic of sorts?

Sorry just asking as it is in "in progress" with my name on it! :)

I will unassign you and assign back to @cchen, and put it back in the analytics column. @cchen, should we consider this ticket complete now that we have the schema? Do you still have documentation to do as per "Data analyst documents all of the above in the instrumentation spec"? I believe @AUgolnikova-WMF and @MPhamWMF agreed in our analytics meetings that they've signed off on the plan.

It looks like my question about clickthrough position was answered and is in the sheet. I'm signed off on the plan.