Image Recommendations: Instrumentation Analysis
Open, Medium, Public

Description

This task will start with @SNowick_WMF and possibly accrue subtasks. @JTannerWMF may add content to this task.

The insights we seek to gain through instrumentation include:

  • How does a lack of English proficiency influence someone's ability to complete the task?
  • We need to explicitly track which source each image comes from (Wikidata, Commons, other wikis), so that we understand the influence the source has on accuracy.
  • We want to understand the success rate of the tutorial.
  • We want to know if people like the task. One way to measure this is to evaluate whether users return to complete it on three distinct dates.
  • We want to compare frequency of return to the task (retention) by date, across user tenure and language, to understand whether the task is stickier for more experienced users and for those who speak English.
  • Ensure we can see if someone backgrounds the app.
  • Of the people who got the task right, how long did it take them to submit an answer? We want this data to categorize whether a match is easy or hard.
  • We want to see if someone clicked to see more information on a task; this will help us determine the difficulty of a task.
  • We want to know how often someone selects each choice in the pop-up dialog shown in response to No or Not Sure.
  • We want to see the username if the user opts in to showing it.
  • We want to know if someone scrolled the article.

Draft schema, based on the above requirements (a code sketch follows the list):

  • lang - Language (or list of languages, if more than one) that the user has configured in the app.
  • pageTitle - Title of the article that was suggested.
  • imageTitle - File name of the image that was suggested for the article.
  • suggestionSource - Source from which this suggestion is being made, e.g. whether the image appears in another language wiki, inside a Wikidata item, etc.
  • response - The response that the user gave for this suggestion. This could be a text field, i.e. literally yes, no, unsure, or a numeric value (0, 1, 2), whichever will be simpler for data analysis.
  • reason - The justification for the user's response. Since the user may select one or more reasons for their response, this will be a comma-separated list of values that correspond to "reasons" that we will agree on (0 = "Not relevant", 1 = "Low quality", etc.).
  • detailsClicked - Whether the user tapped for more information on the image (true/false).
  • infoClicked - Whether the user tapped on the "i" icon in the toolbar.
  • scrolled - Whether the user scrolled the contents of the article that are shown underneath the image suggestion (true/false).
  • timeUntilClick - Amount of time, in milliseconds, that the user spent before tapping on the Yes/No/Not sure buttons.
  • timeUntilSubmit - Amount of time, in milliseconds, that the user spent before submitting the entire response, including specifying the reasons for selecting No or Not sure.
  • userName - The wiki username of this user. May be null if the user did not agree to share.
  • teacherMode - (true/false) Whether this feature is being used by a superuser / omniscient entity.
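
To make the field list concrete, here is a minimal Kotlin sketch of what the event payload could look like client-side. This is illustrative only, not the app's actual code; the class name, types, and defaults are assumptions, while the field names mirror the draft above.

```kotlin
// Sketch only: one event per submitted image recommendation.
// Field names mirror the draft schema; types are assumptions.
data class ImageRecommendationEvent(
    val lang: List<String>,        // language(s) the user has configured in the app
    val pageTitle: String,         // title of the suggested article
    val imageTitle: String,        // file name of the suggested image
    val suggestionSource: String,  // e.g. "wikidata", "commons", "otherwiki"
    val response: Int,             // 0 = yes, 1 = no, 2 = unsure
    val reason: List<Int>,         // reason codes, e.g. 0 = "Not relevant", 1 = "Low quality"
    val detailsClicked: Boolean,   // tapped for more information on the image
    val infoClicked: Boolean,      // tapped the "i" icon in the toolbar
    val scrolled: Boolean,         // scrolled the article under the suggestion
    val timeUntilClick: Long,      // ms before tapping Yes/No/Not sure
    val timeUntilSubmit: Long,     // ms before submitting the entire response
    val userName: String?,         // null if the user did not agree to share
    val teacherMode: Boolean       // superuser / omniscient entity mode
)
```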

Details

Due Date
Mar 12 2021, 5:00 AM

Event Timeline

JTannerWMF created this task.
JTannerWMF set Due Date to Feb 12 2021, 5:00 AM.
JTannerWMF added a project: Product-Analytics.

Per discussion with @Dbrant, we are going to add a schema to specifically track Image Recommendations. It would be good to meet with Miriam while planning, to determine what data they are collecting through the API and what data we can track that would help them with the experiment.

@SNowick_WMF: @sdkim just asked about data and is the PM for the team building out the API. I will schedule some time with our group and add @schoenbaechler.

JTannerWMF lowered the priority of this task from Medium to Low. Feb 11 2021, 5:17 PM

Marking this as Low until Robin finishes designs.

JTannerWMF raised the priority of this task from Low to Medium. Feb 23 2021, 1:30 PM

OK, @schoenbaechler is finishing up the designs, so I am increasing the priority of this task. After the offsite it should increase to High.

JTannerWMF changed Due Date from Feb 12 2021, 5:00 AM to Mar 12 2021, 5:00 AM. Feb 23 2021, 1:31 PM

Here's my current "draft" of what the schema could look like:
(Once again, the idea is that whenever the user submits a single Image Recommendation item, we send an event to a new EventLogging table that we'll use for this experiment; a sketch of such an event follows the list.)

  • lang - Language of the user, i.e. the language wiki that they've set the app to.
  • pageTitle - Title of the article that was suggested.
  • imageTitle - File name of the image that was suggested for the article.
  • response - The response that the user gave for this suggestion. This could be a text field, i.e. literally yes, no, unsure, or a numeric value (0, 1, 2), whichever will be simpler for data analysis.
  • reason - The justification for the user's response. Since the user may select one or more reasons for their response, this will be a comma-separated list of values that correspond to "reasons" that we will agree on (0 = "Not relevant", 1 = "Low quality", etc.).
  • detailsClicked - Whether the user tapped for more information on the image (true/false).
  • scrolled - Whether the user scrolled the contents of the article that are shown underneath the image suggestion (true/false).
  • timeSpent - Amount of time, in seconds, that the user spent looking at this suggestion before submitting a response.
  • userName - The wiki username of this user. May be null if the user did not agree to share.
  • teacherMode - (true/false) Whether this feature is being used by a superuser / omniscient entity.
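
As a rough illustration of "one event per submission", here is a hedged Kotlin sketch. submitEvent() is a hypothetical stand-in for the app's real EventLogging transport, and the table name is taken from the Schema:MobileWikiAppImageRecommendations link referenced later in this task; all payload values are made up.

```kotlin
// Hypothetical transport: in the real app this would serialize the payload
// and POST it to the EventLogging intake; here it just prints the event.
fun submitEvent(table: String, payload: Map<String, Any?>) {
    println("event -> $table: $payload")
}

fun main() {
    // Example payload matching the draft fields above (values are made up).
    submitEvent(
        "MobileWikiAppImageRecommendations",
        mapOf(
            "lang" to "da",
            "pageTitle" to "Example Article",
            "imageTitle" to "File:Example.jpg",
            "response" to 1,           // 0 = yes, 1 = no, 2 = unsure
            "reason" to "0,1",         // comma-separated reason codes
            "detailsClicked" to true,
            "scrolled" to false,
            "timeSpent" to 42,         // seconds, per the draft above
            "userName" to null,        // user did not agree to share
            "teacherMode" to false
        )
    )
}
```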
JTannerWMF updated the task description.

Hey @Dbrant we got the following questions about the above schema:

  • Is instrumentation related to editor tenure missing from the draft schema because it will be found in existing instrumentation or via the username?
  • Does lang identify all the languages the app is set to, or only the first app language or the device language? For example, someone might speak Danish natively and have that set as the primary language, but may also have added English and Chinese as additional languages.

@JTannerWMF asked me to review this plan. Here are my notes:

  • I think we should capture the experience level of the user, even if we don't capture their username. I think the best thing would just be their edit count.
  • For users whose usernames we cannot capture, can we capture some unique identifier for them, so we can analyze their submissions together?
  • Similarly to the comment above, it would be good to know what other languages the user uses, beyond the one they are using for image recommendations.
  • Although the schema is "one event per submission", there are some things we'll want to study that are not associated with submissions:
    • How far the user makes it through the onboarding screens before they get to the first suggestion.
    • How far the user makes it through the onboarding tips before they are reviewing the first suggestion.
    • Whether the user taps the "i" icon in the upper right of the screen.
  • I see that the schema will capture the time the user spent on the task. Will this timer stop once they tap yes/no/unsure, or when they finish submitting the reason? I recommend the former, as the time spent to make a decision.
  • Although the schema captures time spent, I think it could be good to also capture timestamps for the events.
  • My understanding is that we are asking for a reason for every "no" response, but not for every unsure response. In that case, I think we need to capture whether someone received the reason question, because for the unsure responses we won't be able to infer whether the user received the question and declined to answer, or didn't receive the question at all (see the sketch after this list).
  • In addition to article title and filename, I think we might also want to record which image metadata fields were available for the image, because we want to understand how the presence of metadata influences responses. Though I suppose we could go get it from Commons after the fact, that might be difficult, and the file's metadata might change between the event and the analysis.
  • When the users scroll into the article, will they be able to click wikilinks or images, like in the normal reading experience? Will that lead them away from the task? If so, I think we should record an event for when that happens, so we know how often the article itself distracts people from doing the tasks.
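
On the "did the user receive the reason question" point above, one hedged way to make the distinction explicit in the schema is an extra flag alongside a nullable reason field. A Kotlin sketch with assumed names, not actual app code:

```kotlin
// Hypothetical fields: reasonShown distinguishes "question never shown"
// from "question shown but not answered". Names are assumptions.
data class ReasonCapture(
    val reasonShown: Boolean,  // did the user see the reason question at all?
    val reason: List<Int>?     // null when no reason was given; reason codes otherwise
)
```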

@JTannerWMF

Is instrumentation related to editor tenure missing from the draft schema because it will be found in existing instrumentation or via the username?

By editor tenure do we mean edit count? And if so, on which wiki? Do we mean the language wiki for which the suggestions are being made? Or across all language wikis that the user has configured? Or are we including edits made to Wikidata and Commons, via suggested edits? Not all these numbers are available to the app client-side. I would recommend determining editor tenure during data analysis, based on the user name. Also, if we include the "total edit count" in our event schema, would that border on personally-identifiable information (if the user chose not to share their username)?

Does lang identify all the languages the app is set to, or only the first app language or the device language? For example, someone might speak Danish natively and have that set as the primary language, but may also have added English and Chinese as additional languages.

Sure, we can update the lang field to be a list of all language(s) the user has configured in the app.


@MMiller_WMF

I think we should capture the experience level of the user, even if we don't capture their username. I think the best thing would just be their edit count.

See response to Jazmin's question above.

For users whose usernames we cannot capture, can we capture some unique identifier for them, so we can analyze their submissions together?

Done implicitly.

Similarly to the comment above, it would be good to know what other languages the user uses, beyond the one they are using for image recommendations.

👍

How far the user makes it through the onboarding screens before they get to the first suggestion.
How far the user makes it through the onboarding tips before they are reviewing the first suggestion.

If we receive one of these events from a user, then by definition they have made it past onboarding (i.e. onboarding is all-or-nothing). If we want to track whether a user made it partway through onboarding and then gave up without doing a single recommendation, we would need a separate schema.
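
If the team does decide to track partial onboarding, a separate schema could be as small as one event per milestone. A hypothetical Kotlin sketch (names and fields are assumptions, not app code):

```kotlin
// One event per onboarding milestone, so drop-off is visible even for
// users who never submit a recommendation.
data class OnboardingEvent(
    val step: Int,          // index of the onboarding screen/tip reached
    val totalSteps: Int,    // total screens/tips in the flow
    val completed: Boolean  // true once the user finishes the whole flow
)

fun logOnboardingStep(step: Int, totalSteps: Int) {
    val event = OnboardingEvent(step, totalSteps, completed = step == totalSteps)
    println("onboarding -> $event")  // stand-in for the real event transport
}
```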

Whether the user taps the "i" icon in the upper right of the screen.

👍

I see that the schema will capture the time the user spent on the task. Will this timer stop once they tap yes/no/unsure, or when they finish submitting the reason? I recommend the former, as the time spent to make a decision.

Why not both?
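
Capturing both is cheap: record one timestamp when the suggestion is shown and take two deltas. A minimal sketch (System.currentTimeMillis() is used here for portability; the app would more likely use Android's monotonic clock):

```kotlin
// Measure both timeUntilClick (decision time) and timeUntilSubmit
// (decision plus reason-picking time) from a single start timestamp.
class SuggestionTimer {
    private var shownAt = 0L
    var timeUntilClick = 0L
        private set
    var timeUntilSubmit = 0L
        private set

    fun onSuggestionShown() { shownAt = System.currentTimeMillis() }
    fun onResponseTapped() { timeUntilClick = System.currentTimeMillis() - shownAt }
    fun onSubmitted() { timeUntilSubmit = System.currentTimeMillis() - shownAt }
}
```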

Although the schema captures time spent, I think it could be good to also capture timestamps for the events.

Done implicitly.

My understanding is that we are asking for a reason for every "no" response, but not for every unsure response. In that case, I think we need to capture whether someone received the reason question, because for the unsure responses we won't be able to infer whether the user received the question and declined to answer, or didn't receive the question at all.

What is meant by "declined to answer"? I don't think we have that as an option in our workflow.

In addition to article title and filename, I think we might also want to record which image metadata fields were available for the image, because we want to understand how the presence of metadata influences responses. Though I suppose we could go get it from Commons after the fact, that might be difficult, and the file's metadata might change between the event and the analysis.

👍

When the users scroll into the article, will they be able to click wikilinks or images, like in the normal reading experience? Will that lead them away from the task? If so, I think we should record an event for when that happens, so we know how often the article itself distracts people from doing the tasks.

The article is shown as plain text only, with no clickable links.

@SNowick_WMF

Editing tenure is a bucket in Turnilo; I'm not sure how we are recording that information.

In order for us to know the tenure of those completing the task so that we can look for trends, do we have to record usernames? Is there a precedent for this?

Schema:MobileWikiAppImageRecommendations

@JTannerWMF: we can get user age from event.MobileWikiAppDailyStats using app_install_id.

JTannerWMF renamed this task from Image Recommendations: Instrumentation to Image Recommendations: Instrumentation Analysis. Mon, Apr 19, 6:38 PM