
Image Recommendations MVP: Product Exploratory Questions
Closed, Resolved, Public

Description

Open Questions

  • How can we grade all 3000 options by the end of the experiment? How many people will we need to accomplish this, and over what length of time?
  • Do we want people to tell us why they rejected or skipped a suggestion? If so, for which responses?
  • How many recommendations should we show each user?
  • Do we want to show it to everyone, or are there language requirements?
  • How will we know people like the task?
  • Can we tell whether tasks are easy vs. hard?
  • Can we see how much people like it, and if so, can we break that down by user type?
  • What will our checkpoints be for deciding whether to tweak the feature?
  • Do we want to test how well users understand the task and whether they can write good captions for the accepted recommendations?
  • How do we learn what supplementary information users need to make good decisions?
  • Which info (image categories, source of suggestion, image resolution, etc.) do people refer to the most in making their decisions?
  • Which information leads to more accurate ratings?
  • Do we also want to let users filter suggestions by ORES interest topics in this test tool?
  • Practical usage by (especially experienced) editors: do we want to monitor whether people start using the tool in unintended ways, such as using it to find unillustrated articles and then add their own images?
  • How do we compare non-English vs. English-speaking users? (Example: here’s Image A, and we want three ratings from three separate users to determine accuracy. How does it cloud the accuracy if Image A is rated by three non-English-speaking users, vs. three English-speaking users, vs. a mix?)

Product Decisions

  • We will have one suggested image per article instead of multiple images
  • This iteration of the MVP will not include Image Captions
  • There are no language constraints for this task. As long as there is an article available in the language, we will surface it. We want to be deliberate in ensuring this task is completed in a variety of languages. For this MVP to be considered a success, we want the task completed in at least five different languages, including English, an Indic language, and a Latin language.
  • We will have a checkpoint two weeks after the launch of the feature to check whether it is working properly and whether modifications need to be made to ensure we get answers to our core questions. The checkpoint is not intended to introduce scope creep.
  • We aren't able to filter by categories in this iteration of the MVP, but it could be a possibility in the future through the CPT API
  • We will surface a survey each time a user says No, and sparingly surface a survey when a user clicks Not Sure or Skip (a sketch of this logic follows this list).
  • We need three annotations from 3000 different users on 3000 different matches. With these three annotations, the tasks will self-grade (see the consensus-grading sketch after this list).
  • We will know people like the task if they return to complete it on three distinct dates. We will compare frequency of return by date across user types to understand whether the task was stickier for more experienced users (see the stickiness sketch after this list).
  • Once we pull the data, we will be able to compare the habits of English vs. non-English users. We cannot, and do not need to, show the same image to both non-English and English users; non-English users will have different articles and images. We will know if a task was hard due to language from the survey responses when users click No or Not Sure. We will check task retention to see how popular the task is by language.
  • To know whether the task is easy or hard, we would like to see how long it takes users to complete it. Note: this only works if we can detect when someone backgrounds the app (see the timing sketch after this list). Of the people who got it right, how long did it take them?
  • To know whether the task is easy or hard, we should also track whether users click to see more information about the task before making a decision.
  • We determined that it is not worth adding extra clicks just to learn which metadata users find helpful. Perhaps we allow people to swipe up for more information and it generally provides the metadata; we will need to see the designs to compare this.
  • It is too hard, at least for this MVP, to track whether experienced users use this tool to find articles and then add images manually outside the tool, so we aren't going to track that.
  • In the designs, we want to track whether someone skips or presses No on an image because the image is offensive, in order to learn how often NSFW or offensive material appears.
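
As a rough illustration of the survey-surfacing decision above, here is a minimal sketch; the response names and the 1-in-5 sampling rate for Not Sure/Skip are assumptions, not decided values:

```
import random

# Assumed sampling rate for "Not sure" / "Skip"; the actual rate has not
# been decided in this task.
SPARSE_SURVEY_RATE = 0.2

def should_show_survey(response: str) -> bool:
    """Decide whether to surface the rejection survey for a response.

    'no' always triggers the survey; 'not_sure' and 'skip' trigger it
    only sparingly (random sampling); anything else never does.
    """
    if response == "no":
        return True
    if response in ("not_sure", "skip"):
        return random.random() < SPARSE_SURVEY_RATE
    return False
```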
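
The three annotations per match could self-grade by simple majority agreement. The sketch below is illustrative only and assumes annotations arrive as (match_id, response) pairs:

```
from collections import Counter, defaultdict

def grade_matches(annotations):
    """Group annotations by match and report the majority label.

    `annotations` is an iterable of (match_id, response) pairs where
    response is 'yes', 'no', or 'not_sure'. A match is graded once it
    has three annotations; the majority label wins, and three-way
    splits are flagged as unresolved.
    """
    by_match = defaultdict(list)
    for match_id, response in annotations:
        by_match[match_id].append(response)

    graded = {}
    for match_id, responses in by_match.items():
        if len(responses) < 3:
            continue  # still waiting on annotations
        label, count = Counter(responses).most_common(1)[0]
        graded[match_id] = label if count >= 2 else "unresolved"
    return graded
```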
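
A rough sketch of the stickiness comparison by user type, assuming an events table with user_id, user_type, and date columns (the column names are assumptions):

```
import pandas as pd

def stickiness_by_user_type(events: pd.DataFrame) -> pd.Series:
    """Share of users in each experience bucket who returned on three
    or more distinct dates.

    `events` is assumed to have columns: user_id, user_type
    (e.g. 'new', 'experienced'), and date (one row per completed task).
    """
    distinct_dates = events.groupby(["user_type", "user_id"])["date"].nunique()
    returned = distinct_dates >= 3
    return returned.groupby(level="user_type").mean()
```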
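
If the app can emit foreground/background events, active task time might be computed as in the sketch below; the event kinds are assumptions:

```
def active_duration(events):
    """Sum the foreground time spent on a task.

    `events` is a chronological list of (timestamp_seconds, kind)
    tuples with kinds 'task_start', 'background', 'foreground', and
    'task_end'. Time spent while the app is backgrounded is excluded.
    """
    total = 0.0
    active_since = None
    for ts, kind in events:
        if kind in ("task_start", "foreground"):
            active_since = ts
        elif kind in ("background", "task_end") and active_since is not None:
            total += ts - active_since
            active_since = None
    return total
```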

Event Timeline

@MMiller_WMF and I talked about this, and @JTannerWMF will review the user testing deck to see which metadata is important.

We have answered all of the questions in this task, so we can close it out.