Image recommendation testing tool
Closed, Resolved, Public

Description

*THIS TASK IS STILL UNDER CONSTRUCTION*

While we believe that adding images to unillustrated articles is going to be a doable and compelling structured task for newcomers, we have some open questions that can only be answered by putting experiences in front of actual users. Here are four big questions:

  1. Is our recommendation algorithm accurately surfacing matches that should be added to articles?
  2. Are newcomers able to confidently confirm the match using the article, image, and image metadata that we can provide?
  3. Are newcomers able to write good captions using the article, image, and image metadata that we can provide?
  4. Do newcomers find this task interesting/rewarding/difficult/boring/easy?

To pursue these questions, we want to build a simple tool that surfaces image recommendations and lets users choose whether to add them. This tool would not actually edit Wikipedia, but rather would just simulate the experience of adding images so that we can conduct user tests internally and externally.

One approach would be to build it in two phases:

Phase 1: barebones

Objective: let staff and interested community members quickly experience a series of real image recommendations, giving us all a feel for how strong the algorithm is.

Rough specifications:

  • Draws from a file of image matches and metadata. This exists in T266271.
  • Displays the article preview; recommended image; and description, caption, source, and depicts statements for the image.
  • User can click a link to open full article in new tab.
  • User can click a "Yes", "No", or "Skip" button to advance to the next recommendation.
  • Next recommendation is drawn randomly from the file.
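The Phase 1 behaviour of "draws from a file" and "next recommendation is drawn randomly" could be sketched roughly as below. This assumes the T266271 dataset is a TSV with one match per row; the column names are illustrative, not the real schema:

```python
import csv
import random

def load_recommendations(path):
    """Read image-match rows from a TSV file. The delimiter and column
    names are assumptions about the T266271 dataset format."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))

def next_recommendation(recommendations):
    """Draw the next recommendation at random, per the Phase 1 spec."""
    return random.choice(recommendations)
```

The tool itself would then render the chosen row (article preview, image, metadata) and wait for a Yes/No/Skip click before drawing again.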

Optional ideas, not totally thought-through:

  • We include a text field for writing a caption. It could co-exist with the "Yes", "No", "Skip" buttons.
  • User can type some identifier (like their name) into a text field during their session. When they click "Yes", "No", or "Skip", this records their name, response for that image, and caption somewhere for us to look at. Then we could see how often users agree on the recommendations, and how often they select "Yes", "No", or "Skip".
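The optional response-recording idea above could be as simple as appending one row per click to a CSV file. A minimal sketch, with the file layout and field names as assumptions rather than anything specified in this task:

```python
import csv
from datetime import datetime, timezone

def record_response(path, reviewer, image, response, caption=""):
    """Append one evaluation (timestamp, reviewer identifier, image,
    Yes/No/Skip response, optional caption) to a CSV log for later
    analysis. All names here are illustrative."""
    assert response in ("Yes", "No", "Skip")
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(
            [datetime.now(timezone.utc).isoformat(), reviewer, image, response, caption]
        )
```

With one row per click, agreement between reviewers and the Yes/No/Skip rates fall out of a simple group-by on the image column.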

Phase 2: user experience

Objective: place real recommendations in a more complete user experience that we can give to live user testers, so that we can get a sense of how users feel about adding images to articles.

One idea is to improve the UI of the Phase 1 tool from above. Another idea might be to hardcode a short series of recommendations (perhaps 15 recommendations) to be served with an Axure prototype. This is all TBD.

Event Timeline

The scope and purpose are definitely different, but I thought it could be useful to mention the work @Cparle did in T268252 to create a tool to assess image relevance for MediaSearch.

Thanks, @CBogen! It looks like your team was thinking along the same lines. @Cparle, could you tell us a bit about your tool? Where do the matches come from? How many are there? Does it record the user's responses?

kostajh added subscribers: Catrope, Tgr, kostajh.

Per chat with @Tgr and @Catrope, I'll start on this now and we can probably collaborate on some aspects of it together next week.

@RHo @MMiller_WMF my initial thought on this is to use our existing Special:Homepage + Suggested Edits module infrastructure and deploy this on a non-production wiki (I think we have https://growthtasks.toolforge.org/ as a placeholder for now).

So, it would be something like the following:

  • the suggested edits module would be configured to have an "Add an image" task type, and its queue would come from the file in T266271.
  • the card in the suggested edits module would show the proposed image, the article title, and the wikidata description, so basically the same as our current suggested edits cards. I'm not sure how well "description, caption, source, and depicts statements" would fit there, though.
  • clicking on a card would take you to the article (we will import en masse the article titles from the dataset).
  • on the article, we'd hook into our existing guidance and show an OOUI dialog that lets the user assess with yes/no/skip/caption
  • instead of eventlogging, we would create a database table that would record the answers along with the user ID and the image being reviewed
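The real table would live in the GrowthExperiments extension's schema, but as a standalone sketch of the proposed evaluations table (SQLite via Python here purely for illustration; the table and column names are assumptions), it might look like:

```python
import sqlite3

# Hypothetical schema: one row per evaluation, recording the user,
# the reviewed image, and the yes/no/skip answer plus optional caption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS image_suggestion_evaluation (
    user_id    INTEGER NOT NULL,
    image      TEXT    NOT NULL,
    response   TEXT    NOT NULL CHECK (response IN ('yes', 'no', 'skip')),
    caption    TEXT,
    created_at TEXT    NOT NULL DEFAULT (datetime('now'))
);
"""

def record_evaluation(conn, user_id, image, response, caption=None):
    """Insert one evaluation row; replaces what eventlogging would do."""
    conn.execute(
        "INSERT INTO image_suggestion_evaluation (user_id, image, response, caption) "
        "VALUES (?, ?, ?, ?)",
        (user_id, image, response, caption),
    )
    conn.commit()
```

Storing one row per evaluation (rather than one per image) also makes the later request for a per-evaluation results listing straightforward.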

Does that sound like it would work for phase 1? Or do you have in mind something else?

The purpose of my tool is to assess the strength of each of the search signals we use.

  • dataset is generated using a set of about 1500 search terms, a mixture of popular and random (mostly in English, but there are some other languages and some search terms in non-latin alphabets)
  • for each search term we get the top 100 images using only one of the 10 search signals we use (statements, text, captions, categories, etc)
  • we record the search term itself, the search signal used, the position in the search results, and the elasticsearch score

So we have around a million images waiting to be rated. Yes, the users' responses are recorded, and once we have enough ratings we're hoping to assess which search signals have the strongest relationship between position/score and whether an image is a good match, and then focus on the signals with the best signal/noise ratio.

Another thing we might use the ratings for is as an alternative to A/B testing for assessing changes to the search algorithm.
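The signal-strength assessment described above — relating a result's position to whether raters judged the image a good match — could be sketched like this. The tuple layout and the use of a plain Pearson correlation are assumptions for illustration, not MediaSearch's actual method:

```python
from collections import defaultdict

def rank_rating_correlation(ratings):
    """Given (signal, position, is_good_match) tuples, compute a per-signal
    Pearson correlation between result position and match quality.
    A strongly negative value means higher-ranked results are better
    matches, i.e. the signal is informative."""
    by_signal = defaultdict(list)
    for signal, position, good in ratings:
        by_signal[signal].append((float(position), 1.0 if good else 0.0))
    out = {}
    for signal, pairs in by_signal.items():
        n = len(pairs)
        mx = sum(p for p, _ in pairs) / n
        my = sum(g for _, g in pairs) / n
        cov = sum((p - mx) * (g - my) for p, g in pairs)
        vx = sum((p - mx) ** 2 for p, _ in pairs)
        vy = sum((g - my) ** 2 for _, g in pairs)
        # Guard against a signal with constant position or constant rating.
        out[signal] = cov / (vx * vy) ** 0.5 if vx and vy else 0.0
    return out
```

Signals whose correlation is close to zero would be the candidates to down-weight or drop.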

Hi @kostajh, @MMiller_WMF - here are some mocks we discussed offline yesterday for something that is possibly simpler, in that we can skip the need to open the article.

https://www.figma.com/file/2SONd8P1tsexIB5coMOp8h/Add-links-v1.0?node-id=551%3A0

image.png (1×2 px, 978 KB)

Change 643732 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/GrowthExperiments@master] DNM: Image suggestions prototype

https://gerrit.wikimedia.org/r/643732

@RHo @MMiller_WMF please have a look at the demo on growthtasks.toolforge.org and let me know what, if anything, you'd like to have changed before it's used to score image assessments. (For one thing, do you want the direct link to the Commons file page to be available, e.g. https://commons.wikimedia.org/wiki/File:Rybina,_Most_kolejowy_-_obrotowy_-_fotopolska.eu_(330804).jpg, when reviewing an article?)

To test:

  • visit https://growthtasks.toolforge.org/
  • log in, you'll be redirected to meta.wikimedia.org where you either authorize the app (if already logged in to meta) or you log in to meta.wikimedia.org with your credentials for that site
  • back at growthtasks.toolforge.org, go to Special:Preferences and enable the Homepage
  • On Special:Homepage you'll be able to perform image suggestion evaluations. The "next" arrow is only clickable after you make a choice
  • Results can be viewed on https://growthtasks.toolforge.org/wiki/Special:ImageSuggestionEvaluation and can be copy/pasted into Google Sheets for further analysis

Hi @kostajh - thanks for putting this together so quickly! Yes, we want the link to the file page as well, and I noticed the metadata field for "Source" is also missing.
I had a quick look at the demo and made some requested and suggested changes itemized below:

Demo on growthtasks.toolforge.org
image.png (1×1 px, 492 KB)
Proposed update:
image.png (1×1 px, 568 KB)

Details of proposed changes above:

  • 1. Re-order the elements related to the article suggestion:
    • (a) Article title and extract should appear just before the radio controls
    • (b) Make the article title open in a new tab (remove the Article row from the metadata table)
    • (c) Add a label field above the article title and extract asking “Is this a good image for the article ____?”
  • 2. Re-order and add elements relating to the suggested image:
    • (a) The image metadata table appears directly below the image.
    • (b) Add the image file name as the first row
    • (c) Add the Source as the 2nd row
    • (d) Move “Caption” so that it is after the image “description”
    • (e) Add a link to the file page as the last row for more info.
  • 3. Layout changes
    • Remove all other modules (email, help) and make the SE module fill the whole homepage content area.
    • Increase height of image to 360px (and corresponding height of module)
    • Rename SE module
    • Remove filter bar for topics and task types
  • 4. Low prio layout and style suggestions:
    • Make the image background base80 (#eaecf0)
    • Make the image metadata table and article suggestions two col underneath the image
    • make table text smaller (font-size: 90%;)
    • add a divider above the radio button select group
    • padding 16px around the article and form buttons

I can also try to submit a patch for some of the more cosmetic changes if that helps (and if you show me how to do so :D)

@kostajh -- thank you for assembling this! It's almost exactly what we need! In addition to @RHo's list, I have a few of my own. Please don't make any changes that will spill over onto an additional day.

  • Allow this to be usable anonymously, without logging in.
  • Let the homepage be enabled by default without a user going to their preferences.
  • Label the "Caption" field as "Commons caption", so that the people we give this to aren't confused about whether that is referring to the Wikipedia caption that we'll be asking the user to write themselves.
  • On the special page that lists the results, my ideal format would be one row per evaluation (as opposed to one row per image). And it would include some identifier for each user, as well as a timestamp for the evaluation. Perhaps that is their username or IP, or perhaps it's just some identifier for the session.

QuickView on commons Special:MediaSearch (e.g. https://commons.wikimedia.org/wiki/Special:MediaSearch?type=bitmap&q=Faustus+Cornelius+Sulla+) may provide some additional ideas on what (and how) to present for an img file card.

Screen Shot 2020-12-01 at 4.25.33 PM.png (724×1 px, 1 MB)

Thanks for sharing this @Etonkovidova - we definitely have this in mind once we start on the actual design. In particular, I think the image dimensions would be good additional data to show.

I've made it through @RHo's recommendations and updated https://growthtasks.toolforge.org

Next, I'll try to do:

  • homepage enabled by default
  • one row per evaluation

But I don't think I can easily do anonymous usage. I'll have a look tomorrow.

The "homepage enabled by default" part is done now as well.

@RHo / @MMiller_WMF is anything else needed for this, or could I shut down the growthtasks.toolforge.org wiki and close this task?

Hi @kostajh - I'm good to resolve this task, but wanted to check whether it could easily be brought back in case we want to use it again in future?

Yes, we could, it would probably take 1-2 hours to reinstall the wiki.

For @RHo to review whether all objectives of this task are done, since you mentioned that you're ready to close the task.

Thanks all, we are good to close this up for now.

Change 643732 abandoned by Kosta Harlan:
[mediawiki/extensions/GrowthExperiments@master] DNM: Image suggestions prototype

Reason:

https://gerrit.wikimedia.org/r/643732