
Testing image recommendations with V3
Open, Needs Triage, Public

Description

As we work on an "add an image" structured task, there are two main things that are going to affect the success of users presented with suggested images:

  1. How accurate is the algorithm?
  2. How hard is it to confidently verify a match?

In other words, the algorithm might be quite accurate -- but it still might be difficult for the user to verify that it's accurate given the information. For instance, if the article is about a person, and the photo is of that person, then the algorithm was accurate. But if the photo's title and description don't contain the name of the subject, it might be hard for the user to verify that they really match.

Here's our idea for getting a sense of how difficult the task is and what metadata users need in order to complete it. This will help us tune the algorithm and design the user experience.

  1. @Miriam can generate a list of roughly 1,000 image recommendations for unillustrated articles in English Wikipedia, as was done for the "first version" of the algorithm in T256081. This time, though, we want to include lots of metadata:
    1. Title
    2. Commons description
    3. Commons caption
    4. Other wikis that the image is used on for that same article
    5. Depicts statements
    6. Commons categories
    7. Source of the match (Wikidata item, Commons category, cross-wiki, etc.)
    8. Anything else the user might like to have?
  2. The Growth team will then take that dataset and use it to back a simple tool that displays one match at a time, along with some portion of the metadata. Users can open the article to check it out, and simply click "yes", "no", or "skip" to see the next suggestion.
  3. Then we run usertesting.com tests where we ask testers to go through 15 suggestions or so, talking about how they are deciding whether to make the match. We could even run tests using different subsets of the metadata to see which works best.
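
One way the dataset rows and the "different subsets of the metadata" idea could be represented is sketched below. This is only an illustration: the `Suggestion` class, its field names, and `visible_metadata` are hypothetical and are not part of any existing tool or of the dataset format.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Suggestion:
    """One image recommendation with the metadata listed above (hypothetical schema)."""
    article: str
    image_title: str
    description: Optional[str] = None   # Commons description
    caption: Optional[str] = None       # Commons caption
    other_wikis: List[str] = field(default_factory=list)  # wikis using the image for this article
    depicts: List[str] = field(default_factory=list)      # Depicts statements
    categories: List[str] = field(default_factory=list)   # Commons categories
    source: Optional[str] = None        # e.g. Wikidata item, Commons category, cross-wiki


def visible_metadata(suggestion: Suggestion, shown_fields: List[str]) -> Dict[str, object]:
    """Return only the fields that a given test variant shows to the tester."""
    data = asdict(suggestion)
    return {name: data[name] for name in shown_fields}
```

A test variant would then be just a list of field names, for example `["image_title", "caption"]`, so the same dataset can back several tests.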

Event Timeline

MMiller_WMF renamed this task from "Dataset to test difficult of image suggestion" to "Dataset to test difficulty of image suggestion". Oct 22 2020, 6:36 PM
MMiller_WMF created this task.

@MMiller_WMF do you want me to use exactly the same algorithm as in the first version, or is it ok to do a couple of quick fixes (e.g. select only lead images) to improve accuracy?

@Miriam -- it is certainly okay to make a couple quick fixes. I only said that about the first algorithm because I thought it would save you time.

Miriam added a subscriber: Swagoel. Thu, Nov 19, 4:15 PM

Hi @MMiller_WMF, sorry for the delay on this. I took some more time so that I could also refine the algorithm a bit.
Please find the list of images attached. I generated this as follows:

  1. Selected all unillustrated articles, defined as articles with no images or with icons only (results in ~3M articles out of 6M English Wikipedia articles)
  2. For each unillustrated article, I looked for: Wikidata Image, Wikidata-Commons Category, Lead image of articles in other languages (results in ~500k articles with potential candidates)
  3. Sampled 50k articles, and did a bit of candidate filtering to get rid of maps, images that are on-Wiki only, and image placeholders (rule based) (results in ~25k articles with potential candidates)
  4. Selected one image out of the resulting candidates as follows - and recorded the source:
    • If the Wikidata image exists, select it as the article image
    • Otherwise, if the Wikidata category exists, select the article image at random from all images in the category
    • Otherwise, select the image that appears as the lead image in the most languages
  5. Finally, by parsing the HTML page of the selected image on Commons (using a script by @Swagoel), I extracted:
    • description (not available for 65% of the selected images)
    • caption (not available for 92% of the selected images)
    • categories (not available for 26% of the selected images)
    • structured data (not available for 93% of the selected images)
  6. I selected only those 434 articles (out of 50k) for which we have an image recommendation with description, caption, categories and structured data.
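
The priority order in step 4 can be sketched roughly as follows. This is not the actual script used to build the list; the function name, argument names, and source labels are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple


def select_image(
    wikidata_image: Optional[str],
    category_images: List[str],
    lead_images_by_lang: Dict[str, str],
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[str, str]]:
    """Pick one candidate image and record where it came from.

    Returns (image, source), or None when there is no candidate.
    The source labels are hypothetical names for the three sources.
    """
    rng = rng or random.Random()
    # 1. Prefer the image attached to the Wikidata item.
    if wikidata_image:
        return wikidata_image, "wikidata"
    # 2. Otherwise pick at random from the linked Commons category.
    if category_images:
        return rng.choice(category_images), "commons_category"
    # 3. Otherwise take the lead image shared by the most language editions.
    if lead_images_by_lang:
        counts = Counter(lead_images_by_lang.values())
        image, _ = counts.most_common(1)[0]
        return image, "cross_wiki"
    return None
```

Passing a seeded `random.Random` as `rng` makes the category pick reproducible, which could help when regenerating the list.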

@Miriam -- thanks for generating this list. I just looked through it a little bit, and there are two changes I would like before we proceed with putting it in front of users:

  1. Out of the 434 articles you list, 233 are matched with flag images. These are articles like "Liberia_at_the_2020_Summer_Olympics" and "Russia_at_the_2018_European_Championships". Most of these articles actually already have the flag image in them, just inherited through templates like this one (https://en.wikipedia.org/wiki/Template:Infobox_country_at_games). Can we do something to exclude these, since they are already illustrated?
  2. Could you actually export the whole dataset, instead of just filtering to the ones that have all the metadata fields? I think we would want to test some images that don't have all the metadata, and I can filter it on my end.

Thank you!

Hi Marshall,
I did a general cleanup of the methodology, basically to:

  1. Exclude all .svg images from the potential suggestions
  2. Exclude all flags as suggested

There are suggestions for around 13k out of 50k articles in here: https://drive.google.com/file/d/1aMlYXP8eKORx8V0m98dIUNcpmCNrADFI/view?usp=sharing
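
A minimal sketch of the two exclusions above, for anyone who wants to re-apply them to the exported list. How flags were actually detected is not stated in this task; the filename-keyword check below is an assumption, and the function names are hypothetical.

```python
import re
from typing import List


def is_excluded(filename: str) -> bool:
    """Return True for .svg files and for files that look like flags."""
    name = filename.lower()
    # Exclusion 1: drop all .svg images.
    if name.endswith(".svg"):
        return True
    # Exclusion 2 (ASSUMPTION): treat a file as a flag when "flag" or "flags"
    # appears as a whole word in its name, e.g. "Flag_of_Liberia.png".
    tokens = re.split(r"[^a-z0-9]+", name)
    return "flag" in tokens or "flags" in tokens


def filter_candidates(filenames: List[str]) -> List[str]:
    """Keep only the candidates that pass both exclusions."""
    return [f for f in filenames if not is_excluded(f)]
```

Matching on whole words avoids dropping unrelated files such as "Flagstaff_Arizona.jpg", though a filename heuristic like this will still miss flags with unusual names.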

Thanks, @Miriam! It looks good. @RHo and I are now working on how to put this in front of user testers.

MMiller_WMF renamed this task from "Dataset to test difficulty of image suggestion" to "Testing image recommendations with V3". Tue, Nov 24, 11:16 PM