
Testing image recommendations with V3
Closed, Resolved (Public)

Description

As we work on an "add an image" structured task, there are two main things that are going to affect the success of users presented with suggested images:

  1. How accurate is the algorithm?
  2. How hard is it to confidently verify a match?

In other words, the algorithm might be quite accurate -- but it still might be difficult for the user to verify that it's accurate given the information. For instance, if the article is about a person, and the photo is of that person, then the algorithm was accurate. But if the photo's title and description don't contain the name of the subject, it might be hard for the user to verify that they really match.

Here's our idea for getting a sense of how difficult the task is and what metadata users need in order to complete it. This will help us tune the algorithm and design the user experience.

  1. @Miriam can generate a list of something like 1,000 image recommendations for unillustrated articles on English Wikipedia, as was done for the "first version" of the algorithm in T256081. This time, though, we want to include lots of metadata.
    1. Title
    2. Commons description
    3. Commons caption
    4. Other wikis that the image is used on for that same article
    5. Depicts statements
    6. Commons categories
    7. Source of the match (Wikidata item, Commons category, cross-wiki, etc.)
    8. Anything else the user might like to have?
  2. The Growth team will then take that dataset and use it to back a simple tool that displays one match at a time, along with some portion of the metadata. Users can open the article to check it out, and simply click "yes", "no", or "skip" to see the next suggestion.
  3. Then we run usertesting.com tests in which we ask testers to go through 15 suggestions or so, thinking aloud about how they decide whether to make the match. We could even run tests using different subsets of the metadata to see which works best.

Event Timeline

MMiller_WMF renamed this task from Dataset to test difficult of image suggestion to Dataset to test difficulty of image suggestion.Oct 22 2020, 6:36 PM
MMiller_WMF created this task.

@MMiller_WMF do you want me to use exactly the same algorithm as in the first version, or is it ok to do a couple of quick fixes (e.g. select only lead images) to improve accuracy?

@Miriam -- it is certainly okay to make a couple quick fixes. I only said that about the first algorithm because I thought it would save you time.

Hi @MMiller_WMF, sorry for the delay on this; I took some more time so that I could also refine the algorithm a bit.
Please find the list of images attached. I generated this as follows:

  1. Selected all unillustrated articles, defined as articles having no images, or icons only (results in ~3M articles out of 6M English Wikipedia articles)
  2. For each unillustrated article, I looked for: Wikidata Image, Wikidata-Commons Category, Lead image of articles in other languages (results in ~500k articles with potential candidates)
  3. Sampled 50k articles, and did a bit of candidate filtering to get rid of maps, images that are on-Wiki only, and image placeholders (rule based) (results in ~25k articles with potential candidates)
  4. Selected one image out of the resulting candidates as follows - and recorded the source:
    • If the Wikidata image exists, select it as the article image
    • Otherwise, if the Wikidata Commons category exists, select the article image at random from all images in the category
    • Otherwise, select the image that appears as lead image for most languages
  5. Finally, by parsing the HTML page of the selected image on Commons (using a script by @Swagoel), I extracted:
    • description (not available for 65% of the selected images)
    • caption (not available for 92% of the selected images)
    • categories (not available for 26% of the selected images)
    • structured data (not available for 93% of the selected images)
  6. I selected only those 434 articles (out of 50k) for which we have an image recommendation with description, caption, categories and structured data.
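The selection priority in step 4 can be sketched roughly as follows. This is a hypothetical reconstruction: the field names (`wikidata_image`, `commons_category_images`, `lead_images_by_language`) are invented for illustration, and the real pipeline may represent candidates differently.

```python
import random

def select_candidate(article):
    """Pick one image candidate for an unillustrated article and record
    the source of the match. `article` is a hypothetical dict; real field
    names in the pipeline may differ."""
    if article.get("wikidata_image"):
        return article["wikidata_image"], "wikidata-image"
    if article.get("commons_category_images"):
        # pick at random from all images in the Wikidata Commons category
        return random.choice(article["commons_category_images"]), "commons-category"
    lead = article.get("lead_images_by_language", {})  # {image: [languages]}
    if lead:
        # the image used as lead image in the most languages wins
        best = max(lead, key=lambda img: len(lead[img]))
        return best, "cross-wiki"
    return None, None
```

Recording the source alongside the image is what lets the later analysis break down accuracy by match type.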

@Miriam -- thanks for generating this list. I just looked through it a little bit, and there are two changes I would like before we proceed with putting it in front of users:

  1. Out of the 434 articles you list, 233 of them get matched with images for flags. These are articles like "Liberia_at_the_2020_Summer_Olympics" and "Russia_at_the_2018_European_Championships". Most of these articles actually already have the flag image in them, just inherited through templates[[ https://en.wikipedia.org/wiki/Template:Infobox_country_at_games | like this one ]]. Can we do something to exclude these, since they are already illustrated?
  2. Could you actually export the whole dataset, instead of just filtering to the ones that have all the metadata fields? I think we would want to test some images that don't have all the metadata, and I can filter it on my end.

Thank you!

Hi Marshall,
I did a general cleanup of the methodology to:

  1. Exclude all .svg images from the potential suggestions
  2. Exclude all flags as suggested

There are suggestions for around 13k out of 50k articles in here: https://drive.google.com/file/d/1aMlYXP8eKORx8V0m98dIUNcpmCNrADFI/view?usp=sharing
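The rule-based exclusions above could look roughly like this. This is a hypothetical sketch; the actual filter rules in the pipeline (e.g. how flags are detected) may well differ.

```python
def keep_candidate(filename):
    """Rule-based filter for image candidates: drop SVGs and flag images.

    A hypothetical sketch of the cleanup described above; the real rules
    may use categories or other signals rather than filename matching.
    """
    name = filename.lower()
    if name.endswith(".svg"):
        return False  # exclude all .svg images
    if "flag" in name:
        return False  # exclude flags, which articles often inherit via templates
    return True
```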

Thanks, @Miriam! It looks good. @RHo and I are now working on how to put this in front of user testers.

MMiller_WMF renamed this task from Dataset to test difficulty of image suggestion to Testing image recommendations with V3.Nov 24 2020, 11:16 PM

@Miriam -- @RHo and I have started working with this data, and we have a couple of questions and requests.

  • I noticed that you said that 65% of the images are missing descriptions. I'm surprised to hear that, because Commons requires descriptions when uploading. At first I thought maybe it's that they are missing English descriptions, or maybe that Commons didn't implement that policy until more recently. But I found this example image, which doesn't have a description in your dataset, but does have one on Commons.
  • In making our prototypes, we realized there is some other image metadata we would love to have. Is it easy for you to re-export with these additional elements?
    • Username who contributed image
    • Date image was uploaded to Commons
    • Dimensions of image (e.g. 546 × 728)
    • File type (e.g. jpeg)
    • Copyright (e.g. Creative Commons Attribution-Share Alike 4.0)

Thank you!

Hi @MMiller_WMF, that's interesting. I used some code to parse the HTML of the Commons page; maybe there are different ways of marking the description and I missed some. I'll double-check. I will also have to extend the code to get the additional information you need, which will take a few days at least.

Hi @MMiller_WMF, I modified the code to parse the HTML of the Commons page (there must be a better way to do this, but for now this is what we have). It now includes more descriptions (only 40% missing) and all the additional data you requested. Some copyright statements and dates are not available in structured data, and for now these are ignored. Please check the sample data attached and let me know if it looks good. If so, I can run it at scale and generate more suggestions.

Hi @MMiller_WMF, I modified the code to parse the HTML of the Commons page (there must be a better way to do this, but for now this is what we have) - it now includes more descriptions (missing 40% only)

Tagging @matthiasmullie just in case you know of any better way to get description information from Commons.


I'm not really sure how it's currently being done, but whatever it is seems to already produce decent results (going by a quick glance at the TSV).
The best way to get somewhat usable media page content out of Commons is via the CommonsMetadata extension, which attempts to extract things from the HTML produced by the templates maintained by the community.
It can be fetched with a simple API call, like this: https://commons.wikimedia.org/w/api.php?action=query&titles=File:June_odd-eyed-cat_cropped.jpg&prop=imageinfo&iiprop=extmetadata
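As a sketch, that API call can be scripted like this. The query parameters match the URL above; the `extmetadata` keys shown (e.g. `ImageDescription`, `Categories`, `LicenseShortName`) are ones the CommonsMetadata extension commonly returns, but coverage varies per file.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

def extmetadata_url(file_title):
    """Build the API URL that returns CommonsMetadata output for one file."""
    params = {
        "action": "query",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)

def extract_fields(response, fields=("ImageDescription", "Categories", "LicenseShortName")):
    """Pull selected extmetadata values out of a decoded JSON query response."""
    out = {}
    for page in response["query"]["pages"].values():
        meta = page.get("imageinfo", [{}])[0].get("extmetadata", {})
        for name in fields:
            if name in meta:
                out[name] = meta[name].get("value")
    return out
```

This avoids scraping the rendered file page directly, since the extension already normalizes what the community templates emit.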

@Miriam -- thanks for generating that sample file. I think it looks mostly good. My only comment is: it looks like the "date" column contains some "photo dates" (when the photo was taken) and some "upload dates" (when the photo was uploaded). If you have any control over it, we would prefer upload dates. Could you please run it for those same ~15,000 articles you gave us before? Then we'll add that into the prototype.

Also, I'm wondering if while you're running this, you could export several thousand for Arabic Wikipedia. We would like to develop a sense of what kind of coverage there is for these different fields in non-English languages. One concern is that images that match to Arabic articles might have lots of English descriptions/titles/captions, and then be difficult for Arabic users to evaluate. Perhaps there are some counts we could develop in that domain -- like what numbers of matches in each language have captions and descriptions in English or other languages. What do you think?
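Coverage counts like the ones requested here could be computed with a short script along these lines. The column names are hypothetical, standing in for whatever the exported TSV actually uses.

```python
import csv

def coverage(tsv_path, fields=("description", "caption", "categories", "depicts")):
    """Fraction of rows missing each metadata field in a suggestions TSV.

    Column names are hypothetical; the actual export may label them differently.
    """
    missing = {f: 0 for f in fields}
    total = 0
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            total += 1
            for field in fields:
                if not (row.get(field) or "").strip():
                    missing[field] += 1
    return {f: missing[f] / total for f in fields} if total else {}
```

The same loop could be extended to bucket each field by language, which is what the per-language question above would need.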

Thanks Marshall!
Here is the new file for English with the change you requested on the date field: https://drive.google.com/file/d/1kVB5krC9SyFxvJqwehWW8IDRPQ2NCR_b/view?usp=sharing
Below is the metadata coverage:

missing descriptions: 27%
missing captions: 95%
missing categories: 0%
missing depicts: 92%

Re: running this for Arabic Wikipedia, could you clarify what you would like me to do? Would you like me to find image matches for unillustrated articles in Arabic and then generate metadata for those, or would you like me to check the presence of Arabic metadata on these image candidates?

Thanks, @Miriam. This is just what we needed, and we're using it already.

For Arabic Wikipedia: we would like you to do the same thing you did for English. Generate a file of lots of candidates, include the metadata, and also post the metadata coverage numbers. We would also like to know the sorts of stats you included in T266271#6634006, showing how many articles are left at each stage. Does that make sense?

@MMiller_WMF, I ran v3 on the other languages, here are the overall results:

arwiki
number of unillustrated articles: 606231
number of articles with Wikidata image: 6971
number of articles with Wikidata Commons Category: 31008
number of articles with Language Links: 139603
kowiki
number of unillustrated articles: 291795
number of articles with Wikidata image: 15875
number of articles with Wikidata Commons Category: 28111
number of articles with Language Links: 84564
viwiki
number of unillustrated articles: 928876
number of articles with Wikidata image: 76175
number of articles with Wikidata Commons Category: 88906
number of articles with Language Links: 176263
cswiki
number of unillustrated articles: 177968
number of articles with Wikidata image: 7865
number of articles with Wikidata Commons Category: 20381
number of articles with Language Links: 67014
frwiki
number of unillustrated articles: 1037172
number of articles with Wikidata image: 18416
number of articles with Wikidata Commons Category: 48315
number of articles with Language Links: 284571

Next, I ran the filtering/prioritization/metadata extraction for 20k suggestions for cswiki, kowiki, and arwiki.

Results and samples below. Categories are only available in English. Structured data such as copyright or depicts statements is generally in the native language, and only in English when the native language is not available on Wikidata (this is a very small percentage; I didn't have time to work out how to parse the data for this corner case). As we expected, the metadata coverage is very low.

cswiki
percentage of articles for which we have good candidates: 52%
missing descriptions: 97%
missing captions: 99%
missing categories: 0%
missing depicts: 92%

arwiki
percentage of articles for which we have good candidates: 55%
missing descriptions: 98%
missing captions: 99%
missing categories: 0%
missing depicts: 93%

kowiki
percentage of articles for which we have good candidates: 43%
missing descriptions: 99%
missing captions: 99%
missing categories: 0%
missing depicts: 93%

Resolving this; feel free to reopen if there are more TODOs here!