
Image suggestion evaluation August 2020
Closed, ResolvedPublic

Description

In the parent task (T256081), @Miriam generated lists of image recommendations for six languages. In this task, the following people will evaluate the recommendations in the lists:

I put the files in tabs in this spreadsheet: https://docs.google.com/spreadsheets/d/120ux_OPnqGWwrufgAvoBFBqDiPGquK4Xgd4UevLFuu0/edit#gid=778067505 (it is also possible to view all top images with their articles at once via this link). In our first pass, we will evaluate the first 50 articles in each list. I sorted the articles randomly so we are evaluating a representative group.

We'll classify the "top image" into these categories, along with explanatory comments where useful:

Classification  Explanation
 2  Great match for the article, illustrating the thing that is the title of the article; e.g. the article is "Food" and it is an image of food.
 1  Good match, but difficult to confirm for the article unless the user has some context, and would need a good caption; e.g. the article is "Food" and it is an image of a famous chef.
 0  Not a fit for the article at all; e.g. the article is "Food" and the image is a car.
-1  Image is correct for the subject, but does not match the local culture; e.g. the article is "Food" and the image is a specific food from a specific culture that is not recognizable in the local culture.
-2  Misleading image that a newcomer could accidentally think is correct; e.g. the article is "Taco" and the image is a burrito.

Details

Due Date
Aug 27 2020, 7:00 AM

Event Timeline

@revi @Trizek-WMF @PPham @Dyolf77_WMF @Urbanecm -- we're ready to start on this task, as we discussed in our meetings this week. Let's evaluate the first 50 articles on each of our sheets. Please post any questions here. And when you're finished, please post a comment here leaving any overall notes or reactions to the algorithm.

MMiller_WMF moved this task from Incoming to In Progress on the Growth-Team (Current Sprint) board.

Hopefully you all aren't that far along yet: because I'm lazy, I built a script that generates a page showing both the page title and the image in one place. You can view your own generated file at https://people.wikimedia.org/~urbanecm/growth-team/image-evaluation-aug-2020-T260857/. Hope this helps!
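The actual script isn't attached to this task, but a minimal sketch of such a generator might look like the following. It assumes the input is a TSV file with "title" and "image_url" columns; the file layout, column names, and function names here are all hypothetical, not the real script.

```python
# Hypothetical sketch of a review-page generator like the one described
# above: renders (title, image URL) pairs as one scrollable HTML page.
# The TSV schema ("title", "image_url") is an assumption.
import csv
import html


def build_review_page(rows):
    """Render an iterable of {'title': ..., 'image_url': ...} dicts as HTML."""
    items = []
    for row in rows:
        items.append(
            "<div class='item'>"
            f"<h2>{html.escape(row['title'])}</h2>"
            f"<img src='{html.escape(row['image_url'])}' width='320'>"
            "</div>"
        )
    return "<!DOCTYPE html><html><body>" + "".join(items) + "</body></html>"


def build_from_tsv(path):
    """Read a tab-separated candidates file and return the HTML page."""
    with open(path, newline="", encoding="utf-8") as f:
        return build_review_page(csv.DictReader(f, delimiter="\t"))
```

The point of putting title and image side by side is exactly what the comment says: an evaluator can classify most candidates at a glance instead of opening each article and file page separately.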

Some general comments after finishing the evaluation:
Most of the suggestions in my case are for animals/flowers, astronomical bodies, biographies, and geographical locations:

  • Animals/flowers: mostly wrong suggestions, as it suggests an image of a species in the same genus or family, but a different species.
  • Astronomical bodies: hard to evaluate, since my knowledge of this subject is narrow, but my impression is that none in this group is a "2".
  • Biographies: a high rate of correct suggestions, perhaps because the article title can simply be matched against the file name.
  • Geographical locations: the highest rate of correct suggestions, maybe for the same reason as biographies.

If you want to confirm whether an image fits or not, you have to actually follow the file to other languages, read the articles, and in some cases search the web for the subject; overall, most of the evaluations would require more than two clicks. Some of them would not be suitable for newcomers - it could be hard for them to navigate through different languages and projects (Wikidata, Wikispecies, Wikimedia Commons, etc.).

Of course it is a promising idea, but let's not forget that the newcomer is not as dedicated to Wikipedia as we are, so perhaps they won't want to spend so much time clicking around, and would instead want a more immediate and direct result/edit? But this type of suggestion is definitely valuable to us experienced users lol.

Finished the evaluation of ar.wiki suggestions. Suggestions can be improved by:

  • hiding low quality/resolution images (they are used as icons),
  • hiding images with DR (Deletion Request) tags,
  • looking for local images in Other files section (especially for translatable maps/diagrams).
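The first two suggestions above could be implemented as a simple pre-filter on candidates before they are ever shown to a user. A hedged sketch, assuming each candidate carries width/height metadata and a list of Commons templates on its file page; the field names, threshold, and template names are assumptions, not the actual pipeline's schema:

```python
# Hypothetical pre-filter for the improvements suggested above: drop
# icon-sized images and images tagged with a deletion request.
# MIN_DIMENSION and the template names are illustrative assumptions.
MIN_DIMENSION = 200  # px; anything smaller is likely used as an icon
DELETION_TEMPLATES = {"Delete", "Deletion request"}  # example template names


def keep_candidate(candidate):
    """Return True if the candidate image passes the basic quality filter."""
    if candidate["width"] < MIN_DIMENSION or candidate["height"] < MIN_DIMENSION:
        return False  # too small to be a useful article illustration
    if DELETION_TEMPLATES & set(candidate.get("templates", [])):
        return False  # file may be deleted soon; don't recommend it
    return True
```

Both signals (image dimensions and templates on the file page) are available from standard MediaWiki file metadata, so a filter like this would be cheap to apply.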

Thank you Martin! This is really helpful!

Concerning the samples we have to review: after a meeting with Revi and Phuong, we all agreed on the lack of diversity. For instance, Phuong and I have a lot of asteroids, and Phuong also says that she has a lot of animals. It doesn't seem to be well balanced. @Miriam, is it normal to have this lack of diversity?

I'm glad you like it! A code author is always happy to hear someone used it and liked it ;).

Maybe I'm highly tolerant to lack of diversity, but I don't feel any lack of diversity in my set of images.

Hi all. A quick note that Miriam will be back at work on 2020-09-07 and will likely not be able to respond to your questions until then.

@Trizek-WMF I'm done with the first 50 items. Let me know if there is anything more I should do.

Trizek-WMF updated the task description. (Show Details)

We are all done with our items (Habib helped me with French). I'll let Marshall check the results and finish his batch.

Is it possible to refine by section title to illustrate a section? For instance, an article about a city in which there is a "tourism" section: is the recommendation tool able to find a picture of the local beach (for example) for this section?

@revi @Urbanecm -- could you please put your overall comments on the results here?

@Miriam -- we are finished evaluating this round of image recommendations. For six languages, we evaluated 50 random matches. The results are in the sheets in this workbook: https://docs.google.com/spreadsheets/d/120ux_OPnqGWwrufgAvoBFBqDiPGquK4Xgd4UevLFuu0/edit#gid=383843253. There is a graph of results in the "Summary" tab.

We classified the matches into these classifications:

Classification  Explanation
 2  Clear match for the article, illustrating the thing that is the title of the article; e.g. the article is "Food" and it is an image of food.
 1  Appropriate match, but difficult to confirm for the article unless the user has some context, and would need a good caption; e.g. the article is "Food" and it is an image of a famous chef.
 0  Not a fit for the article at all; e.g. the article is "Food" and the image is a car.
-1  Image is correct for the subject, but does not match the local culture; e.g. the article is "Food" and the image is a specific food from a specific culture that is not recognizable in the local culture.
-2  Misleading image that a newcomer could accidentally think is correct; e.g. the article is "Oak tree" and the image is an elm tree.

Here are the toplines:

  • Depending on the wiki, 20-40% of matches were 2s. We need to discuss what accuracy level we consider the minimum acceptable for the newcomer experience.
  • The number of 1s raises several design challenges: how much information can we give users to investigate these matches? For instance, if it's the article on Albert Einstein in Arabic Wikipedia, and the suggested image is of Albert Einstein's childhood home, and the name and description of that image are in German, how can the user determine that it's a good match, and not just a random house?
  • Depending on the wiki, -2s could be up to 30% of the matches. These seem to be caused largely by this phenomenon: there is an unillustrated article about a specific butterfly species (or asteroid or whatever). Some other wiki has an article about that butterfly, and it has a "butterflies" navbox at the bottom, which uses a certain butterfly image. All butterfly articles in that wiki therefore have that one butterfly image at the bottom, and so it is erroneously recommended for the wrong species.
  • The specific sheets for each language contain many notes that will help us refine this algorithm. I recommend reading through all of them to discover patterns that can be addressed.
  • There are also comments above, in this task, listing some clear areas for improvement from the people who did the evaluation.
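One way to catch the navbox failure mode described in the toplines is a frequency heuristic: an image that appears on many different articles in the source wiki is probably template decoration (a navbox or icon) rather than an illustration of any one subject, so it shouldn't be recommended. A hedged sketch, not part of the actual algorithm; the usage threshold is an assumption to be tuned against evaluation data like this round's:

```python
# Hypothetical heuristic: filter out images used on many distinct
# articles in the source wiki, since those are likely navbox/template
# decoration (e.g. one butterfly photo on every butterfly article)
# rather than subject illustrations. MAX_USAGE is an assumed threshold.
from collections import Counter

MAX_USAGE = 10  # max distinct articles an image may appear on


def filter_widely_used(recommendations):
    """recommendations: list of (article, image) pairs from one wiki."""
    usage = Counter(image for _, image in recommendations)
    return [
        (article, image)
        for article, image in recommendations
        if usage[image] <= MAX_USAGE
    ]
```

A filter like this would directly target the -2 cases above, where one navbox image is erroneously matched to every unillustrated species article.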

Here is the graph from the "Summary" tab:

@MMiller_WMF: Hi, the Due Date set for this open task is more than two months ago.
Can you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks.

We've finished with this task, and we're moving on to continuing the work in other image tasks. Thank you!