
[L] Determine how many Commons users would be notified of how many image matches
Closed, Resolved, Public

Description

We are considering notifying Commons users about whether their uploaded images match to articles in the languages they speak. This task is to gather some numbers to see if that is worthwhile.

Run a script to determine:

  • Across all images on Commons, how many are potential match candidates for articles in the languages that the user who uploaded them speaks?
  • Across the images uploaded to Commons in the last month, how many are potential match candidates for articles in the languages that the user who uploaded them speaks?
  • Across all images on Commons, how many users would have potential matches in the languages that they speak (i.e. would receive notifications)?
  • Across all the images uploaded to Commons in the last month, how many users would have potential matches in the languages that they speak (i.e. would receive notifications)?
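The four questions reduce to a join of three datasets: who uploaded each image, which wikis each image is suggested for, and which languages each user speaks. The sketch below is a minimal illustration under assumptions. The record shape, the `count_candidates` name and the in-memory dicts are hypothetical, and this is not the script that was actually run. The optional `since` date covers the "last month" variants.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Upload:
    image: str      # Commons file title
    uploader: str   # username of the uploader
    uploaded: date  # upload date


def count_candidates(uploads, suggestions, user_langs, since=None):
    """Return (number of candidate images, number of users to notify).

    uploads: iterable of Upload records (hypothetical shape).
    suggestions: dict image -> set of language codes of wikis where the
        image is suggested for an unillustrated article.
    user_langs: dict username -> set of language codes the user speaks.
    since: optional date; if given, only uploads on or after it count.
    """
    images, users = set(), set()
    for up in uploads:
        if since is not None and up.uploaded < since:
            continue
        wikis = suggestions.get(up.image, set())
        # A match only counts if the uploader speaks one of the target languages
        if wikis & user_langs.get(up.uploader, set()):
            images.add(up.image)
            users.add(up.uploader)
    return len(images), len(users)
```

Using sets means a user with several matching uploads is counted once. That is why the user and image totals are reported separately.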

Event Timeline

CBogen renamed this task from Determine how many Commons users would be notified of how many image matches to [L] Determine how many Commons users would be notified of how many image matches. (Apr 21 2021, 4:34 PM)

This will take quite a while to run. There are about 13M unillustrated articles across all languages, and judging by the API response times when populating the data for https://image-recommendation-test.toolforge.org/, it takes approx. 0.1s to get one unillustrated article plus its recommendations. That means 1.3 million seconds to get them all, which is around 15 days... and that's just to get the articles, without looking into users and their languages etc.
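The estimate is simple arithmetic: 13,000,000 × 0.1 s = 1,300,000 s, and 1,300,000 / 86,400 s per day ≈ 15.05 days. The small helper below reproduces that figure. The `workers` parameter is an added assumption showing how splitting requests evenly across parallel workers would shrink the wall-clock time, provided the API does not rate-limit.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_days(n_articles, seconds_per_request, workers=1):
    """Wall-clock days to fetch every article, sequentially (workers=1) or
    split evenly across parallel workers (assumes no rate limiting)."""
    return n_articles * seconds_per_request / workers / SECONDS_PER_DAY


print(round(estimated_days(13_000_000, 0.1), 1))  # prints 15.0
```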

While some work is going on to improve response times, I'm going to fake the image recommendation API and work on the rest of the stuff.

To make sure I understand, what do you mean when you say fake the API?

I downloaded all the data for the Image Matching Algorithm, so I have all that data locally, and then I'm querying the search API... not really sure how much quicker it is, might have to move it to Toolforge or look into parallelising queries.
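Parallelising helps here because each lookup is I/O-bound: most of the time is spent waiting on the server. A minimal sketch using the standard library's `ThreadPoolExecutor` follows. `fetch_suggestions` is a hypothetical stub standing in for a real search-API request, and `max_workers=8` is an arbitrary choice. A real run should respect the API's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_suggestions(article):
    """Hypothetical stand-in for one API request; replace with a real HTTP call."""
    return article, [f"{article}_candidate.jpg"]


def fetch_all(articles, max_workers=8):
    """Run lookups concurrently and return {article: suggestions}."""
    # Threads overlap the waiting time of I/O-bound requests;
    # pool.map yields results in the same order as the input.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch_suggestions, articles))
```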

OK, here are some results, based on trawling through the image-suggestions API in August 2021 and using wiki snapshots from June:

If we sent notifications to watchers of unillustrated articles for which there is a suggested image, we would send notifications to:

  • ptwiki: 10358 users watching 22800 articles
  • dewiki: 32607 users watching 50526 articles
  • ruwiki: 45267 users watching 97420 articles
  • enwiki: 758487 users watching 1160574 articles
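Each "N users watching M articles" line is a pair of distinct counts over watchlist rows restricted to the target pages. A user watching several target pages counts once, and a target page counts only if someone watches it. This is why the user total can be larger or smaller than the page total. The sketch below assumes the rows are plain `(user, page)` pairs, for example derived from MediaWiki's `watchlist` table. `summarise_watchers` is a hypothetical name and this is not the notebook's code.

```python
def summarise_watchers(watch_rows, targets):
    """Count distinct users and distinct pages among watchlist rows whose page
    is in `targets` (e.g. unillustrated articles that have a suggestion).

    watch_rows: iterable of (user, page) pairs from a watchlist snapshot.
    """
    users, pages = set(), set()
    for user, page in watch_rows:
        if page in targets:
            users.add(user)
            pages.add(page)
    return len(users), len(pages)
```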

If we sent notifications to watchers of all images suggested for an unillustrated article for a particular wiki regardless of the languages they speak, we would send notifications to:

  • ptwiki: 10824 users watching 18334 images
  • dewiki: 13601 users watching 28185 images
  • ruwiki: 30586 users watching 82606 images
  • enwiki: 73340 users watching 341610 images

If we sent notifications to watchers of all images suggested for an unillustrated article for a particular wiki in a language they have indicated that they speak, we would send notifications to:

  • ptwiki: 70 users watching 772 images
  • dewiki: 565 users watching 3447 images
  • ruwiki: 305 users watching 3838 images
  • enwiki: 2667 users watching 62653 images
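The language-restricted numbers add one condition: the watcher's Babel languages must include the language of the target wiki. The sketch below is a minimal version under assumptions. It derives the language code by stripping the `wiki` suffix from the database name (`ptwiki` → `pt`), which does not hold for every wiki and would need a lookup table in general. `babel` is assumed to be a dict from user to language codes, and the function names are hypothetical.

```python
def wiki_language(dbname):
    """'ptwiki' -> 'pt'. Assumes a plain '<lang>wiki' database name; some
    wikis do not follow this pattern and would need a lookup table."""
    return dbname.removesuffix("wiki")


def speaker_watchers(watch_rows, suggested_images, babel, dbname):
    """Distinct (users, images) where the watcher lists the wiki's language in Babel.

    watch_rows: (user, image) pairs from Commons watchlists.
    suggested_images: images suggested for unillustrated articles on `dbname`.
    babel: dict user -> set of language codes from the Commons babel table.
    """
    lang = wiki_language(dbname)
    users, images = set(), set()
    for user, image in watch_rows:
        if image in suggested_images and lang in babel.get(user, set()):
            users.add(user)
            images.add(image)
    return len(users), len(images)
```

`str.removesuffix` requires Python 3.9 or later.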

If we sent notifications to watchers of images uploaded in May 2021 suggested for an unillustrated article for a particular wiki regardless of the languages they speak, we would send notifications to:

  • ptwiki: 225 users watching 100 images
  • dewiki: 276 users watching 154 images
  • ruwiki: 569 users watching 336 images
  • enwiki: 1645 users watching 2078 images

If we sent notifications to watchers of images uploaded in May 2021 suggested for an unillustrated article for a particular wiki in a language they have indicated that they speak, we would send notifications to:

  • ptwiki: 0 users watching 0 images
  • dewiki: 11 users watching 9 images
  • ruwiki: 8 users watching 9 images
  • enwiki: 97 users watching 122 images

Here's the notebook code used to work out the numbers

And here's the dump of the commons babel table used to get user languages
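To use such a dump, the rows first have to be grouped into a user-to-languages map. The sketch below assumes a headerless, tab-separated file with three columns mirroring `babel_user`, `babel_lang` and `babel_level` in the Babel extension's table. The actual dump format may differ. Skipping level `0`, which declares no knowledge of the language, is one reasonable choice and not necessarily what the original analysis did.

```python
import csv
import io


def load_babel(tsv_text):
    """Build user -> set of language codes from a tab-separated babel dump.

    Assumes three columns per row (user, language, level) and no header.
    Level '0' means the user declares no knowledge of the language, so it is skipped.
    """
    langs = {}
    for user, lang, level in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if level != "0":
            langs.setdefault(user, set()).add(lang)
    return langs
```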

I noticed this got mentioned in the weekly C-level report. Tagging @MMiller_WMF in case this should be on his radar for Growth's Add an Image project. Also tagging @cchen for visibility and support in case there are analytics questions.

Cparle updated the task description.