
Categorizing Uncategorized images on Wikimedia Commons
Open, Medium, Public

Assigned To
None
Authored By
xSavitar
Jun 11 2018, 3:52 PM
Referenced Files
None

Description

This project attempts to address the large number of uncategorized images on Wikimedia Commons. Commons hosts a lot of images (over 47M as of today), and many of them have no categories at all.

Solving this problem is time-consuming, but we want to give it a try. We have two approaches in mind:

  1. Manual categorization via edits made by users (tedious, so a bot should assist where possible).
  2. Machine learning algorithms or bots that look at the pictures and the existing categories and do the categorization, or maybe create new categories for this purpose.

This is a long-term project; development will begin and run till infinity!

Event Timeline

xSavitar triaged this task as Medium priority.Jun 11 2018, 3:52 PM
xSavitar created this task.

I'm interested in this. It's high time we reincarnate this! Thanks for the help @Steinsplitter. If you find any resources or pointers, please do share. I think it would be good to keep working on this so that, as more and more content keeps coming in, at least 20% of it keeps getting categories :)

The bot is no longer operating because:
https://commons.wikimedia.org/wiki/User:CategorizationBot#Why_isn't_the_bot_running?

Maybe @Multichill can share the code so that people can get an idea how the logic works.

Could someone please share the link to the bot repository? I just checked the frequently asked questions and it seems that maintenance of the bot has stopped.

Yeah

So I stopped operating the bot back in 2015 because the time it would cost to fix it wasn't justified given the (negative) feedback.
If there are multiple people who are willing to help out here, I'm more than happy to invest a bit of time to set the thing up again.

If you look at https://commons.wikimedia.org/wiki/User:CategorizationBot#Process you'll see it's basically three parts:

  1. Find uncategorized files
  2. Try to get uncategorized files categorized
  3. Notify users of newly uncategorized files

The first job is part of Pywikibot, see https://www.mediawiki.org/wiki/Manual:Pywikibot/imageuncat.py . It ran once a day with the -yesterday option.
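
To give people an idea of what a daily "yesterday" run involves, here is a minimal sketch. It is not the actual imageuncat.py code: the helper names (yesterday_window, upload_log_params, is_uncategorized) are made up for illustration, and treating a file as uncategorized when it only has hidden categories is an assumption about the intended rule. The parameters are those of the standard MediaWiki API upload log query (list=logevents, letype=upload); nothing is sent over the network here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helpers for illustration; not taken from imageuncat.py.

def yesterday_window(now):
    """Return (start, end) UTC datetimes covering the previous calendar day."""
    today = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return today - timedelta(days=1), today

def upload_log_params(now, limit=50):
    """Build MediaWiki API parameters listing yesterday's uploads, oldest first."""
    start, end = yesterday_window(now)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "action": "query",
        "format": "json",
        "list": "logevents",
        "letype": "upload",
        "ledir": "newer",  # oldest first, so lestart must be before leend
        "lestart": start.strftime(fmt),
        "leend": end.strftime(fmt),
        "lelimit": str(limit),
    }

def is_uncategorized(categories, hidden_categories):
    """Assumed rule: a file with no visible (non-hidden) category is uncategorized."""
    return not (set(categories) - set(hidden_categories))
```

The resulting dictionary could be passed to any HTTP client pointed at https://commons.wikimedia.org/w/api.php; each returned upload would then be checked with something like is_uncategorized before being tagged.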

For the second (categorization) job:

The third part can probably just be fired up again if we want it ( https://github.com/multichill/toollabs/blob/master/bot/notify_uncategorized.py )

So basically the re-categorization has to be written from scratch based on Wikidata and we have to figure out something for the overcategorization.
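
One possible starting point for the overcategorization part, as a sketch only: if a file is suggested for both a category and one of that category's ancestors, drop the ancestor. The function name drop_overcategorization and the parents mapping (category to its direct parent categories) are hypothetical, and the category data in the usage note is purely illustrative; in practice the parent relations would have to be fetched from Commons.

```python
def drop_overcategorization(candidates, parents):
    """Remove candidate categories that are ancestors of another candidate.

    `parents` maps a category name to a list of its direct parent categories.
    """
    def ancestors(cat):
        # Walk upwards through the category tree; `seen` also guards
        # against category loops so the walk always terminates.
        seen = set()
        stack = list(parents.get(cat, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        return seen

    redundant = set()
    for cat in candidates:
        # A category never makes itself redundant, even inside a loop.
        redundant |= ancestors(cat) - {cat}
    return [cat for cat in candidates if cat not in redundant]
```

For example, with "Category:Eiffel Tower" under "Category:Towers in Paris" under "Category:Paris", the candidates "Category:Paris" and "Category:Eiffel Tower" would be reduced to just "Category:Eiffel Tower". Two candidates that sit in a loop with each other would both be dropped, so a real bot would need a better rule for that case.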

Vvjjkkii renamed this task from Categorizing Uncategorized images on Wikimedia Commons to e9aaaaaaaa.Jul 1 2018, 1:05 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description. (Show Details)
CommunityTechBot renamed this task from e9aaaaaaaa to Categorizing Uncategorized images on Wikimedia Commons.Jul 2 2018, 9:36 AM
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description. (Show Details)

Thanks for the wonderful information @Multichill. Let's digest the information and then get back to you with next steps. Thanks :)

Hello @Multichill, after some digging: the first part is already handled by Pywikibot, right? So we don't need to touch it? Or is there something to be done there?