
Categorizing Uncategorized images on Wikimedia Commons
Open, Normal, Public

Description

This project aims to address the large number of uncategorized images on Wikimedia Commons. Wikimedia Commons hosts over 47M images as of today, and many of them have no categories.

Solving this problem is time-consuming, but we want to try. We have two approaches in mind:

  1. Manual categorization via edits made by users (tedious; a bot is needed if possible).
  2. Machine learning algorithms or bots that look at pictures and existing categories and do the categorization, or possibly create new categories for this purpose.

This is a long-term project; development will begin and run till infinity!

Event Timeline

D3r1ck01 created this task. Jun 11 2018, 3:52 PM
D3r1ck01 triaged this task as Normal priority.

I'm interested in this. It's high time we revived this! Thanks for the help @Steinsplitter. If you find any resources or pointers, please do share. I think it would be good to keep working on this so that, as more and more content keeps coming in, at least 20% of it gets categorized :)

The bot is no longer operating because:
https://commons.wikimedia.org/wiki/User:CategorizationBot#Why_isn't_the_bot_running?

Maybe @Multichill can share the code so that people can get an idea how the logic works.

Could someone please share the link to the bot repository? I just checked the frequently asked questions, and it seems maintenance of the bot has stopped.

The bot is no longer operating because:
https://commons.wikimedia.org/wiki/User:CategorizationBot#Why_isn't_the_bot_running?

Maybe @Multichill can share the code so that people can get an idea how the logic works.

Yeah

@Multichill, do you mind? :), let's do this :)

So I stopped operating the bot back in 2015 because the time it would cost to fix it didn't add up with the (negative) feedback.
If there are multiple people who are willing to help out here I'm more than happy to invest a bit of time to set the thing up again.

If you look at https://commons.wikimedia.org/wiki/User:CategorizationBot#Process you'll see it's basically three parts:

  1. Find uncategorized files
  2. Try to get uncategorized files categorized
  3. Notify users of newly uncategorized files

The first job is part of Pywikibot, see https://www.mediawiki.org/wiki/Manual:Pywikibot/imageuncat.py . It ran once a day with the -yesterday option.
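To make the first job more concrete, here is a minimal sketch of the idea behind a daily "yesterday" run: take the files uploaded during the previous UTC day and keep those that have no real category. This is not the code of imageuncat.py. The function names, the tuple layout of `uploads`, and the ignore list of maintenance categories are all assumptions made for illustration; a real run would get the uploads and categories from the wiki.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical ignore list: maintenance categories that should not count
# as "real" categorization. The actual bot's list may differ.
IGNORED_CATEGORIES = {
    "Category:Media needing categories",
}


def yesterday_window(now):
    """Return (start, end) UTC datetimes covering the whole previous day.

    `now` must be a timezone-aware datetime.
    """
    today_midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return today_midnight - timedelta(days=1), today_midnight


def is_uncategorized(categories, ignored=IGNORED_CATEGORIES):
    """True if the file has no category outside the ignore list."""
    return all(cat in ignored for cat in categories)


def find_uncategorized(uploads, now):
    """Filter uploads from the previous UTC day that lack real categories.

    `uploads` is an iterable of (title, upload_time, categories) tuples,
    a made-up shape used only for this sketch.
    """
    start, end = yesterday_window(now)
    return [
        title
        for title, uploaded_at, categories in uploads
        if start <= uploaded_at < end and is_uncategorized(categories)
    ]
```

Keeping the time window and the category check as separate pure functions makes each one easy to test without touching the wiki.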

For the second (categorization) job:

Third part can probably just be fired up again if we want it ( https://github.com/multichill/toollabs/blob/master/bot/notify_uncategorized.py )
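As a rough illustration of what the notification step involves, the sketch below groups uncategorized files by uploader and builds one message per user. It is not taken from notify_uncategorized.py; the function names, the input shape, and the message wording are all hypothetical.

```python
from collections import defaultdict


def group_by_uploader(files):
    """Group (uploader, file_title) pairs into {uploader: [titles]}."""
    grouped = defaultdict(list)
    for uploader, title in files:
        grouped[uploader].append(title)
    return dict(grouped)


def build_notice(uploader, titles):
    """Build a plain wikitext notice listing the user's uncategorized files.

    The wording is made up for this sketch.
    """
    lines = [
        f"Hello {uploader}, the following files you uploaded "
        "have no categories yet:"
    ]
    # The leading colon makes a link to the file page instead of embedding it.
    lines += [f"* [[:{title}]]" for title in sorted(titles)]
    return "\n".join(lines)
```

Sending one grouped message per uploader, rather than one per file, keeps talk pages from filling up with near-identical notices.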

So basically the re-categorization has to be written from scratch based on Wikidata and we have to figure out something for the overcategorization.

Xqt added a subscriber: Xqt. Jun 14 2018, 9:51 AM
Vvjjkkii renamed this task from Categorizing Uncategorized images on Wikimedia Commons to e9aaaaaaaa. Jul 1 2018, 1:05 AM
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description.
CommunityTechBot lowered the priority of this task from High to Normal.
CommunityTechBot renamed this task from e9aaaaaaaa to Categorizing Uncategorized images on Wikimedia Commons.

Thanks for the wonderful information @Multichill. Let us digest it and then get back to you with next steps. Thanks :)

Hello @Multichill, after some digging: the first part is already handled by Pywikibot, right? So we don't need to touch it? Or is there something to be done there?

D3r1ck01 updated the task description. Aug 16 2018, 8:07 PM