[2018] Documentation: describe how to get off the categorisation skip-list
Open, Needs Triage · Public

Description

We currently skip-list countries (datasets) with too many uncategorised images that the bot repeatedly fails to categorise.

We should offer clear instructions for what has to be done for a country to get off the skip-list.

Once these steps have been described they should be added to com:Commons:Monuments database/Categorization#Skipped countries (as a linked subpage) and communicated to the local organisers of the affected countries.


draft text

Before investigating why a country is skip-listed it is a good idea to familiarise yourself with how the categorisation works.

Skip-listing is the result of a large backlog (500+ images) of uncategorised images which the bot repeatedly tries, but fails, to categorise. The backlog is normally the result of one or more of the following reasons.

  1. The root category contains many images without a tracker template. Without a tracker template the bot cannot determine which monument an image depicts, and hence cannot categorise it.
    • To fix this the images need to be tagged with the Template and the id corresponding to the monument. Alternatively all such images can be moved to a subcategory for "unknown monuments". <To find all such images... maybe Petscan or ImagesWithoutIdPage>
  2. The root category contains many images with a tracker template but with an invalid or empty id. This prevents the bot from determining which monument an image depicts, and hence it cannot categorise it. Any id not present in the lists is considered invalid.
    • To fix this images need to have their ids corrected. This often requires looking at the image description or contacting the uploader. It may also be worth looking over your competition instructions to make sure uploaders know which ids to use.
    • Some of these images can be found through the appropriate subcategory of Category:Cultural heritage monuments with wrong ID.
    • For the more complex cases <something about a way of finding which these are>.
    • Also ensure your lists are up to date so that there are no missing monuments getting flagged.
  3. <per monument categories, commonscat in list entries or wikidata - missing commonscat report>
  4. <regional sublists, commonscat in list page or its category>
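
As a rough illustration of how causes 1 and 2 could be told apart, here is a minimal sketch. It assumes the tracker template id has already been extracted for each image (`None` when the template is missing) and that the ids from the lists are available as a set. The function names, the sample ids and the `SKIP_THRESHOLD` constant are hypothetical; only the 500 figure comes from the text above, and the bot's actual rule may differ.

```python
from collections import Counter
from typing import Iterable, Optional

# Backlog size mentioned in the draft text; an assumption, not the bot's real setting.
SKIP_THRESHOLD = 500


def classify(monument_id: Optional[str], known_ids: set[str]) -> str:
    """Return the likely reason an image in the root category is stuck."""
    if monument_id is None:
        return "no_tracker_template"  # cause 1
    if monument_id.strip() == "":
        return "empty_id"  # cause 2
    if monument_id.strip() not in known_ids:
        return "invalid_id"  # cause 2
    return "valid_id"  # id is fine, so causes 3 and 4 are the ones to check


def summarise(
    images: Iterable[tuple[str, Optional[str]]], known_ids: set[str]
) -> Counter:
    """Count images per cause; `images` holds (file title, id or None) pairs."""
    return Counter(classify(mid, known_ids) for _title, mid in images)


if __name__ == "__main__":
    known = {"SE-001", "SE-002"}  # made-up ids
    root_category = [
        ("File:A.jpg", None),
        ("File:B.jpg", ""),
        ("File:C.jpg", "xyz"),
        ("File:D.jpg", "SE-001"),
    ]
    counts = summarise(root_category, known)
    for cause, n in counts.most_common():
        print(cause, n)
    print("backlog at skip-list size:", sum(counts.values()) >= SKIP_THRESHOLD)
```

Whichever bucket dominates indicates which of the fixes above to start with.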

.....

Once some of these causes have been identified and rectified you can contact the international team who can perform a one-off categorisation run for your country to re-evaluate if it can be taken off the skip-list.

Event Timeline

Some quick brainstorming of what causes countries to end up here.

  • No one in the organising team knows Commons well enough to care about categorisation.
  • The root category contains a lot of images without the tracking template (but the country uses one)
  • There are no sub-categories for regions (or the list pages/their Wikidata equivalents don't make use of these for commonscat)
  • No one is active with creating categories for individual monuments (or these categories are not used in the commonscat field of the list)

Am I missing anything?

For the third point activating and using the "missing commonscat report" can help find any categories tagged with the tracker template.

What would help the third point most though is T56152: Make a tool to suggest categories to create for monuments at Commons. Even a fairly simple tool listing the top 25 monuments (by image count) without a commonscat would allow people to start making categories.
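
A minimal sketch of what such a listing could look like, assuming we already have the monument id of every image in the root category and a mapping from known ids to their commonscat (empty or `None` when missing). `top_uncategorised` and both inputs are hypothetical names and not part of any existing tool.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def top_uncategorised(
    image_ids: Iterable[str],
    commonscat: Mapping[str, Optional[str]],
    limit: int = 25,
) -> list[tuple[str, int]]:
    """Return up to `limit` (monument id, image count) pairs lacking a commonscat.

    Ids that are not in `commonscat` at all are ignored, since those are
    invalid ids rather than monuments that need a category.
    """
    counts = Counter(
        mid for mid in image_ids if mid in commonscat and not commonscat[mid]
    )
    return counts.most_common(limit)


if __name__ == "__main__":
    ids = ["a", "a", "a", "b", "b", "c", "unknown"]  # made-up ids
    cats = {"a": None, "b": "Category:B", "c": ""}
    for monument_id, n in top_uncategorised(ids, cats):
        print(monument_id, n)
```

Sorting by image count means each newly created category clears as many images as possible from the backlog.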

I think the reasons should be a bit more constructive :)

  • The categorization structure does not seem set up (link to documentation)
  • The root category contains a lot of images without a tracking template <not sure what that means>
  • There are no sub-categories for regions (link to documentation how to create them)
  • The Wikidata equivalents don't make use of the subcategories for commonscat (link to documentation how to fix that)
  • There is no detected activity in improving categorization for XX days. (link to page where bot can be restarted)

I think reasons should always clarify how to fix it. The links can be omitted at first, but should ideally exist.

I think the reasons should be a bit more constructive :)

Definitely true. At this point I was just brainstorming how countries end up in this situation.

The root category contains a lot of images without a tracking template <not sure what that means>

The tracking template is the little template on Commons which contains the id and often generates a link to the official webpage.

Some countries simply don't use one (e.g. Ireland), some have them but the upload campaign doesn't make use of it (i.e. doesn't add it).

I think reasons should always clarify how to fix it. The links can be omitted at first, but should ideally exist.

I agree. There will be a few cases where there is no obvious fix though.

Another reason is that a lot of images have the tracking template but with invalid ids (i.e. ids not in the lists, often added by contributors who just enter something to be allowed to upload).

  • For these we generate lists which volunteers can use to clear the backlog

Worth keeping in mind is that it is possible to start a one-off categorisation run for a country to re-evaluate if they can be taken off the blacklist.

Started drafting some entries. Have not integrated @Effeietsanders' suggestions yet.

Straight away a couple of missing tools stand out.

  • images with bad id's - a report of all images with unrecognized id's. I've done something similar for WLE in Sweden, unclear if it scales. We could limit it to images in the root category.
  • frequent image id's without commonscat - a report of any image id's used by 5+ images (in the root category) where that id does not have an associated commonscat.
  • images in the root category without a tracking template. This might be doable with petscan. If so we should make such a query easy to set up.
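
For the first and third reports, the core step is pulling the id out of an image's wikitext. The following is a rough sketch only: it assumes the id is the first unnamed parameter of the tracker template, and "Example monument" is a placeholder template name, not a real one. It uses a plain regular expression, so nested templates and named parameters (e.g. `id=123`) are not handled.

```python
import re
from typing import Optional


def extract_monument_id(wikitext: str, template_name: str) -> Optional[str]:
    """Return the first positional parameter of the tracker template.

    None means the template is absent; "" means it is present with an empty id.
    """
    # Template names ignore the case of the first letter and treat
    # spaces and underscores as interchangeable.
    parts = re.split(r"[ _]+", template_name.strip())
    first = parts[0]
    parts_re = [f"(?i:{re.escape(first[0])}){re.escape(first[1:])}"]
    parts_re += [re.escape(p) for p in parts[1:]]
    name_re = "[ _]+".join(parts_re)
    # After the name we require either "|" (parameters) or the closing "}}",
    # so that longer template names sharing the same prefix do not match.
    pattern = re.compile(
        r"\{\{\s*" + name_re + r"\s*(?:\|([^|}]*)[^}]*)?\}\}"
    )
    match = pattern.search(wikitext)
    if match is None:
        return None
    return (match.group(1) or "").strip()


if __name__ == "__main__":
    samples = [
        "A photo. {{Example monument|12345}}",
        "{{example_monument| 12 |lang=en}}",
        "{{Example monument}}",
        "No template here.",
    ]
    for text in samples:
        print(repr(extract_monument_id(text, "Example monument")))
```

Images returning `None` would go into the "without a tracking template" report, while empty ids or ids not found in the lists would go into the "bad id's" report.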

Thanks @Lokal_Profil .

As for bad id's, isn't this already covered by https://commons.wikimedia.org/wiki/Category:Cultural_heritage_monuments_with_wrong_ID ? Although that seems to be using template logic rather than actual checks. Unrecognized Id's would be step one. Perhaps an additional run for 'suspicious id's' may be useful, too - when the coordinates are off too much, etc?
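
To make the "coordinates are off too much" idea concrete, here is a small sketch using the haversine great-circle distance. It assumes both the image and the monument in the list carry coordinates; the 1 km default threshold and the function names are made up for illustration and would need tuning per country.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_suspicious(
    image_coord: tuple[float, float],
    monument_coord: tuple[float, float],
    max_km: float = 1.0,  # hypothetical threshold
) -> bool:
    """Flag an id when the image was taken far from the listed monument."""
    return haversine_km(*image_coord, *monument_coord) > max_km


if __name__ == "__main__":
    # Image near Stockholm tagged with a monument listed near Gothenburg.
    print(is_suspicious((59.33, 18.06), (57.70, 11.97)))
```

A flagged image is only a candidate for review: the id may be right and the image coordinates wrong.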

Yes that category should definitely be the first stop. As you say it relies on template logic and I think most templates only implement this as empty id = wrong id.

Was this fixed in Sept, or should this still be added to documentation? (not sure any longer)

This still needs to be added to the documentation. The drafting text tells you why a country might be blacklisted but the draft above tries to explain what you can do about it.

This open task is tagged with Wiki-Loves-Monuments 2018 which was two years ago. If this task was/is resolved, then please update the task status. If this task was not resolved but is still valid, then please update the project tags to include at least one active project tag, so this task could be found when looking at that other project. (Without reaction, this task might get declined at some point.) Thanks a lot!

Removing Wiki-Loves-Monuments 2018 tag as that was two years ago; adding general Wiki-Loves-Monuments tag.

Ciell renamed this task from Describe how to get off the categorisation blacklist to Documentation: describe how to get off the categorisation blacklist. Feb 18 2022, 7:44 PM
Ciell moved this task from Incoming to Backlog on the Wiki-Loves-Monuments board.
Lokal_Profil renamed this task from Documentation: describe how to get off the categorisation blacklist to Documentation: describe how to get off the categorisation skip-list. Feb 28 2022, 2:44 PM
Lokal_Profil updated the task description.
Ciell renamed this task from Documentation: describe how to get off the categorisation skip-list to [2018] Documentation: describe how to get off the categorisation skip-list. Aug 26 2025, 11:12 AM