[L] Create script to add existing images on Commons from specific categories to the popular CAT queue
Open, Needs Triage, Public

Description

As a creator of campaigns, I want to be able to direct users to particular categories of images to tag with depicts statements using the ISA tool/CAT, so that I can increase the amount of structured data on images in a targeted area.

This task is to create a script that can be updated to point to any particular category on Commons and run through older images in that category to add them to the "popular" CAT queue. The first use case is for Category:Files_from_content_partnerships from the creators of the ISA tool:

As we are planning to work on content that has been uploaded by GLAM institutions: Would it be possible to include all the files in the following category (including sub-categories) to the maintenance script that triggers the generation of “depicted” suggestions? Category:Files_from_content_partnerships

Acceptance Criteria:

  • A script is created that can be run on any category when needed to add all images from that category to the CAT "popular" queue
  • The script is run on Category:Files_from_content_partnerships
  • The script does not prioritize images in that category (e.g., uncategorized images will still maintain priority in the CAT queue as per T262857)

Event Timeline

Our main need is to be able to run Google Vision on the images in specific categories, so that they can be used in an enhanced version of the ISA tool that includes tag suggestions from Google Vision (which is currently not the case for older uploads).

If this implies adding these images to the "popular" CAT queue, that's OK, but it's not an important requirement in the context of our use case.

CBogen renamed this task from "Create script to add existing images on Commons from specific categories to the popular CAT queue" to "[L] Create script to add existing images on Commons from specific categories to the popular CAT queue". Mar 24 2021, 4:41 PM

I have concerns about this approach. Structured Data on Commons is meant to be, well, "structured". It is not for "tags".

For example, a photograph of the White House in Washington DC might be tagged "white" and "building", but in terms of structured data it depicts Q35525, which is the item about that single specific building. The item tells us that the subject is a building, and that it is white in colour.

We have already done some tests with Google Vision on the ISA tool. The goal is indeed to add "depicts" statements. So far, color tags are not an issue; they don't seem to come up in the suggestions. However, suggestions such as "photograph" can be problematic: they apply when a photo depicts photographs, but in many cases, where the digital image merely reproduces a photograph, the statement shouldn't be applied. The same goes for scans of postcards: at what level do we apply the "depicts" statement? At the level of the scan, which depicts a photograph with an image and often a frame and some text? Or just at the level of the image on the postcard? These are issues that need to be addressed via community deliberations. We expect that the development and deployment of tools to assist with adding "depicts" statements will foster such deliberations, whose results can then be reflected in the tools themselves in order to nudge users in the direction of established shared practices.

These issues have been addressed via community deliberations; depicts statements should be at the most specific level possible.

Current policy is at:

https://commons.wikimedia.org/wiki/Commons:Depicts

and includes (emboldening in original): "...generic "tags" should not currently be added if more specific depicts statements already exist."

There is prior discussion, for example, here:

https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2020/02#Misplaced_invitation_to_%22tag%22_images

After some poking around I see that step 1 can be accomplished using the scripts

  • maintenance/createFileListfromCategoriesAndTemplates.php
  • ... and then maintenance/fetchSuggestions.php

Hmm, or maybe not. maintenance/createFileListfromCategoriesAndTemplates.php doesn't handle sub-categories.

Change 701432 had a related patch set uploaded (by Cparle; author: Cparle):

[mediawiki/extensions/MachineVision@master] Add options to allow job createFileList job to use subcategories

https://gerrit.wikimedia.org/r/701432

Change 701432 merged by jenkins-bot:

[mediawiki/extensions/MachineVision@master] Add options to allow job createFileList job to use subcategories

https://gerrit.wikimedia.org/r/701432

Patch should be on production tomorrow, so can run the script then

Update: the script ran on production for ~24hrs, then failed. I suspect there might be an infinite loop in the code someplace; will have to dig into it again.
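
For illustration only: category graphs on a wiki can contain cycles (a category that is reachable from one of its own subcategories), which is one way a subcategory walk can fail to terminate. Whether that is what happened here is not established. The sketch below is in Python rather than the extension's PHP, and `collect_files`, `subcats`, `files` and `max_depth` are hypothetical names; the in-memory dicts stand in for whatever lookups the real script performs.

```python
from collections import deque


def collect_files(root, subcats, files, max_depth=None):
    """Breadth-first walk over a category graph.

    root      -- name of the starting category
    subcats   -- dict: category -> list of direct subcategories
    files     -- dict: category -> list of files directly in it
    max_depth -- optional limit on how many levels below root to descend

    Returns the sorted, de-duplicated list of files found.
    """
    seen = {root}                 # categories already queued; guards against cycles
    queue = deque([(root, 0)])
    found = set()                 # a file may sit in several categories

    while queue:
        cat, depth = queue.popleft()
        found.update(files.get(cat, ()))

        if max_depth is not None and depth >= max_depth:
            continue              # do not descend any further from here

        for child in subcats.get(cat, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))

    return sorted(found)


# Toy graph with a cycle: A -> B -> A
subcats = {"A": ["B", "C"], "B": ["A"], "C": []}
files = {"A": ["a.jpg"], "B": ["b.jpg"], "C": ["c.jpg", "a.jpg"]}
print(collect_files("A", subcats, files))
```

Tracking visited categories bounds the walk by the number of distinct categories, and an optional depth limit gives a second safety net for very deep trees.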