(better) tool or workflow to create missing painters on Wikidata
Open, Needs Triage, Public

Description

As a member of the Sum of all Paintings project, I want to easily create items for missing painters. I currently have a tool for that, but it's clunky and will break soon because of some database schema changes. This task (story) describes what a new, better tool should do.

Currently I import paintings into Wikidata per collection. If a collection is quite different from what we already have, it will contain paintings by painters for whom we don't have an item yet. If the painters do exist, a bot will add the link. The (numerous) cases where the bot wasn't able to link a painting with its creator are grouped by collection at https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings/Top_collections_missing_creator . That page also contains a link to the tool I'm currently using. For example, for Nelson-Atkins the output is https://tools.wmflabs.org/multichill/painters/index.php?collection=Q1976985 .

The current tool queries the Wikidata database for paintings in a certain collection (in this case Nelson-Atkins/Q1976985) that don't have a creator (P170) statement yet, but do have an English description of the form "painting by <some person>". The results are sorted by number of hits per name. For each hit, the mix'n'match database is queried for unlinked catalog items that have the same label. The output contains an easy search link to check whether the person already exists and a link to pre-filled (old) QuickStatements to create the item. A user has to manually check each catalog suggestion and remove the lines for the incorrect suggestions. I've created hundreds, maybe even thousands, of painters this way.
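The grouping step described above could be sketched roughly as follows. This is only an illustration, not the code of the current tool: the function name and the sample descriptions are made up, and the only thing taken from this task is the "painting by <some person>" description pattern.

```python
import re
from collections import Counter

# Description pattern taken from the task text: "painting by <some person>".
DESCRIPTION_RE = re.compile(r"^painting by (?P<name>.+)$")


def count_painter_names(descriptions):
    """Return (name, hits) pairs, most frequent name first.

    Descriptions that do not match the pattern are ignored.
    Ties are broken alphabetically so the order is stable.
    """
    counts = Counter()
    for description in descriptions:
        match = DESCRIPTION_RE.match(description.strip())
        if match:
            counts[match.group("name").strip()] += 1
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical sample data, not real Wikidata descriptions.
    sample = [
        "painting by Example Painter",
        "painting by Another Painter",
        "painting by Example Painter",
        "sculpture by Someone Else",
    ]
    for name, hits in count_painter_names(sample):
        print(hits, name)
```

Sorting by hits puts the painters with the most unlinked paintings at the top, which is where creating an item pays off most.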

It works, but it will stop working soon. The user interface looks bad and it's not very user friendly. A new version should be created. My focus area is missing painters, but if done well it could probably be used to create missing items based on mix'n'match in other domains.

The first step is to be able to search: either just give me everything (current output of https://tools.wmflabs.org/multichill/painters/index.php) or search by collection (current output of https://tools.wmflabs.org/multichill/painters/index.php?collection=Q1976985). This should switch to being either SPARQL-based or search-engine-based so that it isn't affected by the SQL schema change.
For each entry found, it should query the mix'n'match database for relevant catalog entries. It would probably make sense to build a small API service for this that we can query from the client. If we're combining data like this client side, it would make sense to also search Wikidata for possible candidates, or for confirmation that no person with that name exists on Wikidata.
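A minimal sketch of what the SPARQL-based search and the Wikidata candidate lookup could look like. The function names are hypothetical. The query assumes paintings are marked as instance of (P31) painting (Q3305213) and linked to a collection through P195, and that it runs on the Wikidata Query Service, where the wd:, wdt: and schema: prefixes are predefined. The candidate lookup only builds a URL for the standard wbsearchentities API module. Nothing here performs a network request, and the mix'n'match lookup is left out because that service is only proposed in this task.

```python
import re
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
QID_RE = re.compile(r"Q[1-9]\d*")


def build_missing_creator_query(collection_qid=None):
    """Build a SPARQL query for paintings without a creator (P170).

    With collection_qid=None the query covers all collections, which may
    well be too slow for the public query service.
    """
    lines = [
        "SELECT ?item ?description WHERE {",
        "  ?item wdt:P31 wd:Q3305213 .",  # assumption: instance of painting
    ]
    if collection_qid is not None:
        if not QID_RE.fullmatch(collection_qid):
            raise ValueError(f"not a valid item ID: {collection_qid!r}")
        lines.append(f"  ?item wdt:P195 wd:{collection_qid} .")
    lines += [
        "  ?item schema:description ?description .",
        '  FILTER(LANG(?description) = "en")',
        '  FILTER(STRSTARTS(STR(?description), "painting by "))',
        "  FILTER NOT EXISTS { ?item wdt:P170 [] }",
        "}",
    ]
    return "\n".join(lines)


def build_candidate_search_url(name, language="en", limit=10):
    """Build a wbsearchentities URL to look for existing items with this name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_missing_creator_query("Q1976985"))
    print(build_candidate_search_url("Example Painter"))
```

Because both helpers only produce strings, they could be used from a small backend service or ported to the client without change in logic.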

If the user found an already existing item about the painter, the user should be able to easily add the relevant catalog statements. Something like selecting the existing item, ticking some boxes and submitting.
If the user didn't find an existing item, the user should have the option to create one. Something like a pre-filled version of https://tools.wmflabs.org/wikidata-todo/cradle/#/ . The user should have an option to tick/untick specific catalog statements so only the correct ones are added.
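One possible way to turn the ticked catalog suggestions into something that creates the item is to generate QuickStatements commands in the tab-separated V1 syntax, similar to what the current tool pre-fills. The sketch below is an assumption about how that could work, not a design decision: the function name, the tuple layout and the sample identifiers are hypothetical, and the fixed statements (instance of human, occupation painter) only make sense for the painter use case.

```python
def _quoted(text):
    """Wrap a string value in double quotes for QuickStatements V1."""
    if '"' in text or "\t" in text or "\n" in text:
        # Assumption: such values are rejected rather than escaped.
        raise ValueError(f"unsupported character in value: {text!r}")
    return f'"{text}"'


def build_quickstatements(label, description, catalog_statements):
    """Build QuickStatements V1 commands that create a painter item.

    catalog_statements is an iterable of (property_id, external_id, ticked)
    tuples; only the ticked suggestions end up in the output.
    """
    lines = [
        "CREATE",
        f"LAST\tLen\t{_quoted(label)}",
        f"LAST\tDen\t{_quoted(description)}",
        "LAST\tP31\tQ5",  # instance of: human
        "LAST\tP106\tQ1028181",  # occupation: painter
    ]
    for property_id, external_id, ticked in catalog_statements:
        if ticked:
            lines.append(f"LAST\t{property_id}\t{_quoted(external_id)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical suggestions; the identifiers are placeholders.
    suggestions = [
        ("P245", "500000001", True),  # ticked by the user
        ("P650", "12345", False),  # unticked, so it is skipped
    ]
    print(build_quickstatements("Example Painter", "painter", suggestions))
```

The same filtered list of ticked statements could also be used to add statements to an existing item by replacing LAST with the selected item ID and dropping the CREATE, label and description lines.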

Actually linking up the paintings with the (newly created) painter item is out of scope for this task, but could be added too.

Event Timeline

Magnus told me about this tool, which gathers painting items with the same creator in the "painted by" description. You need to fill in the Q number of the artist, which you can create or look up. https://tools.wmflabs.org/mix-n-match/painters.php