
[SPIKE] Consider other Wikidata properties to gather additional image QIDs
Closed, Resolved, Public


When developing T299781: [EPIC] Image suggestions backend, we collected image Wikidata QIDs from two properties, i.e., P18 and P373. The dataset lives in the image_suggestions_wikidata_data Hive table.

We may gather additional image QIDs via relevant Wikidata properties that have an image range.
This spike is to understand the pros and cons of properties that are more generic than P18.

NOTE: We already know that P373 may lead to some noise. For instance, Colporteur has tramp as an image QID due to Bosch's painting The Wayfarer being in Category:Tramps.


We sampled 200 random topics and queried Wikidata for all properties that expect a Commons media file (this includes files other than images). Result:

| topics | section score | topics with values | total media property values | gain VS P18 |
| --- | --- | --- | --- | --- |
| 200 | > 10 | 96 | 146 | +58 |
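
For illustration, here is a minimal sketch of how such a check could be reproduced. It is not the query used for this spike. It assumes that "properties that expect a Commons media file" maps to `wikibase:propertyType wikibase:CommonsMedia` in the Wikidata RDF model, and that "gain VS P18" means the number of values coming from properties other than P18. The function names, the second interpretation, and the sample rows are all hypothetical.

```python
def build_media_property_query(qids):
    """Build a SPARQL query listing every Commons media statement of the given items.

    Assumption: a property "expects a Commons media file" when its
    wikibase:propertyType is wikibase:CommonsMedia.
    """
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?property ?file WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?property wikibase:propertyType wikibase:CommonsMedia ;\n"
        "            wikibase:directClaim ?claim .\n"
        "  ?item ?claim ?file .\n"
        "}"
    )


def summarize(rows, baseline="P18"):
    """Summarize (qid, property_id, file) rows returned by the query.

    Assumption: the gain is counted as values whose property is not the baseline.
    """
    topics = set()
    total = 0
    extra = 0
    for qid, property_id, _file in rows:
        topics.add(qid)
        total += 1
        if property_id != baseline:
            extra += 1
    return {
        "topics_with_values": len(topics),
        "total_values": total,
        "gain_vs_baseline": extra,
    }


# Made-up rows, only to show the shape of the input.
sample_rows = [
    ("Q42", "P18", "portrait.jpg"),
    ("Q42", "P109", "signature.svg"),
    ("Q64", "P94", "coat_of_arms.svg"),
]
query = build_media_property_query(["Q42", "Q64"])
summary = summarize(sample_rows)
```

The query string can be pasted into the Wikidata Query Service; the summary step then runs on whatever rows come back, so the counting rule is easy to swap if the gain was measured differently.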




We already leverage P18, and the other properties don't seem worth the effort.

Event Timeline

Leaving in the backlog until we see the results of T316151; if not needed, we will close.

CBogen added a subscriber: AUgolnikova-WMF.

Closing; based on the update above it seems we won't move forward with this. @mfossati or @AUgolnikova-WMF, feel free to reopen if you disagree or if there's more investigation to do here.

Pasting the full SPARQL query URL here for reference; it can't be shortened as it's too long 😆 :