After https://gerrit.wikimedia.org/r/#/c/327862/ is merged, we need to set up regular dumps of categories into RDF. We should have a list of the wikis that are dumped, probably as a list in the MediaWiki config.
|Status||Assigned||Task|
|Stalled||None||T57644 Eliminate overcategorization when moving images from the root category|
|Open||None||T113847 ErfgoedBot should not add a category if it is a subcategory of a category already there|
|Open||None||T110833 Provide service to filter over categorization from a list of Commons categories|
|Resolved||Smalyshev||T173980 Include hidden status in category RDF|
|Resolved||Smalyshev||T174071 [Q1 2017-18 Objective] Expand category search via WDQS|
|Resolved||Smalyshev||T181549 [epic] Subcategory searching|
|Resolved||Smalyshev||T165982 Investigate using blazegraph for deep category searching / returning of results|
|Resolved||Gehel||T157676 Provide access to category information from WDQS SPARQL|
|Resolved||ArielGlenn||T173892 Setup dump for categories RDF representation|
Once that's sorted, would you be ok with me merging this whenever?
Also, any estimate on how long the job would take to run across all wikis?
Hmm, I can't find the timings now. I remember enwiki finishing in a matter of hours, but I can't locate where I recorded it. Most wikis have fewer categories than enwiki or commons, so they will probably be much faster. I'll retest and add the figures in a bit.
I've merged and deployed this, after making a few (mostly cosmetic) changes. I'm going to update the script now so that it uses the clean new way of getting config settings, and I'll leave this ticket open until cron runs once successfully.