After https://gerrit.wikimedia.org/r/#/c/327862/ is merged, we need to set up regular dumps of categories into RDF. We should have a list of the wikis that are dumped, probably as a list in the MediaWiki config.
Status | Assigned | Task
---|---|---
Stalled | None | T57644 Eliminate overcategorization when moving images from the root category
Open | None | T113847 ErfgoedBot should not add a category if it is a subcategory of a category already there
Open | None | T110833 Provide service to filter over categorization from a list of Commons categories
Resolved | Smalyshev | T173980 Include hidden status in category RDF
Resolved | Smalyshev | T174071 [Q1 2017-18 Objective] Expand category search via WDQS
Resolved | Smalyshev | T181549 [epic] Subcategory searching
Resolved | Smalyshev | T165982 Investigate using blazegraph for deep category searching / returning of results
Resolved | Gehel | T157676 Provide access to category information from WDQS SPARQL
Resolved | ArielGlenn | T173892 Setup dump for categories RDF representation
Event Timeline
Proposed lists of wikis:
Initial:
- testwiki
- test2wiki
Then:
- enwiki
- dewiki
- commonswiki
After those work, we can ask people whether they want it enabled on more wikis, or just enable it on all wikis.
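The staged rollout above can be sketched as a small script that renders the wiki list and turns each entry into a dump job. This is only an illustration: it assumes a list file with one database name per line and `#` comments, and the `dump-categories-rdf` command, its flags, and the output file name are hypothetical placeholders, not the script from the puppet change.

```python
# Sketch only: the command name, flags and output naming are hypothetical.
INITIAL_WIKIS = ["testwiki", "test2wiki"]
NEXT_WIKIS = ["enwiki", "dewiki", "commonswiki"]


def render_dblist(wikis):
    """Render a wiki list as text, one database name per line."""
    return "".join(f"{wiki}\n" for wiki in wikis)


def parse_dblist(text):
    """Parse list text, skipping blank lines and '#' comments (assumed format)."""
    wikis = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            wikis.append(name)
    return wikis


def build_dump_command(wiki, out_dir):
    """Build the argument list for one wiki's dump job (placeholder command)."""
    output = f"{out_dir}/{wiki}-categories.ttl.gz"
    return ["dump-categories-rdf", "--wiki", wiki, "--output", output]


if __name__ == "__main__":
    # Stage 1 only; NEXT_WIKIS would be appended once the first dumps succeed.
    for wiki in parse_dblist(render_dblist(INITIAL_WIKIS)):
        print(" ".join(build_dump_command(wiki, "/tmp/categories-rdf")))
```

Keeping the list in one file means enabling more wikis is a one-line config change rather than a change to the dump job itself.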
Change 373167 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/mediawiki-config@master] Add list for wikis that would have categories dumped into RDF
Change 373354 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Add RDF dumps for categories
Change 373167 merged by jenkins-bot:
[operations/mediawiki-config@master] Add list for wikis that would have categories dumped into RDF
Mentioned in SAL (#wikimedia-operations) [2017-08-28T23:07:04Z] <ebernhardson@tin> Synchronized dblists/categories-rdf.dblist: T173892: Add list for wikis that would have categories dumped into RDF (duration: 00m 43s)
Mentioned in SAL (#wikimedia-operations) [2017-08-28T23:08:34Z] <ebernhardson@tin> Synchronized docroot/noc/conf/categories-rdf.dblist: T173892: Add list for wikis that would have categories dumped into RDF (duration: 00m 43s)
@ArielGlenn could you take a look and see if this (https://gerrit.wikimedia.org/r/373354) makes sense?
Just a couple nits, see gerrit. Once that's sorted, would you be ok with me merging this whenever? Also, any estimate on how long the job would take to run across all wikis? Thanks!
> Once that's sorted, would you be ok with me merging this whenever?
Yes, please!
> Also, any estimate on how long the job would take to run across all wikis?
Hmm, I can't find the timings now. I remember enwiki being done in a matter of hours, but I can't locate where I recorded it. I'll retest and add it. Most wikis have fewer categories than enwiki or commons, so they will probably be much faster. I'll add the figures in a bit.
Change 373354 merged by ArielGlenn:
[operations/puppet@production] Add RDF dumps for categories
I've merged and deployed this, after making a few (mostly cosmetic) changes. I'm going to update the script now so that it uses the clean new way of getting config settings, and I'll leave this ticket open until cron runs once successfully.
@ArielGlenn thank you, will wait for first dump to happen. If that works fine, I'll enable it for more wikis. The timing for enwiki is:
real 40m49.040s
user 29m37.468s
sys 0m9.160s
This seems to be pretty reasonable. Dump size for enwiki is ~50M (gzipped).
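As a rough reading of the enwiki timing above, the following sketch converts the `time` figures to seconds and computes how much of the wall-clock time was spent on CPU. The helper names are made up for this example, and the interpretation of the remaining time (for instance, waiting on the database) is an assumption, not something measured here.

```python
import re


def parse_duration(text):
    """Convert a duration such as '40m49.040s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def parse_time_output(text):
    """Parse 'real ... user ... sys ...' pairs into a dict of seconds."""
    tokens = text.split()
    return {tokens[i]: parse_duration(tokens[i + 1]) for i in range(0, len(tokens), 2)}


def cpu_share(timings):
    """Fraction of wall-clock time accounted for by user + sys CPU time."""
    return (timings["user"] + timings["sys"]) / timings["real"]


if __name__ == "__main__":
    timings = parse_time_output("real 40m49.040s user 29m37.468s sys 0m9.160s")
    print(f"wall clock: {timings['real']:.0f} s")
    print(f"CPU share:  {cpu_share(timings):.0%}")
```

For the enwiki run this gives roughly 2449 seconds of wall-clock time, of which about 73% is CPU time.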
Change 377369 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Add categories RDF dump into the index page
Change 377369 merged by ArielGlenn:
[operations/puppet@production] Add categories RDF dump into the index page