cron job to create RDF dumps
Closed, Resolved (Public)

Description

Have a cron job to create RDF dumps enabled.

Which RDF formats?

Event Timeline

JanZerebecki raised the priority of this task from to Needs Triage.
JanZerebecki updated the task description.
JanZerebecki added a subscriber: JanZerebecki.
Restricted Application added a subscriber: Aklapper. Mar 23 2015, 8:34 PM
Smalyshev added a comment. Edited Mar 23 2015, 8:45 PM

TTL should be enough for now. Right now the command line would be something like:

mwscript extensions/WikidataBuildResources/extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki wikidatawiki --format ttl --output {filename}

but that could change (see T93488 and https://gerrit.wikimedia.org/r/#/c/198803/), so there might be several parts in the future. Right now there is just one, and TTL should be enough.

aude added a subscriber: aude. Mar 23 2015, 8:59 PM

In production, it would just be:

mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki wikidatawiki --format ttl --output {filename}

instead of using "WikidataBuildResources"

Yes, whatever it is for dumpJson, just replace it with dumpRdf and add options :)

JanZerebecki assigned this task to hoo. Mar 24 2015, 1:22 PM
Lydia_Pintscher triaged this task as High priority. Mar 25 2015, 9:44 AM
Lydia_Pintscher moved this task from incoming to ready to go on the Wikidata board.

We need a variant of modules/snapshot/files/dumpwikidatajson.sh for this, one that does the sharding and assembles the shards:
https://git.wikimedia.org/blob/operations%2Fpuppet.git/6abb1fd3a7b9e9baa0155732d0e96999ff925527/modules%2Fsnapshot%2Ffiles%2Fdumpwikidatajson.sh
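
To illustrate the sharding half of such a script, here is a minimal sketch of running all shards in parallel and failing if any of them fails. It is not taken from dumpwikidatajson.sh; the function name run_shards and the convention of passing the shard index and shard count as the last two arguments are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper (not from dumpwikidatajson.sh): run one background
# job per shard and report failure if any shard job fails.
# Usage: run_shards <shard-count> <command> [args...]
# The command is called with the shard index and shard count appended.
run_shards() {
    local shards="$1"
    shift
    local i=0 pids="" pid failed=0

    # Start every shard in the background and remember its PID.
    while [ "$i" -lt "$shards" ]; do
        "$@" "$i" "$shards" &
        pids="$pids $!"
        i=$((i + 1))
    done

    # Wait for each job individually so a single failure is not lost.
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

The command passed in would wrap the actual dump call for one shard. If that call pipes into gzip, the job's exit status is gzip's rather than the dump script's, so the wrapper would have to check for that itself.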

JanZerebecki added a comment. Edited Mar 30 2015, 6:36 PM

Ready-to-run manual command to create a dump:

filename=$(date +'%Y%m%d'); shards=4
i=0
while [ $i -lt $shards ]; do
  php /srv/mediawiki/multiversion/MWScript.php extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpRdf.php \
    --wiki wikidatawiki --format ttl --shard $i --sharding-factor $shards \
    2>> /var/log/wikidatadump/dumpwikidatardf-$filename-$i.log \
    | gzip > /mnt/data/xmldatadumps/temp/wikidataRdf-$filename.$i.gz &
  i=$((i + 1))
done
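
The command above leaves one gzip file per shard. The assembling step could be sketched roughly as follows. It relies on the fact that concatenated gzip files form a single valid gzip stream. The function name assemble_shards and its arguments are hypothetical. Whether plain concatenation of the TTL shards is sufficient (for example if each shard repeats its prefix header) is an assumption not confirmed in this task.

```shell
#!/bin/sh
# Hypothetical helper: concatenate gzip shards into a single dump file.
# Usage: assemble_shards <shard-prefix> <shard-count> <output-file>
# Expects shards named <shard-prefix>.0.gz ... <shard-prefix>.<count-1>.gz
assemble_shards() {
    local prefix="$1" shards="$2" output="$3"
    local i=0 shard

    # Refuse to assemble if any shard is missing or not a valid gzip file.
    while [ "$i" -lt "$shards" ]; do
        shard="$prefix.$i.gz"
        if ! gzip -t "$shard" 2>/dev/null; then
            echo "shard $shard is missing or corrupt" >&2
            return 1
        fi
        i=$((i + 1))
    done

    # Concatenated gzip members decompress as one stream, so the shards
    # can be joined without decompressing them first.
    i=0
    : > "$output.tmp"
    while [ "$i" -lt "$shards" ]; do
        cat "$prefix.$i.gz" >> "$output.tmp"
        i=$((i + 1))
    done

    # Move into place only once the file is complete.
    mv "$output.tmp" "$output"
}
```

With the file names used above, this would be called once all background jobs have finished, for example after a plain `wait`, with the prefix /mnt/data/xmldatadumps/temp/wikidataRdf-$filename and a shard count of 4.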
Joe added a subscriber: Joe. Mar 30 2015, 6:40 PM

Change 201003 had a related patch set uploaded (by Smalyshev):
T93658: create script for TTL dumps

https://gerrit.wikimedia.org/r/201003

Change 201003 had a related patch set uploaded (by Hoo man):
Add a script to create Wikidata ttl dumps

https://gerrit.wikimedia.org/r/201003

Change 201003 merged by ArielGlenn:
Add a script to create Wikidata ttl dumps

https://gerrit.wikimedia.org/r/201003

hoo closed this task as Resolved. Apr 2 2015, 11:52 AM

The cron job is in place (but disabled for now). Once the code is deployed and I have tested it manually for a bit, we can enable the automatic dump creation. If everything works out, this will be late next week (maybe Wednesday or Thursday).

hoo moved this task from Backlog to Done on the Wikidata-Sprint-2015-03-24 board.