
cron job to create RDF dumps
Closed, Resolved (Public)

Description

Set up and enable a cron job that creates RDF dumps.

Which RDF formats?

Event Timeline

JanZerebecki set the priority of this task to Needs Triage.
JanZerebecki updated the task description.
JanZerebecki added a subscriber: JanZerebecki.

TTL should be enough for now. At the moment the command line would be something like:

mwscript extensions/WikidataBuildResources/extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki wikidatawiki --format ttl --output {filename}

but that could change (see T93488 and https://gerrit.wikimedia.org/r/#/c/198803/), so the dump might be split into several parts in the future. Right now there is just one part, and TTL should be enough.

In production, it would just be:

mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki wikidatawiki --format ttl --output {filename}

that is, with "Wikidata" in the path instead of "WikidataBuildResources".

Yes, whatever the invocation is for dumpJson, just replace dumpJson with dumpRdf and add the options :)

Lydia_Pintscher moved this task from incoming to ready to go on the Wikidata board.

A ready-to-run manual command to create a dump:

filename=`date +'%Y%m%d'`; i=0; shards=4; while [ $i -lt $shards ]; do php /srv/mediawiki/multiversion/MWScript.php extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki wikidatawiki --format ttl --shard $i --sharding-factor $shards 2>> /var/log/wikidatadump/dumpwikidatardf-$filename-$i.log | gzip > /mnt/data/xmldatadumps/temp/wikidataRdf-$filename.$i.gz & let i++; done
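
The loop above starts one dumpRdf.php process per shard in the background, but it never waits for them or checks whether any of them failed. A minimal Bash sketch of the same idea with that bookkeeping added might look like the following. The function name `run_sharded_dump` and its argument order are made up for illustration; the file-name patterns and the `--shard` / `--sharding-factor` options are taken from the command above. This is only a sketch under those assumptions, not the script from change 201003.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name and argument order are assumptions):
#   run_sharded_dump SHARDS OUT_DIR LOG_DIR DUMP_COMMAND [ARGS...]
# Runs DUMP_COMMAND once per shard in parallel, gzips each shard's output,
# waits for every shard, and returns non-zero if any shard failed.
run_sharded_dump() {
    local shards=$1 out_dir=$2 log_dir=$3
    shift 3                      # the remaining arguments are the dump command
    local stamp pid i failed=0
    local pids=()
    stamp=$(date +'%Y%m%d')

    for ((i = 0; i < shards; i++)); do
        (
            # Without pipefail, gzip's exit status would mask a failing dump.
            set -o pipefail
            "$@" --shard "$i" --sharding-factor "$shards" \
                2>> "$log_dir/dumpwikidatardf-$stamp-$i.log" \
                | gzip > "$out_dir/wikidataRdf-$stamp.$i.gz"
        ) &
        pids+=("$!")
    done

    # Collect the exit status of every background shard.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

# Example call, reusing the paths and options quoted in the comment above:
# run_sharded_dump 4 /mnt/data/xmldatadumps/temp /var/log/wikidatadump \
#     php /srv/mediawiki/multiversion/MWScript.php \
#     extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpRdf.php \
#     --wiki wikidatawiki --format ttl
```

Passing the dump command as trailing arguments keeps the loop independent of the MediaWiki setup, so it can be tried with a stand-in command first.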

Change 201003 had a related patch set uploaded (by Smalyshev):
T93658: create script for TTL dumps

https://gerrit.wikimedia.org/r/201003

Change 201003 had a related patch set uploaded (by Hoo man):
Add a script to create Wikidata ttl dumps

https://gerrit.wikimedia.org/r/201003

Change 201003 merged by ArielGlenn:
Add a script to create Wikidata ttl dumps

https://gerrit.wikimedia.org/r/201003

The cron job is in place (but disabled for now). Once the code is deployed and I have tested it manually for a bit, we can enable automatic dump creation. If everything works out, this will be late next week (maybe Wednesday or Thursday).