
Wikidata json dumps filling /var/log
Closed, Resolved (Public)

Description

I had to shoot the jobs; they were filling /.

-rw-rw-r-- 1 datasets datasets 10567705521 Nov 20 09:02 dumpwikidatajson-wikidata-20171120-all-0.log
-rw-rw-r-- 1 datasets datasets 10654791963 Nov 20 09:02 dumpwikidatajson-wikidata-20171120-all-1.log
-rw-rw-r-- 1 datasets datasets 10730245707 Nov 20 09:02 dumpwikidatajson-wikidata-20171120-all-2.log
-rw-rw-r-- 1 datasets datasets 10736461213 Nov 20 09:02 dumpwikidatajson-wikidata-20171120-all-3.log
-rw-rw-r-- 1 datasets datasets 10673102328 Nov 20 09:02 dumpwikidatajson-wikidata-20171120-all-4.log
-rw-rw-r-- 1 datasets datasets 10660331545 Nov 20 09:02 dumpwikidatajson-wikidata-20171120-all-5.log

Event Timeline

I had to remove them to reclaim space, but I preserved one on snapshot1007 at /mnt/data/logs/dumpwikidatajson-wikidata-20171120-all-1.log; it's full of JSON output. Adding @hoo because he's working on these.

The command that wrote the above log was this:

php5 /srv/mediawiki/multiversion/MWScript.php extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpJson.php --wiki wikidatawiki --shard 1 --sharding-factor 6 --batch-size 1000 --snippet 2 --no-cache

Contents of the output files in /mnt/data/xmldatadumps/temp:

-rw-rw-r-- 1 datasets datasets    20 Nov 20 03:15 wikidataJson.3.gz
-rw-rw-r-- 1 datasets datasets    20 Nov 20 03:15 wikidataJson.5.gz
-rw-rw-r-- 1 datasets datasets    20 Nov 20 03:15 wikidataJson.2.gz
-rw-rw-r-- 1 datasets datasets    20 Nov 20 03:15 wikidataJson.4.gz
-rw-rw-r-- 1 datasets datasets    20 Nov 20 03:15 wikidataJson.1.gz
-rw-rw-r-- 1 datasets datasets    20 Nov 20 03:15 wikidataJson.0.gz
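
For reference, 20 bytes is the size of a gzip stream that received no input at all (10-byte header, 2-byte empty deflate block, 8-byte trailer), which fits with the JSON having gone somewhere other than the compressor. A quick way to confirm that size, assuming a standard gzip reading from stdin (the function name `empty_gzip_size` is only for illustration):

```shell
# Compress empty input and count the bytes gzip produces.
empty_gzip_size() {
    printf '' | gzip -c | wc -c | tr -d ' '
}

empty_gzip_size   # prints 20
```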

Change 392398 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] fix missing redirect for wikidata json dumps

https://gerrit.wikimedia.org/r/392398

Change 392398 merged by ArielGlenn:
[operations/puppet@production] fix missing redirect for wikidata json dumps

https://gerrit.wikimedia.org/r/392398
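
For anyone reading this later: JSON in the .log files next to empty .gz outputs is what a misplaced stdout redirect in front of a pipe would produce. The sketch below is not the actual puppet-managed script; `emit_entities`, `dump_broken`, `dump_fixed`, and the temp file names are hypothetical stand-ins that only reproduce the failure mode under that assumption.

```shell
#!/bin/sh
# Hypothetical stand-in for the dump script: JSON on stdout, progress on stderr.
emit_entities() {
    echo '{"id":"Q1"}'
    echo 'Processed 1 entities.' >&2
}

# Broken variant (assumed): stdout is redirected to the log before the pipe,
# so gzip sees no input and writes an empty stream.
dump_broken() {
    emit_entities >>"$1" 2>&1 | gzip -c >"$2"
}

# Intended variant (assumed): only stderr goes to the log; stdout reaches gzip.
dump_fixed() {
    emit_entities 2>>"$1" | gzip -c >"$2"
}

tmp=$(mktemp -d)
dump_broken "$tmp/broken.log" "$tmp/broken.gz"
dump_fixed "$tmp/fixed.log" "$tmp/fixed.gz"

# Compare output sizes, then show that the fixed variant holds the JSON.
wc -c "$tmp"/broken.gz "$tmp"/fixed.gz
gzip -dc "$tmp/fixed.gz"
```

In the broken variant the log receives both the JSON and the progress messages while the .gz stays at 20 bytes; in the fixed variant the log only receives the progress line.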

@hoo if you don't see anything else amiss, feel free to start the job manually at any time. I'll clean up that temp log file I left lying around then, too.

Mentioned in SAL (#wikimedia-operations) [2017-11-20T10:23:19Z] <hoo> Manually re-started the Wikidata entity JSON dump on snapshot1007 (T180934)

hoo claimed this task.
hoo removed a project: Patch-For-Review.

Both scripts look fine again and the dumpers are running… sorry for the mess :/

Mentioned in SAL (#wikimedia-operations) [2017-11-22T01:09:00Z] <hoo> Cleaned out remaining T180934 related log blow up on snapshot1007 (dumpwikidatajson-wikidata-20171120-all-0.log)