
Can't munge most recent dump
Closed, Resolved, Public

Description

manybubbles@manybubbles-laptop:~/Downloads/tmp/service-0.0.3-SNAPSHOT$ ./munge.sh -f ../../wikidata-20150622-all-BETA.ttl.gz -d data -l en,ru,de,es -s
11:35:11.362 [main] INFO org.wikidata.query.rdf.tool.Munge - Switching to data/wikidump-000000001.ttl.gz

Stakeholders: Everyone who wants to run this tool
Benefits: You can't start a new instance of the service because you can't load the data
Estimate: Just guessing, but probably less than half a day.

Event Timeline

Manybubbles raised the priority of this task to Unbreak Now!.
Manybubbles updated the task description.
Manybubbles subscribed.

Change 221724 had a related patch set uploaded (by Manybubbles):
Recognize https for entitydata

https://gerrit.wikimedia.org/r/221724

Change 221724 merged by jenkins-bot:
Recognize https for entitydata

https://gerrit.wikimedia.org/r/221724

nopping subscribed.

Same issue again on the latest dump wikidata-20160509-all-BETA.ttl.bz2

@nopping could you describe in a bit more detail what exactly the issue is: what exactly you are running, what the error message is, etc.?

The output file that is supposed to be generated (data/wikidump-000000001.ttl.gz) remains empty for several hours (I continuously probe the file size of the data/ directory but I see no size increase at all). I don't know if the tool is stuck or if I should wait additional time for the script to spit out something.
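The size check described above could be automated roughly as follows. This is only a minimal sketch: the helper name `wait_for_growth`, the polling interval, and the number of checks are illustrative assumptions and are not part of the munge tool. Note also that compressed output may be buffered before it reaches disk, so a file that does not grow is a hint that the tool is stuck, not proof.

```python
import os
import time
from typing import Callable, Optional


def wait_for_growth(
    path: str,
    interval_s: float = 60.0,
    max_checks: int = 10,
    get_size: Callable[[str], int] = os.path.getsize,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll the size of `path` and report whether it grows.

    Returns the number of bytes gained since the first sample, or None if
    the size never increased within `max_checks` polls.
    `get_size` and `sleep` are injectable so the function can be tested
    without touching the file system or actually waiting.
    """
    start = get_size(path)
    for _ in range(max_checks):
        sleep(interval_s)
        current = get_size(path)
        if current > start:
            return current - start
    return None


# Hypothetical usage against the output file named in this task:
# grew = wait_for_growth("data/wikidump-000000001.ttl.gz", interval_s=300)
# print("no growth observed" if grew is None else f"grew by {grew} bytes")
```

If no growth shows up after a long wait, a thread dump of the Java process (for example with `jstack`) would probably say more about where the tool is blocked than the file size alone.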

This looks like a different issue. Could you submit a new bug and describe the full command line, Java version, and all other relevant info?