Allow CHUNK value to be passed in as an option for munge.sh
Closed, Resolved · Public

Description

When importing munged data into a fresh query service some timeouts happen:

Processing wikidump-000000274.ttl.gz
SPARQL-UPDATE: updateStr=LOAD <file:///mnt/disks/ssddata/mungeOut/wikidump-000000274.ttl.gz>
java.util.concurrent.TimeoutException

Reducing the chunk size of the munge step seems to resolve this.

Right now the chunk size is not customizable without changing the file.
It would be great to be able to pass the chunk size in as an option.
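
As a rough illustration of the request, a script could read the chunk size from a command-line option and fall back to a built-in default. This is a minimal sketch, not the actual munge.sh: the `-c` flag, the `parse_chunk` helper, and the default value of 100000 are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real munge.sh.
# The -c flag, the function name, and the default below are assumptions.

DEFAULT_CHUNK=100000  # assumed default, for illustration only

# Parse "-c <size>" from the given arguments and print the chunk size to use.
parse_chunk() {
    local chunk="$DEFAULT_CHUNK"
    local opt
    local OPTIND=1  # reset so the function can be called more than once
    while getopts "c:" opt; do
        case "$opt" in
            c) chunk="$OPTARG" ;;
            *) echo "Usage: $0 [-c chunk-size]" >&2; return 1 ;;
        esac
    done
    # Reject anything that is not a positive integer.
    if ! [[ "$chunk" =~ ^[1-9][0-9]*$ ]]; then
        echo "chunk size must be a positive integer: $chunk" >&2
        return 1
    fi
    echo "$chunk"
}
```

With this sketch, `parse_chunk -c 50000` prints `50000`, and `parse_chunk` with no arguments prints the assumed default. A smaller value means more, smaller output files, which is what appeared to avoid the timeouts described above.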

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 10 2019, 11:19 PM
Smalyshev triaged this task as Normal priority. · Aug 15 2019, 6:23 AM

This should not be hard to do.

Change 530634 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/rdf@master] Allow specifying chunk size in the script

https://gerrit.wikimedia.org/r/530634

Change 530634 merged by jenkins-bot:
[wikidata/query/rdf@master] Allow specifying chunk size in the script

https://gerrit.wikimedia.org/r/530634

Amazing, would it be possible to get this backported to 0.3.1?
Or should I just bake it into the docker images? :)

@Addshore I was planning to release 0.3.2 pretty soon; if that is easier for you, you could use that.

Smalyshev closed this task as Resolved. · Aug 26 2019, 6:35 AM

Sounds great to me. I'll keep an eye out.

@Addshore 0.3.2 should be up already.