
Using Munge script in standalone WDQS: Fatal error munging RDF: RDFParseException: Illegal language tag char: ':'
Closed, Invalid (Public)

Description

I am trying to run a local instance of the Wikidata Query Service (WDQS).

Following these instructions: https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Standalone_service, I downloaded version 0.2.4 of the service from Maven Central, and then downloaded the 20171009 entities file "wikidata-20171009-all-BETA.ttl.gz" from this directory: https://dumps.wikimedia.org/wikidatawiki/entities/20171009/.

I then tried to run the munge.sh script on the ttl.gz file:
./munge.sh -f wiki_dumps/20171009/wikidata-20171009-all-BETA.ttl.gz -d wiki_dumps/20171009/split -l en -s

This step seemed to work for the first 1760000 entities, with progress reported in steps of 10000, but then failed with the following error:

14:27:09.118 [main] ERROR org.wikidata.query.rdf.tool.Munge - Fatal error munging RDF

java.lang.RuntimeException: org.openrdf.rio.RDFParseException: Illegal language tag char: ':' [line 269428203]
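Since the exception names a line number, one way to see what the parser tripped over is to pull that line, plus a few neighbours, straight out of the compressed dump. The following is only a rough sketch: the helper name `lines_around` is made up for illustration, and it assumes the parser's line count matches plain newline counting on the decompressed file, which may not hold exactly.

```python
import gzip
import itertools
from typing import Iterator, Tuple


def lines_around(path: str, target: int, context: int = 3) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for lines within `context` of `target` (1-indexed)."""
    start = max(1, target - context)
    stop = target + context
    # Stream the gzip file as text; undecodable bytes are replaced, not fatal.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        # islice skips the preceding lines without keeping them in memory.
        window = itertools.islice(fh, start - 1, stop)
        for number, line in enumerate(window, start=start):
            yield number, line.rstrip("\n")


# Hypothetical usage with the path and line number from this report:
# for number, text in lines_around(
#         "wiki_dumps/20171009/wikidata-20171009-all-BETA.ttl.gz", 269428203):
#     print(number, text)
```

This has to decompress everything up to the target line, so on a dump of this size it will take a while, but it does not need the file to be unpacked on disk.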

Event Timeline

Aklapper renamed this task from "Issue using Munge script in standalone wikidata service" to "Using Munge script in standalone WDQS: Fatal error munging RDF: RDFParseException: Illegal language tag char: ':'". (Oct 14 2017, 10:34 AM)
Smalyshev added a project: User-Smalyshev.
Smalyshev moved this task from Backlog to Next on the User-Smalyshev board.

@Caliuser could you by any chance quote the whole exception?
Also, if you could get me the last file that has been created by the tool, that would be very helpful.

I've just run the munger on the same file on Labs, and it finished fine. Since I'm unable to reproduce the problem and have no way to investigate without additional data, I'm closing this for now.

I wonder if it is due to the fact that I am using the WDQS v0.2.4 from here: http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.wikidata.query.rdf%22%20AND%20a%3A%22service%22

I originally tried to install the service from the GitHub repo, since I figured that would be the most up-to-date: https://github.com/wikimedia/wikidata-query-rdf/
but the build step ("mvn package") failed for me.

I have since been able to run the munger successfully on an earlier copy of the dump: 20170807/wikidata-20170807-all-BETA.ttl.gz

Here is the full exception from the munger on the 20171009 file:

10:22:18.134 [main] INFO org.wikidata.query.rdf.tool.Munge - Processed 1760000 entities at (1671, 1335, 1003)
10:22:21.653 [main] ERROR org.wikidata.query.rdf.tool.Munge - Fatal error munging RDF
java.lang.RuntimeException: org.openrdf.rio.RDFParseException: Illegal language tag char: ':' [line 269428203]
at org.wikidata.query.rdf.tool.Munge.run(Munge.java:219) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.wikidata.query.rdf.tool.Munge.main(Munge.java:133) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
Caused by: org.openrdf.rio.RDFParseException: Illegal language tag char: ':' [line 269428203]
at org.openrdf.rio.helpers.RDFParserHelper.reportError(RDFParserHelper.java:347) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.openrdf.rio.helpers.RDFParserBase.reportError(RDFParserBase.java:641) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.openrdf.rio.turtle.TurtleParser.reportError(TurtleParser.java:1394) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.openrdf.rio.turtle.TurtleParser.parseQuotedLiteral(TurtleParser.java:706) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.openrdf.rio.turtle.TurtleParser.parseValue(TurtleParser.java:651) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.openrdf.rio.turtle.TurtleParser.parseObject(TurtleParser.java:527) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.openrdf.rio.turtle.TurtleParser.parseObjectList(TurtleParser.java:458) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.openrdf.rio.turtle.TurtleParser.parsePredicateObjectList(TurtleParser.java:446) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.openrdf.rio.turtle.TurtleParser.parseTriples(TurtleParser.java:409) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.openrdf.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:259) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.wikidata.query.rdf.tool.Munge$ForbiddenOk$HackedTurtleParser.parseStatement(Munge.java:674) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.openrdf.rio.turtle.TurtleParser.parse(TurtleParser.java:214) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
at org.wikidata.query.rdf.tool.Munge.run(Munge.java:217) ~[wikidata-query-tools-0.2.4-jar-with-dependencies.jar:na]
... 1 common frames omitted
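The trace shows the failure inside parseQuotedLiteral, i.e. while reading what follows a quoted literal, where a language tag such as @en may appear. For reference, the Turtle 1.1 grammar defines a language tag as '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*, so a ':' is never legal inside one. The sketch below checks a tag against that production; it illustrates the grammar only and is not the Sesame parser's actual check, which may differ in detail. The function name is made up.

```python
import re

# LANGTAG production from the Turtle 1.1 grammar:
#   '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
LANGTAG = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")


def is_valid_langtag(tag: str) -> bool:
    """Return True if `tag` (including the leading '@') matches LANGTAG."""
    return LANGTAG.fullmatch(tag) is not None


# Hypothetical examples:
# is_valid_langtag("@en")      is True
# is_valid_langtag("@zh-hans") is True
# is_valid_langtag("@en:")     is False (':' is not allowed in a tag)
```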

Could you try version 0.2.5 and see if it works better?

I retried the download, which took another 16h, but the problem persists.
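To rule out a damaged download, the file's digest can be compared with a checksum from the dump server. This is a minimal sketch and assumes a checksum for this particular file is actually published, which I have not verified; the helper name `file_digest` is made up, and the algorithm should be whichever one the published checksum uses.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of the file at `path`, read in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read in chunks so a multi-gigabyte dump never has to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage:
# print(file_digest("wiki_dumps/20171009/wikidata-20171009-all-BETA.ttl.gz", "md5"))
```

If the digest matches and the munger still fails at the same line, the download itself is probably not the cause.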