Paste P11026

import wikidata ttl dumps to hdfs

Authored by dcausse on Apr 20 2020, 5:57 PM.
Referenced file: F31769079 (raw.txt), Apr 20 2020, 5:57 PM

spark2-submit \
  --class org.wikidata.query.rdf.spark.WikidataTurtleDumpConverter \
  --master yarn \
  --executor-memory 8G \
  --executor-cores 4 \
  --driver-memory 2G \
  --conf spark.dynamicAllocation.maxExecutors=64 \
  rdf-spark-tools-0.3.14-SNAPSHOT-jar-with-dependencies.jar \
  --input-path hdfs://analytics-hadoop/wmf/data/raw/wikidata/dumps/all_ttl/20200302/wikidata-20200302-all-BETA.ttl.bz2 \
  --output-path hdfs://analytics-hadoop/user/dcausse/wikidata_graph_20200302/ \
  --output-format parquet
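To run the same conversion against another dump, the invocation can be parameterized by dump date. This is a minimal bash sketch. It assumes that later dumps keep the all_ttl/<YYYYMMDD>/wikidata-<YYYYMMDD>-all-BETA.ttl.bz2 layout used for the 20200302 dump and that output goes under the submitting user's HDFS home. The function names input_path, output_path and convert_dump and the RUN switch are hypothetical helpers, not part of rdf-spark-tools. By default it only prints the command (a dry run).

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild the spark2-submit invocation for an arbitrary dump date.
# Assumption: later dumps keep the same all_ttl/<YYYYMMDD>/ layout as the 20200302 dump.
set -euo pipefail

JAR="rdf-spark-tools-0.3.14-SNAPSHOT-jar-with-dependencies.jar"

# Print the HDFS path of the TTL dump for a YYYYMMDD date.
input_path() {
  local date="$1"
  printf 'hdfs://analytics-hadoop/wmf/data/raw/wikidata/dumps/all_ttl/%s/wikidata-%s-all-BETA.ttl.bz2\n' "$date" "$date"
}

# Print the HDFS output directory for a date and an HDFS user name.
output_path() {
  local date="$1" user="$2"
  printf 'hdfs://analytics-hadoop/user/%s/wikidata_graph_%s/\n' "$user" "$date"
}

# Usage: convert_dump YYYYMMDD HDFS_USER
# Prints the command (dry run) unless RUN=1 is set, in which case it executes it.
convert_dump() {
  local date="$1" user="$2"
  if [[ ! "$date" =~ ^[0-9]{8}$ ]]; then
    echo "expected a YYYYMMDD date, got: $date" >&2
    return 1
  fi
  # An array keeps every argument a single word, whatever the path contains.
  local cmd=(
    spark2-submit
    --class org.wikidata.query.rdf.spark.WikidataTurtleDumpConverter
    --master yarn
    --executor-memory 8G
    --executor-cores 4
    --driver-memory 2G
    --conf spark.dynamicAllocation.maxExecutors=64
    "$JAR"
    --input-path "$(input_path "$date")"
    --output-path "$(output_path "$date" "$user")"
    --output-format parquet
  )
  if [[ "${RUN:-0}" == 1 ]]; then
    "${cmd[@]}"
  else
    printf '%s\n' "${cmd[*]}"
  fi
}
```

For example, `convert_dump 20200302 dcausse` prints the command above on one line, and `RUN=1 convert_dump 20200302 dcausse` would submit it to YARN.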