P11026
Import Wikidata TTL dumps to HDFS
Authored by dcausse on Apr 20 2020, 5:57 PM.
spark2-submit \
  --class org.wikidata.query.rdf.spark.WikidataTurtleDumpConverter \
  --master yarn \
  --executor-memory 8G \
  --executor-cores 4 \
  --driver-memory 2G \
  --conf spark.dynamicAllocation.maxExecutors=64 \
  rdf-spark-tools-0.3.14-SNAPSHOT-jar-with-dependencies.jar \
  --input-path hdfs://analytics-hadoop/wmf/data/raw/wikidata/dumps/all_ttl/20200302/wikidata-20200302-all-BETA.ttl.bz2 \
  --output-path hdfs://analytics-hadoop/user/dcausse/wikidata_graph_20200302/ \
  --output-format parquet
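
The dump date appears three times in the command above (input directory, input file name, output directory). A rough sketch of factoring it into a variable follows. DUMP_DATE, HDFS_USER, input_path and output_path are hypothetical names introduced here, and the sketch assumes that other dumps follow the same path layout as the 20200302 one, which the paste does not confirm. It only prints the command rather than submitting it.

```shell
#!/bin/sh
# Sketch only: derive the HDFS paths from a dump date (YYYYMMDD).
# DUMP_DATE, HDFS_USER and the two helper functions are hypothetical;
# the path layout is copied from the 20200302 command and assumed to
# hold for other dates.
set -eu

DUMP_DATE="${DUMP_DATE:-20200302}"
HDFS_USER="${HDFS_USER:-dcausse}"

# Build the input path for a given dump date.
input_path() {
  printf 'hdfs://analytics-hadoop/wmf/data/raw/wikidata/dumps/all_ttl/%s/wikidata-%s-all-BETA.ttl.bz2\n' "$1" "$1"
}

# Build the output directory for a given user and dump date.
output_path() {
  printf 'hdfs://analytics-hadoop/user/%s/wikidata_graph_%s/\n' "$1" "$2"
}

# Dry run: print the command. Remove the leading `echo` to submit the job.
echo spark2-submit \
  --class org.wikidata.query.rdf.spark.WikidataTurtleDumpConverter \
  --master yarn \
  --executor-memory 8G \
  --executor-cores 4 \
  --driver-memory 2G \
  --conf spark.dynamicAllocation.maxExecutors=64 \
  rdf-spark-tools-0.3.14-SNAPSHOT-jar-with-dependencies.jar \
  --input-path "$(input_path "$DUMP_DATE")" \
  --output-path "$(output_path "$HDFS_USER" "$DUMP_DATE")" \
  --output-format parquet
```

With the defaults, the printed command matches the one above; setting DUMP_DATE in the environment changes all three occurrences at once.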