import pyarrow
hdfs = pyarrow.hdfs.connect()
yields
ArrowIOError: Unable to load libjvm
The Arrow interface to HDFS is very useful for interacting with HDFS when using Spark. In my case, I am trying to parse JSON files emitted by dump2reverts.py.
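For context, here is a minimal sketch of the parsing step, assuming dump2reverts.py emits newline-delimited JSON (the record fields and sample values below are illustrative, not the tool's actual schema):

```python
import json

def parse_json_lines(lines):
    # Parse newline-delimited JSON records, skipping blank lines.
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records

# Hypothetical sample records for illustration only.
sample = ['{"rev_id": 1, "reverted": true}', '', '{"rev_id": 2, "reverted": false}']
print(parse_json_lines(sample))
```

Once the HDFS connection works, the same function can be fed lines read from a file opened via `hdfs.open(...)`.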
Status | Assigned | Task
---|---|---
Resolved | Ottomata | T202812 Install pyArrow in Cluster
Resolved | Ottomata | T225692 Pyarrow hdfs interface does not work in SWAP
Luca is right!
Here's a workaround until we (eventually) upgrade:
import os
import pyarrow

# Point pyarrow at the JVM so it can load libjvm before connecting.
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64'

hdfs = pyarrow.hdfs.connect()
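Why this works: pyarrow's HDFS bridge looks for libjvm.so under `$JAVA_HOME`, so the error above means that lookup failed. A small sketch of the path involved (the helper function is hypothetical, and the `jre/lib/amd64/server` layout is an assumption that holds for OpenJDK 8 on Debian-family amd64 systems):

```python
import os

def expected_libjvm(java_home):
    # Hypothetical helper for illustration: builds the path where an
    # OpenJDK 8 install on a Debian-family amd64 system keeps libjvm.so,
    # the library pyarrow must load to talk to HDFS.
    return os.path.join(java_home, 'jre', 'lib', 'amd64', 'server', 'libjvm.so')

print(expected_libjvm('/usr/lib/jvm/java-8-openjdk-amd64'))
# → /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
```

If `JAVA_HOME` is unset or points at a JVM without that library, the connect call fails with "Unable to load libjvm".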