Pyarrow hdfs interface does not work in SWAP
Closed, Resolved | Public | 1 Estimated Story Points

Description

import pyarrow
hdfs = pyarrow.hdfs.connect()

yields

ArrowIOError: Unable to load libjvm

The Arrow interface to HDFS is very useful for interacting with HDFS when using Spark. In my case, I am trying to parse JSON files emitted by dump2reverts.py.

Event Timeline

Luca is right!

Here's a workaround until we (eventually) upgrade:

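import os
import pyarrow

# Point at the JVM explicitly so that pyarrow can find libjvm.
# JAVA_HOME must be set before connect() is called.
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64'
hdfs = pyarrow.hdfs.connect()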
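
If the workaround ends up in several notebooks, it could be wrapped in a small helper. The following is only a sketch: the function name ensure_java_home and the DEFAULT_JAVA_HOME constant are hypothetical, and the default path is simply the one from the workaround above, assumed to exist on the notebook host. It leaves an existing JAVA_HOME alone and fails early if the directory is missing.

```python
import os

# Assumption: same JVM path as in the workaround above; adjust per host.
DEFAULT_JAVA_HOME = '/usr/lib/jvm/java-8-openjdk-amd64'


def ensure_java_home(candidate=DEFAULT_JAVA_HOME):
    """Set JAVA_HOME to `candidate` if it is unset; return the value in effect."""
    current = os.environ.get('JAVA_HOME')
    if current:
        # Respect a value that the user or the environment already provided.
        return current
    if not os.path.isdir(candidate):
        # Fail here rather than with a later "Unable to load libjvm" error.
        raise FileNotFoundError('No JVM directory at %s' % candidate)
    os.environ['JAVA_HOME'] = candidate
    return candidate
```

Calling ensure_java_home() before pyarrow.hdfs.connect() should then have the same effect as setting the variable by hand.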
Ottomata moved this task from Next Up to Done on the Analytics-Kanban board.
Ottomata moved this task from Incoming to Operational Excellence on the Analytics board.
Nuria set the point value for this task to 1.