The script at
https://github.com/kjschiroo/measuring-edit-productivity/blob/master/hadoop/revdocs2diffs.hadoop
needs to be run inside a virtualenv.
To get it working on Hadoop, we had to copy the contents of the local Python 3.4.1 lib directory into the virtualenv with cp -r Local-Python-3.4.1/lib/python3.4/* venv/3.4/lib/python3.4/. More steps are probably needed beyond this.
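
For anyone who wants to script the copy step instead of running cp by hand, here is a minimal Python sketch. It is not part of the original setup: the function name copy_lib_into_venv is hypothetical, and it assumes the script itself is run with Python 3.8 or newer (for dirs_exist_ok), which may differ from the 3.4 interpreter inside the virtualenv. It behaves roughly like the cp -r command, except that it also copies top-level dotfiles.

```python
import shutil
from pathlib import Path


def copy_lib_into_venv(src_lib: Path, venv_lib: Path) -> int:
    """Copy every entry under src_lib into venv_lib.

    Roughly equivalent to `cp -r src_lib/* venv_lib/`.
    Returns the number of top-level entries copied.
    (Hypothetical helper, not part of the original scripts.)
    """
    venv_lib.mkdir(parents=True, exist_ok=True)
    copied = 0
    for entry in src_lib.iterdir():
        target = venv_lib / entry.name
        if entry.is_dir():
            # dirs_exist_ok (Python 3.8+) merges into an existing directory
            # instead of raising FileExistsError.
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            # copy2 keeps file metadata such as modification times.
            shutil.copy2(entry, target)
        copied += 1
    return copied
```

With the paths from the note above, the call would look like copy_lib_into_venv(Path("Local-Python-3.4.1/lib/python3.4"), Path("venv/3.4/lib/python3.4")). Running it a second time overwrites files in place rather than failing.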