Cloudera's hadoop-hdfs-fuse package has always been a pain. It has never been reliable for writes, so we mount HDFS read-only. But even the read-only mount breaks often and requires manual intervention.
However, I think we still need an HDFS filesystem mount. Ideally we could find a newer solution that works better for both reads and (small) writes. There seem to be plenty of options out there!
In T224658: Newpyter - SWAP Juypter Rewrite I'm exploring whether we can create a fully thin JupyterHub solution: one that runs both the user notebook servers and all kernels on a Hadoop worker or in YARN. To really do this and make it usable, users need to be able to write files from a notebook or kernel process running on any worker node, and later access those files from another worker node. The only way I can think of to accomplish this is with a shared filesystem of some kind, and I'd much prefer to leverage HDFS for it.
Let's explore the HDFS filesystem bridges out there and see whether we can find one that fits our needs.