The folks at RStudio recently unveiled sparklyr, a new package that lets data scientists use dplyr verbs on remote data via Spark. It also exposes the ML algorithms bundled with Spark as R functions. See Jeff Allen's talk at useR! 2016 (Stanford): https://github.com/trestletech/user2016-sparklyr/blob/master/sparklyr-user2016.pdf
We've long had an interest in using Spark with R, and this finally seems to be the solution we were waiting for. Let's see if we can get it to work!
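As a starting point, a first smoke test might look roughly like the sketch below. It assumes a local Spark installation is available (for example one set up with sparklyr's `spark_install()`), uses the built-in `mtcars` data frame as a stand-in for real remote data, and picks an arbitrary model purely for illustration. It has not been run on our infrastructure, and the exact arguments may differ between sparklyr versions.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally; on a real cluster the master
# URL and configuration would be different.
sc <- spark_connect(master = "local")

# Copy a small built-in data frame into Spark (stand-in for remote data).
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and executed by Spark;
# collect() brings the (small) result back into the R session.
mpg_by_cyl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()
print(mpg_by_cyl)

# One of the ML algorithms bundled with Spark, called as an R function.
# The formula is illustrative only.
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)
print(fit)

spark_disconnect(sc)
```

If this runs end to end, the next step would be to swap `master = "local"` for whatever connection settings our cluster requires.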
P.S. Once PAWS Internal has Spark support, it would be great to be able to use this in Jupyter notebooks too.