Currently, this notebook uses a PySpark kernel, which initialises a SparkSession and pre-populates the global notebook state with a spark object. This approach is being deprecated in favour of manual SparkSession initialisation (e.g. via wmfdata).
We should update the notebook to future-proof it.
Acceptance criteria
- The notebook's Jupyter kernel is set to Python (not PySpark).
- A SparkSession is manually initialised in a notebook cell.

Notes
- We already do this in algorithm.ipynb, which can serve as an example.