
📊 The dataset_metrics notebook should run on a Python kernel
Closed, Resolved · Public

Description

Currently, this notebook uses a PySpark kernel to initialise a SparkSession and pre-populate the global notebook state with a spark object. This approach is being deprecated in favour of initialising the SparkSession manually (e.g. via wmfdata).

We should update the notebook and future-proof it.
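A minimal sketch of what manual initialisation from a plain Python kernel could look like, assuming the installed wmfdata version provides `wmfdata.spark.create_session` (check that version's documentation; the exact session options vary between releases). The helper name `get_spark_session` and the default app name are hypothetical. When wmfdata is not installed, it falls back to PySpark's standard `SparkSession.builder`. Imports are deferred so that merely defining the helper does not require Spark to be present.

```python
def get_spark_session(app_name: str = "dataset_metrics"):
    """Manually create (or reuse) a SparkSession from a plain Python kernel.

    Hypothetical helper: prefers wmfdata when it is importable, otherwise
    falls back to PySpark's own builder.
    """
    try:
        # Assumption: this wmfdata version exposes spark.create_session.
        from wmfdata import spark as wmf_spark
    except ImportError:
        wmf_spark = None

    if wmf_spark is not None:
        return wmf_spark.create_session(app_name=app_name)

    # Fallback: standard PySpark builder; getOrCreate() returns an
    # existing session if one is already active.
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName(app_name).getOrCreate()


# In the notebook's first cell, replacing the kernel-provided global:
# spark = get_spark_session()
```

Binding the result to `spark` explicitly keeps the rest of the notebook unchanged while making the session's origin visible in the notebook itself rather than in the kernel configuration.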

Acceptance criteria
  1. The Jupyter kernel is set to Python.
  2. The SparkSession is initialised manually in a notebook cell.

Notes
  1. We already do this in algorithm.ipynb, which can serve as an example.
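For the first criterion, the kernel a notebook opens with is recorded in the `metadata.kernelspec` field of the .ipynb JSON. The sketch below rewrites that field using only the standard library. It assumes the target Jupyter server exposes the standard ipykernel kernel named `python3`; the actual kernel name on the analytics clients may differ. The function names are hypothetical.

```python
import json
from pathlib import Path

# Standard ipykernel kernelspec; the kernel name on the target
# Jupyter server is an assumption and may differ.
PYTHON_KERNELSPEC = {
    "name": "python3",
    "display_name": "Python 3",
    "language": "python",
}


def set_python_kernel(notebook: dict) -> dict:
    """Return a copy of a parsed .ipynb dict whose kernelspec is Python."""
    updated = json.loads(json.dumps(notebook))  # deep copy via JSON round-trip
    metadata = updated.setdefault("metadata", {})
    metadata["kernelspec"] = dict(PYTHON_KERNELSPEC)
    metadata.setdefault("language_info", {})["name"] = "python"
    return updated


def update_notebook_file(path: str) -> None:
    """Rewrite a notebook file in place with a Python kernelspec."""
    nb_path = Path(path)
    notebook = json.loads(nb_path.read_text(encoding="utf-8"))
    # nbformat conventionally writes notebooks with indent=1.
    nb_path.write_text(
        json.dumps(set_python_kernel(notebook), indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
```

Usage would be `update_notebook_file("dataset_metrics.ipynb")` (path hypothetical). Selecting the kernel from the Jupyter UI and saving achieves the same result; the script is only useful if several notebooks need the same change.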