nbconvert outputs notebooks that stay
Closed, Resolved | Public | BUG REPORT

Description

$ ls /srv/product_analytics/jobs/movement_metrics/notebooks/*.nbconvert.* # on stat1007
reading_01_pageviews.nbconvert.ipynb
reading_02_global_markets.nbconvert.ipynb
test.nbconvert.ipynb
test.nbconvert.nbconvert.ipynb
test.nbconvert.nbconvert.nbconvert.ipynb
test.nbconvert.nbconvert.nbconvert.nbconvert.ipynb
test.nbconvert.nbconvert.nbconvert.nbconvert.nbconvert.ipynb
test.nbconvert.nbconvert.nbconvert.nbconvert.nbconvert.nbconvert.ipynb
test.nbconvert.nbconvert.nbconvert.nbconvert.nbconvert.nbconvert.nbconvert.ipynb
test.nbconvert.nbconvert.nbconvert.nbconvert.nbconvert.nbconvert.nbconvert.nbconvert.ipynb

We get those because of the way we execute notebooks in main.sh:

python -m jupyter nbconvert --ExecutePreprocessor.timeout=None --to notebook --execute $notebook || exit 1
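To illustrate how the names pile up: without --inplace or an explicit output name, nbconvert writes the executed copy next to the input as <name>.nbconvert.ipynb. If a later run picks up that output as an input (an assumption about how main.sh selects notebooks, not something shown here), the suffix stacks once per run. The helper below is hypothetical and only mimics the naming; it does not call nbconvert.

```shell
# Hypothetical helper: mimics nbconvert's default output name for
# `--to notebook` when neither --inplace nor --output is given.
nbconvert_output_name() {
  # Strip the trailing .ipynb, then append .nbconvert.ipynb
  printf '%s.nbconvert.ipynb\n' "${1%.ipynb}"
}

# Simulate three job runs where each run re-executes the previous output.
name="test.ipynb"
for run in 1 2 3; do
  name="$(nbconvert_output_name "$name")"
  echo "run $run: $name"
done
# run 1: test.nbconvert.ipynb
# run 2: test.nbconvert.nbconvert.ipynb
# run 3: test.nbconvert.nbconvert.nbconvert.ipynb
```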

However, we do not want to use the --inplace flag, because then we could end up with merge conflicts on the notebooks (not sure if the git::clone module in Puppet resets the clone when it's set to ensure => 'latest'). The simplest solution is to just clean up as part of the job.

HOWEVER! We do not want to clean up at the end, because a failed execution of one of the notebooks would exit the whole script, so any *.nbconvert.ipynb notebooks created from successful execution of previous notebooks would remain. Instead, it would be better to clean up the leftovers from the previous execution at the start of the job. As a bonus, keeping those notebooks until the next run allows us to look at the output cells.
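A minimal sketch of that cleanup-first ordering, assuming a main.sh-style loop over a notebooks directory. The function names and the glob-based loop are hypothetical; only the nbconvert command is taken from the description above, and the merged patch may do this differently.

```shell
# Hypothetical helper: delete outputs left behind by a previous run.
# The glob also matches stacked names such as test.nbconvert.nbconvert.ipynb.
cleanup_nbconvert_outputs() {
  # rm -f stays quiet if nothing matches the pattern
  rm -f -- "$1"/*.nbconvert.ipynb
}

# Hypothetical job body: clean up first, then execute each notebook.
run_notebooks() {
  cleanup_nbconvert_outputs "$1"
  for notebook in "$1"/*.ipynb; do
    [ -e "$notebook" ] || continue  # directory contains no notebooks
    python -m jupyter nbconvert --ExecutePreprocessor.timeout=None \
      --to notebook --execute "$notebook" || return 1
  done
}

# Usage (path is illustrative):
#   run_notebooks /path/to/notebooks || exit 1
```

Because the cleanup runs before any notebook executes, a failure partway through leaves the outputs of the notebooks that did succeed in place for inspection, and the next run starts from a clean directory.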

Event Timeline

mpopov triaged this task as Medium priority.
mpopov moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

Change 741688 had a related patch set uploaded (by Bearloga; author: Bearloga):

[analytics/wmf-product/jobs@master] movement_metrics: Cleanup notebooks dir

https://gerrit.wikimedia.org/r/741688

Change 741688 merged by Bearloga:

[analytics/wmf-product/jobs@master] movement_metrics: Cleanup notebooks dir

https://gerrit.wikimedia.org/r/741688