There is currently no easy way to share SWAP (née PAWS internal) notebooks or view others' notebooks (as PAWS allows through static notebook viewing in a browser). It's possible to download a notebook through the web interface and then share it via email, Gist (as long as no private data is included), etc., but not to share or view notebooks in real time. This functionality would be useful for collaborating on code and sharing results with others.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Duplicate | | None | T188275 Jupyter Notebooks TLC 2018-2019
Resolved | | Ottomata | T224658 Newpyter - SWAP Juypter Rewrite
Open | | None | T156934 Functionality to share & view notebooks
Declined | | None | T290693 Internal nbviewer instance for sharing notebooks among 'wmf' and 'nda' members
Event Timeline
@yuvipanda pointed out that it is already possible to copy over notebooks and auxiliary files directly on the server using cp on the command line (use 'New' -> 'Terminal' while in the root notebook folder). Although this option does not replace a full sharing/viewing functionality, it should already be useful for some needs.
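For anyone who would rather stay inside a notebook than open a terminal, the same copy can be sketched in Python. This is only an illustration: the helper name copy_notebook and the example paths are hypothetical, and it assumes you have read permission on the other user's files.

```python
import shutil
from pathlib import Path


def copy_notebook(source: Path, dest_dir: Path) -> Path:
    """Copy one notebook file into dest_dir and return the new path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name
    if target.exists():
        # Refuse to overwrite an existing notebook silently.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target


# Hypothetical usage; both paths are placeholders, not real server locations:
# copy_notebook(Path("/home/otheruser/analysis.ipynb"), Path.home() / "shared")
```

Like cp, this gives you a snapshot of the file at copy time, so it does not replace real sharing or viewing.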
My current workaround for this is to track my projects in Git (which is a good practice generally) and then push them to GitHub, which has pretty good support for viewing Jupyter notebooks. It works relatively well for me, and you can look at my sampling work for the annual editor survey for a good example.
However, there are some downsides to this approach:
- There's no way to share notebooks privately unless we were to start paying for enterprise GitHub accounts (not just for the data analysts who create notebooks, but anybody who needs to view them as well). I can deal with this since I mainly work with public edit data, but it can be a big problem if you work with reader or EventLogging data.
- You need to do a lot of manual command-line work for every single change you want to push. This is my usual workflow:
  - ssh notebook1001.eqiad.wmnet
  - cd proj/my-project
  - git status
  - git add file-1
  - git add file-2
  - git commit -m "Did a bunch of crazy stuff"
  - git push
  - Type in my GitHub username
  - Find my password in my password manager (because if I put an SSH private key on the server, every user would have access to it)
  - Paste password in
- Git diffs of Jupyter notebooks are very unhelpful, because the vast majority of the changes shown are to notebook metadata rather than to your code. See if you can figure out what exactly I updated in this commit :)
- To fork a notebook, you have to go to the command line and clone the repo from GitHub. It's definitely possible, but still the kind of thing that would deter a product manager who wants to tweak the where clause in an SQL query and update the resulting graph.
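On the unhelpful diffs: one possible mitigation is to strip outputs and execution counts from a notebook before committing it, so that the diff mostly reflects changes to the code. The following is a minimal sketch using only the standard library and assuming the nbformat 4 JSON layout; the function names strip_notebook and strip_file are my own, and dedicated tools such as nbstripout handle this more thoroughly.

```python
import json


def strip_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict without outputs or execution counts."""
    cleaned = json.loads(json.dumps(nb))  # deep copy via a JSON round-trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results
            cell["execution_count"] = None  # drop the In[n] counter
    return cleaned


def strip_file(path: str) -> None:
    """Rewrite the notebook at path in place, stripped of outputs."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_notebook(nb), f, indent=1, sort_keys=True)
        f.write("\n")
```

The obvious trade-off is that the committed notebook no longer shows results, which defeats the purpose if you are using GitHub as a viewer, so this only makes sense for a copy you keep for diffing.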
I figured out how to speed up the username and password steps considerably, thanks to this answer on SuperUser. Add the following to ~/.gitconfig:
[url "https://YOURUSERNAME@github.com"]
    insteadOf = https://github.com
[credential]
    helper = cache --timeout=28800
This will automatically apply your GitHub username to any HTTPS access to github.com, and then cache the password you enter for 8 hours (28,800 seconds).
I think this ticket is tracking functionality for our internal Jupyter-based system (formerly called Jupyter-SWAP) rather than the external Jupyter-based PAWS. We keep them separate since the internal one has access to sensitive information.
Details at: https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Jupyter