
Functionality to share & view notebooks
Open, Low, Public

Description

There is currently no easy way to share SWAP (née PAWS internal) notebooks or to view others' notebooks (as PAWS allows through static notebook viewing in a browser). It's possible to download a notebook through the web interface and then share it via email, Gist (as long as no private data is shared), etc., but not to share or view notebooks in real time. This functionality would be useful for collaborating on code and sharing results with others.

Event Timeline

@yuvipanda pointed out that it is already possible to copy over notebooks and auxiliary files directly on the server using cp on the command line (use 'New' -> 'Terminal' while in the root notebook folder). Although this option does not replace a full sharing/viewing functionality, it should already be useful for some needs.
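For those who would rather stay inside a notebook cell than open a terminal, the same copy can be sketched in Python. This is only an illustration of the cp approach: the function name copy_notebook and the example paths are hypothetical, and it assumes you have read permission on the source file.

```python
import shutil
from pathlib import Path


def copy_notebook(src, dest_dir):
    """Copy a single .ipynb file into dest_dir and return the new path.

    Rough equivalent of `cp -p src dest_dir/` on the command line.
    """
    src = Path(src)
    dest_dir = Path(dest_dir)
    if src.suffix != ".ipynb":
        raise ValueError(f"not a notebook file: {src}")
    # Create the destination folder if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps where the platform allows it.
    return Path(shutil.copy2(src, dest_dir / src.name))
```

A call might look like copy_notebook("/home/some-user/analysis.ipynb", "shared-copies"), where both paths are placeholders. Auxiliary files would need to be copied separately.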

Tbayer renamed this task from Functionality to share & view PAWS internal notebooks to Functionality to share & view SWAP notebooks. Jun 16 2017, 10:36 PM
Tbayer updated the task description.
Chicocvenancio subscribed.

My current workaround for this is to track my projects in Git (which is a good practice generally) and then push them to GitHub, which has pretty good support for viewing Jupyter notebooks. It works relatively well for me, and you can look at my sampling work for the annual editor survey for a good example.

However, there are some downsides to this approach:

  • There's no way to share notebooks privately unless we were to start paying for enterprise GitHub accounts (not just for the data analysts who create notebooks, but anybody who needs to view them as well). I can deal with this since I mainly work with public edit data, but it can be a big problem if you work with reader or EventLogging data.
  • You need to do a lot of manual command line for every single change you want to push. This is my usual workflow:
    • ssh notebook1001.eqiad.wmnet
    • cd proj/my-project
    • git status
    • git add file-1
    • git add file-2
    • git commit -m "Did a bunch of crazy stuff"
    • git push
    • Type in my GitHub username
    • Find my password in my password manager (because if I put an SSH private key on the server, every user would have access to it)
    • Paste password in
  • Git diffs of Jupyter notebooks are very unhelpful, because the vast majority of the changes shown are to notebook metadata rather than to your code. See if you can figure out what exactly I updated in this commit :)
  • To fork a notebook, you have to go to the command line and clone the repo from GitHub. It's definitely possible, but still the kind of thing that would deter a product manager who wants to tweak the where clause in an SQL query and update the resulting graph.
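One way to reduce the diff noise mentioned in the list above is to clear cell outputs and execution counts before committing. The following is a minimal sketch, assuming the notebooks are nbformat 4 JSON files; the function names strip_notebook and strip_file are hypothetical, and dedicated tools such as nbstripout handle more cases than this does.

```python
import json


def strip_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict without outputs or execution counts."""
    cleaned = json.loads(json.dumps(nb))  # deep copy via a JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            # Outputs and counters change on every run, even if the code does not.
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(path: str) -> None:
    """Rewrite the notebook at `path` in place with outputs removed."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_notebook(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Running strip_file on a notebook before git add means later diffs mostly reflect changes to the cell sources. The trade-off is that the committed notebook no longer shows results when viewed on GitHub, so this suits code review better than sharing finished analyses.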
In T156934#4049358, @Neil_P._Quinn_WMF wrote:

My current workaround for this is to track my projects in Git (which is a good practice generally) and then push them to GitHub, which has pretty good support for viewing Jupyter notebooks.
[...]

  • You need to do a lot of manual command line for every single change you want to push. This is my usual workflow:

[...]

  • git push
  • Type in my GitHub username
  • Find my password in my password manager (because if I put an SSH private key on the server, every user would have access to it)
  • Paste password in

I figured out how to speed up these steps considerably, thanks to this answer on SuperUser. Add the following to ~/.gitconfig:

[url "https://YOURUSERNAME@github.com"]
    insteadOf = https://github.com

[credential]
    helper = cache --timeout=28800

This automatically applies your GitHub username to any HTTPS access to github.com, and then caches the password you enter for 8 hours (28,800 seconds).
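With the username and password prompts out of the way, the remaining add, commit, and push steps could also be wrapped in a small helper. This is only a sketch under assumptions: build_commands and commit_and_push are hypothetical names, and the helper simply runs the same git commands listed in the workflow above, so git will still ask for the password whenever the cache has expired.

```python
import subprocess
from typing import List, Sequence


def build_commands(files: Sequence[str], message: str) -> List[List[str]]:
    """Return the git commands for staging, committing, and pushing `files`."""
    if not files:
        raise ValueError("no files to commit")
    return [
        ["git", "add", "--", *files],      # stage only the named files
        ["git", "commit", "-m", message],  # commit with the given message
        ["git", "push"],                   # push to the configured upstream
    ]


def commit_and_push(repo_dir: str, files: Sequence[str], message: str) -> None:
    """Run the commands in `repo_dir`, stopping at the first failure."""
    for cmd in build_commands(files, message):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example, commit_and_push("proj/my-project", ["file-1", "file-2"], "Update analysis") would replace the separate git add, git commit, and git push steps. Keeping build_commands separate makes it easy to inspect what will be run before anything touches the repository.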

mpopov renamed this task from Functionality to share & view SWAP notebooks to Functionality to share & view notebooks. Sep 9 2021, 8:47 PM
mpopov lowered the priority of this task from Medium to Low.
mpopov added a project: Product-Analytics.
mpopov moved this task from Triage to Tracking on the Product-Analytics board.
mpopov removed subscribers: madhuvishy, Tbayer.

I believe this has been resolved with https://github.com/toolforge/paws/pull/119

I think this ticket is tracking functionality for our internal Jupyter-based system (formerly called Jupyter-SWAP) rather than the external Jupyter-based PAWS. We keep them separate since the internal one has access to sensitive information.

Details at: https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Jupyter

Gehel subscribed.

Removing DPE SRE until this is picked up by Data-Engineering