
Internal nbviewer instance for sharing notebooks among 'wmf' and 'nda' members
Open, Low, Public

Description

Motivation

As a data scientist I would like to have a way of sharing Jupyter notebooks containing confidential data with members of my team and others in the Foundation.

There is currently no way to do this.

  • Google Colab isn't part of Google Workspace (formerly G Suite)
  • Google Drive doesn't support/render ipynb files
  • Exporting notebook to PDF via JupyterLab's interface produces horrible output
  • Exporting notebook to HTML doesn't work since GDrive doesn't support HTML files
  • Exporting notebook to HTML and converting to PDF via web browser produces bad output (although not quite as bad as direct to PDF via Jupyter)
  • Uploading notebook to GitHub as a private Gist technically works but it's unknown how private those private gists truly are (GH's Terms of Service & Privacy Policy would need to be reviewed by Security & Legal teams)
  • Uploading notebook to a private GitHub repo involves cumbersome permission management and adding users as collaborators

Proposal

Since we already have a system in place for gated access (superset.wikimedia.org requires logging in through Wikimedia Developer SSO and membership in 'wmf' or 'nda' groups), we could have a gated and internally-hosted nbviewer service.

Perhaps the stat100X hosts can have a directory similar to /srv/published where these notebooks can be manually copied/rsynced to, only instead of becoming publicly accessible via analytics.wikimedia.org/published/, they would be accessible via a (gated) nbviewer-internal.wikimedia.org (for example).

Alternative, interim solutions

| Solution | Pros | Cons |
| --- | --- | --- |
| people.wikimedia.org + .htaccess | Easy to set up | Requires exporting to HTML and uploading, stakeholders not in 'wmf'/'nda' groups can't access |
| Google Sites | Can share with stakeholders without LDAP | Requires copying HTML code in an embed component and manually resizing that component by dragging |

And as Martin wrote in T290693#7344878:

The Google Site works as well, but it can only be used within the Foundation (for instance, you wouldn't be able to share information with NDA'ed volunteers or WMDE staff).


This request is separate from the previously requested T156980, and may be a part of the solution to T156934.

Event Timeline

mpopov triaged this task as Low priority. Sep 9 2021, 8:45 PM
mpopov created this task.

When I need this sort of safe storage, I make use of https://people.wikimedia.org/~urbanecm/nda. people.wikimedia.org allows anyone with a production shell account to host arbitrary files (people.eqiad.wmnet is the host you need; anything you put into public_html will show up at people.wikimedia.org/~yourusername), and you can restrict access to certain subfolders via an .htaccess file to ensure that only members of a particular LDAP group can access them. This functionality is documented at https://wikitech.wikimedia.org/wiki/People.wikimedia.org.

This is my .htaccess file:

[urbanecm@people1003 ~/public_html/nda]$ cat .htaccess
AuthType CAS
Require cas-attribute memberOf:cn=nda,ou=groups,dc=wikimedia,dc=org
Require cas-attribute memberOf:cn=wmf,ou=groups,dc=wikimedia,dc=org
Require user urbanecmtest
[urbanecm@people1003 ~/public_html/nda]$

I recognize this isn't a user-friendly solution, and it isn't usable for everyone (only users with access to stat boxes, or another kind of production access, can make use of it), but it's the best option available as of now.

That being said, I fully support implementing a private viewer (+storage) for this purpose, as that'd be the most convenient way to share notebooks privately.
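
For anyone adapting the .htaccess shown above to other groups, here is a minimal Python sketch that generates the same kind of file. The function name `build_htaccess` and the `LDAP_BASE` constant are hypothetical; the directive syntax and the LDAP suffix are copied from the file above. It assumes Apache 2.4 semantics, where several bare `Require` lines in one section are treated as "any of these may match".

```python
from typing import Iterable

# LDAP suffix copied from the .htaccess shown in the comment above.
LDAP_BASE = "ou=groups,dc=wikimedia,dc=org"


def build_htaccess(groups: Iterable[str], users: Iterable[str] = ()) -> str:
    """Return .htaccess text that admits any listed LDAP group or user.

    Hypothetical helper: it only formats text and does not validate that
    the groups or users exist.
    """
    lines = ["AuthType CAS"]
    for group in groups:
        lines.append(f"Require cas-attribute memberOf:cn={group},{LDAP_BASE}")
    for user in users:
        lines.append(f"Require user {user}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the four-line file from the comment above.
    print(build_htaccess(["nda", "wmf"], users=["urbanecmtest"]), end="")
```

Writing the result to `~/public_html/nda/.htaccess` on the people host would then restrict that subfolder in the same way as the hand-written file.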

@Urbanecm Thank you!!! That works really well as an interim solution and I was able to upload & restrict https://people.wikimedia.org/~bearloga/nda/druid-pageviews.html

Alternatively, I followed the instructions in Publishing Rmarkdown to Google Sites and created a site & page that is only accessible to Wikimedia staff (since it's part of Google Workspace): https://sites.google.com/wikimedia.org/product-analytics/querying-druid

| Solution | Pros | Cons |
| --- | --- | --- |
| people.wikimedia.org + .htaccess | Easy to set up | Requires exporting to HTML and uploading, stakeholders not in 'wmf'/'nda' groups can't access |
| Google Sites | Can share with stakeholders without LDAP | Requires copying HTML code in an embed component and manually resizing that component by dragging |
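
The "exporting to HTML and uploading" step for the people.wikimedia.org option could be scripted roughly as follows. This is only a sketch under assumptions: the helper `publish_commands` is hypothetical, the default host and the `public_html/nda` directory are taken from the comments above, and it presumes that `jupyter nbconvert` and `rsync` are installed and that SSH to the people host is already configured (for example via a bastion). By default it only prints the commands; nothing runs unless `--run` is passed.

```python
import shlex
import subprocess
import sys
from pathlib import PurePosixPath
from typing import List


def publish_commands(
    notebook: str,
    host: str = "people.eqiad.wmnet",     # host named in the thread
    remote_dir: str = "public_html/nda",  # .htaccess-protected folder
) -> List[List[str]]:
    """Build (but do not run) the export and upload commands."""
    nb = PurePosixPath(notebook)
    html = nb.with_suffix(".html")
    convert = [
        "jupyter", "nbconvert", "--to", "html",
        "--output-dir", str(nb.parent),  # keep the HTML next to the notebook
        str(nb),
    ]
    upload = ["rsync", "-av", str(html), f"{host}:{remote_dir}/"]
    return [convert, upload]


if __name__ == "__main__":
    args = [a for a in sys.argv[1:] if a != "--run"]
    if not args:
        sys.exit("usage: publish.py NOTEBOOK.ipynb [--run]")
    for cmd in publish_commands(args[0]):
        print(shlex.join(cmd))
        if "--run" in sys.argv:
            subprocess.run(cmd, check=True)  # stop on the first failure
```

Keeping command construction separate from execution makes it possible to review exactly what would be uploaded before any confidential notebook leaves the stat host.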

@mpopov Confirmed, I'm able to access the link through my NDA account, but not via my test account :-). I'm glad it works for you.

The Google Site works as well, but it can only be used within the Foundation (for instance, you wouldn't be able to share information with NDA'ed volunteers or WMDE staff).