Resources:
Acceptance criteria:
- Investigate where to get and how to manipulate the data.
- Explore different tools to display this data in a single location.
  - T336282 investigated Git access to notebooks via an extension
  - Instructions on setting up internal JupyterHub and PAWS for HTTPS Git: https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Jupyter/Tips
  - Internal JupyterHub
    - Data access: private data via internal environment
    - Git version control: via HTTPS works for public and private repos
  - PAWS
    - Data access: public data via PAWS environment, but no private data
    - Git version control: via HTTPS works for public and private repos
  - Notebooks on GitHub/GitLab
    - Data access: more cumbersome as it's not within a WMF environment
    - Git version control: SSH possible
- Create a proof of concept (initial test report using any of the data from milestone 2).
- Comment/iterate on the selection criteria and possible solutions for notebooks.
  - Note that a cron job would need to either:
    - Recreate the environment within GitHub Actions (hard, with the potential for a breach), or
    - Run within the internal JupyterHub (no domain knowledge in the team, but WMF does it)
  - The above might suggest that a dashboard showing these values is the best option, but then we wouldn't have prior reports saved as records
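
As a rough sketch of the cron option, a scheduled script could execute the notebook with `jupyter nbconvert --execute` and write one dated HTML file per run, which would keep prior reports as records. The helper names (`build_nbconvert_command`, `run_report`), the notebook path, and the output directory are all hypothetical; nothing here is an existing setup on the internal JupyterHub or PAWS.

```python
import datetime
import pathlib
import subprocess


def build_nbconvert_command(notebook, output_dir, run_date):
    """Build the nbconvert command that executes a notebook and
    writes an HTML report whose file name includes the run date."""
    stem = pathlib.Path(notebook).stem
    return [
        "jupyter", "nbconvert",
        "--to", "html",
        "--execute",  # run all cells before exporting
        "--output", f"{stem}-{run_date.isoformat()}",  # nbconvert appends .html
        "--output-dir", str(output_dir),
        str(notebook),
    ]


def run_report(notebook, output_dir):
    """Execute the notebook and archive today's report (hypothetical helper)."""
    output_dir = pathlib.Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    cmd = build_nbconvert_command(notebook, output_dir, datetime.date.today())
    # check=True makes a failed notebook run surface as an error in the cron log
    subprocess.run(cmd, check=True)
```

A cron entry on whichever host has the required data access would then call `run_report` on a schedule. Because each run produces a separately named file, earlier reports are not overwritten.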