Nobody knows what aggregate-datasets, limn-public-data, etc. are for.
Requirements
- Dashiki needs a structured folder for structured metrics: *something*/<<metric-name>>/<<submetric-name>>/<<wiki>>.tsv
- Dashiki needs unstructured folders to graph arbitrary files (hopefully this doesn't get too crazy; maybe all of these should go in a base directory that's specifically for unstructured metrics)
- Researchers on stat1003 output public datasets
- Researchers on stat1002 output public datasets
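A minimal sketch of the structured layout from the first requirement, assuming the pieces are joined exactly as *something*/<<metric-name>>/<<submetric-name>>/<<wiki>>.tsv. The function name `structured_metric_path` and the validation rule are illustrative assumptions, not existing Dashiki code.

```python
from pathlib import PurePosixPath


def structured_metric_path(root, metric, submetric, wiki):
    """Build <root>/<metric>/<submetric>/<wiki>.tsv as a POSIX path.

    Hypothetical helper: it only illustrates the folder convention.
    """
    for part in (metric, submetric, wiki):
        # Reject empty names and embedded slashes so that each argument
        # maps to exactly one level of the folder structure.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return PurePosixPath(root) / metric / submetric / f"{wiki}.tsv"


print(structured_metric_path("reports/per-wiki", "sessions", "visualeditor", "enwiki"))
# prints: reports/per-wiki/sessions/visualeditor/enwiki.tsv
```

Keeping the wiki as the file name means a dashboard can request the same metric for another wiki by changing only the last path segment.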
Current State
- stat1003 rsyncs to limn-public-data
- stat1002 rsyncs to aggregate-datasets
- stat1002 *now* rsyncs to limn-public-data
- ?? public-datasets (looks like ad-hoc work)
Ideal Solution
stat1001: https://datasets.wikimedia.org
README.md
/common
    README.md: this is rsynced from stat1002 and 1003 and wherever with no --delete
/reports
    README.md
    /per-wiki
        /sessions
            /visualeditor
                /enwiki.tsv
                /all.tsv
            /wikitext
                /enwiki.tsv
                /all.tsv
    /cross-wiki
        /request-breakdowns (now browser, we should rename)
            /by-os-or-browser.tsv
            /by-os.tsv
stat1003:/srv/reportupdater/output/... -> stat1001:.../reports/
stat1002:/a/reportupdater/output/... -> stat1001:.../reports/
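The rsync mappings above could be driven by a small wrapper like the following sketch. The function name `build_rsync_command` and the example paths are hypothetical placeholders; the real source and destination paths are elided in these notes and are not filled in here. The one design point carried over from the layout is that `--delete` is left out, so several hosts can push into the same destination without removing each other's files.

```python
def build_rsync_command(source, destination):
    """Return an rsync argv list that copies source into destination.

    Hypothetical helper; it only builds the command and does not run it.
    """
    # A trailing slash on the source tells rsync to copy the directory's
    # contents rather than the directory itself.
    source = source.rstrip("/") + "/"
    # -a: archive mode (recursive, preserves permissions and timestamps).
    # --delete is deliberately omitted: files that exist only on the
    # destination (for example, pushed from another host) are kept.
    return ["rsync", "-a", source, destination]


# Placeholder paths for illustration only.
cmd = build_rsync_command("/tmp/example-output", "example-host:/tmp/example-reports/")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once the real paths are known.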
Steps
- move unstructured stuff from limn-public-data/* to common/legacy/limn-public-data/*
- symlink limn-public-data to common/legacy/limn-public-data
- move structured stuff from limn-public-data to reports
- announce the plan to do the same thing for aggregate-datasets and public-data
- in the distant future delete the symlinks
- make sure the intentions for directories are documented in the READMEs
- send an email to the list
- wikitech documentation?
- update dashiki config & code for datasets api root (remove /metrics)
- update the output paths of reportupdater jobs
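The first two steps (move into common/legacy, then leave a symlink at the old location) can be sketched as below. This is a simplified illustration under stated assumptions: it moves the whole directory in one go and does not separate structured files into reports, which would be a separate pass. The function name `move_to_legacy` is hypothetical, and the demo runs in a temporary directory rather than against real data.

```python
import os
import shutil
import tempfile
from pathlib import Path


def move_to_legacy(root, name):
    """Move root/<name> to root/common/legacy/<name> and symlink the old path.

    Hypothetical helper; returns the new location.
    """
    root = Path(root)
    old = root / name
    legacy = root / "common" / "legacy" / name
    if old.is_symlink():
        return legacy  # already migrated, nothing to do
    if legacy.exists():
        raise FileExistsError(f"{legacy} already exists")
    legacy.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old), str(legacy))
    # The link target is relative to the link's own directory (root),
    # so the tree stays valid if root itself is relocated.
    os.symlink(os.path.join("common", "legacy", name), old, target_is_directory=True)
    return legacy


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "limn-public-data"
    data.mkdir()
    (data / "example.tsv").write_text("a\tb\n")
    move_to_legacy(tmp, "limn-public-data")
    # Old URLs and paths keep working through the symlink.
    print((data / "example.tsv").read_text() == "a\tb\n")  # prints: True
```

Because the old path still resolves, existing consumers keep working until the symlinks are deleted in the final step.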