On stat1007:
```
$ ls -l /srv/published/datasets/discovery/metrics/external_traffic/
total 5548
-rw-rw-r-- 1 analytics-search analytics-search-users 3110291 Feb 9 05:02 referer_data.tsv
-rw-rw-r-- 1 analytics-search analytics-search-users 2565465 Feb 9 05:04 referer_nonbot_data.tsv
$ tail -n 1 /srv/published/datasets/discovery/metrics/external_traffic/referer_data.tsv
2021-02-08 TRUE external (search engine) Daum mobile web 141338
$ ls -l /srv/published/datasets/discovery/metrics/wdqs/
total 848
-rw-rw-r-- 1 analytics-search analytics-search-users 864510 Feb 9 05:07 basic_usage.tsv
$ tail -n 1 /srv/published/datasets/discovery/metrics/wdqs/basic_usage.tsv
2021-02-08 /bigdata/ldf TRUE FALSE 14
```
These files are generated with the help of [[ https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater | Reportupdater ]], which is run by [[ https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/golden/+/refs/heads/master/main.sh | main.sh ]]. That script is scheduled through `kerberos::systemd_timer` in the [[ https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/statistics/manifests/discovery.pp | statistics/discovery.pp ]] manifest, which belongs to the miscellaneous jobs manifest ([[ https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/statistics/explorer/misc_jobs.pp | profile::statistics::explorer::misc_jobs ]]).
**Notes**: The reports in both directories are produced by non-R scripts that run `hive`:
- [[ https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/golden/+/refs/heads/master/modules/metrics/external_traffic/referer_data | modules/metrics/external_traffic/referer_data ]] (and [[ https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/golden/+/refs/heads/master/modules/metrics/external_traffic/referer_nonbot_data | referer_nonbot_data ]])
- [[ https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/golden/+/refs/heads/master/modules/metrics/wdqs/basic_usage | modules/metrics/wdqs/basic_usage ]]
These are necessary for the still-used External Traffic (https://discovery.wmflabs.org/external/) and Wikidata Query Service (https://discovery.wmflabs.org/wdqs/) dashboards.
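The `tail -n 1` checks at the top of this comment can be automated. Below is a minimal sketch, not part of golden or reportupdater, that reads the date from the last row of a report and flags it as stale. It assumes the files are tab-delimited with a `YYYY-MM-DD` date in the first column (as the `tail` output suggests); the function names and the one-day default lag are hypothetical.

```python
import csv
from datetime import date, datetime, timedelta


def latest_report_date(tsv_path):
    """Return the date found in the first column of the last non-empty row.

    Assumes a tab-delimited file whose first column is YYYY-MM-DD.
    """
    last_row = None
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if row:
                last_row = row
    if last_row is None:
        raise ValueError(f"{tsv_path} has no rows")
    return datetime.strptime(last_row[0], "%Y-%m-%d").date()


def is_stale(tsv_path, today, max_lag_days=1):
    """True if the newest row is more than max_lag_days older than today."""
    lag = today - latest_report_date(tsv_path)
    return lag > timedelta(days=max_lag_days)
```

For example, `is_stale("/srv/published/datasets/discovery/metrics/wdqs/basic_usage.tsv", date.today())` would return `True` for a file whose last row is still dated 2021-02-08.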
-----
Log of steps taken:
1. Updated the reportupdater submodule to the latest commit (https://gerrit.wikimedia.org/r/c/wikimedia/discovery/golden/+/677244)
2. Ran `sudo -u analytics-search git submodule update` in stat1007:/srv/discovery/golden
3. Reset my venv on stat1007 via https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter-SWAP#Resetting_user_virtualenvs
4. Ran `pip install -U pid python-dateutil pymysql PyYAML Jinja2 dnspython` (`pip install -r reportupdater/requirements.txt` failed)
5. Created /srv/discovery/venv and installed reportupdater's dependencies there as analytics-search (see T279443#6994725), so that this process is no longer dependent on me and my venv
6. Uploaded & merged patch that sets PYTHONPATH in discovery.pp (see [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/678864 | 678864 ]], fixed in [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/678891 | 678891 ]])
7. Uploaded & merged patch to fix queries for the current version of Hive on analytics clients ([[ https://gerrit.wikimedia.org/r/c/wikimedia/discovery/golden/+/679490 | 679490 ]]). I expect reportupdater to backfill the reports from Feb 8 onward, since we're still within the 90-day retention window
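
The backfill expectation in the last step comes down to date arithmetic: the oldest missing report date has to be no more than 90 days in the past when reportupdater next runs. A small sketch of that check follows. The function name is hypothetical, and treating the window as inclusive of day 90 is an assumption rather than something confirmed from the retention policy.

```python
from datetime import date


def within_retention(report_date, today, retention_days=90):
    """True if report_date is at most retention_days before today.

    The inclusive upper bound is an assumption about how the window is counted.
    """
    age_days = (today - report_date).days
    return 0 <= age_days <= retention_days
```

Calling `within_retention(date(2021, 2, 8), date.today())` at the time of the next run indicates whether the Feb 8 data should still be available to query.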