On stat1007:
    $ ls -l /srv/published/datasets/discovery/metrics/external_traffic/
    total 5548
    -rw-rw-r-- 1 analytics-search analytics-search-users 3110291 Feb 9 05:02 referer_data.tsv
    -rw-rw-r-- 1 analytics-search analytics-search-users 2565465 Feb 9 05:04 referer_nonbot_data.tsv
    $ tail -n 1 /srv/published/datasets/discovery/metrics/external_traffic/referer_data.tsv
    2021-02-08 TRUE external (search engine) Daum mobile web 141338
    $ ls -l /srv/published/datasets/discovery/metrics/wdqs/
    total 848
    -rw-rw-r-- 1 analytics-search analytics-search-users 864510 Feb 9 05:07 basic_usage.tsv
    $ tail -n 1 /srv/published/datasets/discovery/metrics/wdqs/basic_usage.tsv
    2021-02-08 /bigdata/ldf TRUE FALSE 14
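The same freshness check can be scripted. This is a minimal sketch, assuming each report is a TSV whose first column is a `YYYY-MM-DD` date, as the `tail` output above suggests. The function names `last_report_date`, `report_is_current` and `check_reports` are hypothetical and not part of Reportupdater.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking whether published report TSVs are up to date.
# Assumption: the first tab-separated column of each data row is a YYYY-MM-DD date.

# Print the date in the first column of the file's last line.
last_report_date() {
  tail -n 1 "$1" | cut -f 1
}

# Succeed (exit status 0) only if the file's last date equals the expected date.
report_is_current() {
  [ "$(last_report_date "$1")" = "$2" ]
}

# Print OK or STALE for each file; return 1 if any file is stale.
check_reports() {
  local expected="$1" f status=0
  shift
  for f in "$@"; do
    if report_is_current "$f" "$expected"; then
      echo "OK $f"
    else
      echo "STALE $f (last date: $(last_report_date "$f"))"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_reports 2021-02-08 /srv/published/datasets/discovery/metrics/external_traffic/*.tsv /srv/published/datasets/discovery/metrics/wdqs/basic_usage.tsv` would print one OK or STALE line per file.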
These files are generated by Reportupdater, which is invoked by main.sh; main.sh is scheduled through kerberos::systemd_timer in the statistics/discovery.pp manifest. That manifest belongs to the miscellaneous jobs profile (profile::statistics::explorer::misc_jobs).
Notes: Both reports are non-R scripts that run Hive queries:
- modules/metrics/external_traffic/referer_data (and referer_nonbot_data)
- modules/metrics/wdqs/basic_usage
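For readers unfamiliar with Reportupdater's script reports: as I understand its documentation, Reportupdater runs each report script with the start and end dates (`YYYY-MM-DD`) as its first two arguments and reads a TSV, header row included, from the script's stdout. The sketch below shows that shape for a daily report that shells out to Hive. It is hypothetical: the table, columns and partition layout are illustrative and are not taken from the real referer_data or basic_usage queries, and only the start date is used, assuming daily granularity.

```shell
#!/usr/bin/env bash
# Hypothetical skeleton of a Reportupdater script report that runs a Hive query.
# Assumption: invoked as `<script> START_DATE END_DATE` (YYYY-MM-DD); stdout must be a TSV with a header.

# Turn YYYY-MM-DD into a filter for a table partitioned by year/month/day
# (illustrative layout); one leading zero is stripped so "02" becomes 2.
partition_predicate() {
  local y m d rest
  y="${1%%-*}"
  rest="${1#*-}"
  m="${rest%%-*}"
  d="${rest#*-}"
  echo "year = $y AND month = ${m#0} AND day = ${d#0}"
}

# Build an illustrative daily-count query; the table name is an assumption.
build_query() {
  cat <<EOF
SELECT '$1' AS report_date, COUNT(*) AS requests
FROM wmf.webrequest
WHERE $(partition_predicate "$1")
EOF
}

main() {
  if [ "$#" -lt 2 ]; then
    echo "usage: $0 START_DATE END_DATE" >&2
    return 2
  fi
  # Header row first; the Hive CLI prints results tab-separated without a header by default,
  # and -S (silent mode) keeps log noise out of stdout.
  printf 'report_date\trequests\n'
  hive -S -e "$(build_query "$1")"
}
```

A real script would end with `main "$@"`; it is left out here so the functions can be sourced and tried without a Hive client.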
These reports feed the External Traffic (https://discovery.wmflabs.org/external/) and Wikidata Query Service (https://discovery.wmflabs.org/wdqs/) dashboards, which are still in use.
Log of steps taken:
- Updated the reportupdater submodule to the latest commit (https://gerrit.wikimedia.org/r/c/wikimedia/discovery/golden/+/677244)
- Ran `sudo -u analytics-search git submodule update` in stat1007:/srv/discovery/golden
- Reset my venv on stat1007 via https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter-SWAP#Resetting_user_virtualenvs
- Ran `pip install -U pid python-dateutil pymysql PyYAML Jinja2 dnspython` (`pip install -r reportupdater/requirements.txt` failed)
- Created /srv/discovery/venv and installed reportupdater's dependencies there as analytics-search (see T279443#6994725), so that this process is no longer dependent on me and my venv
- Uploaded and merged a patch that sets PYTHONPATH in discovery.pp (see 678864; follow-up fix in 678891)
- Uploaded and merged a patch that fixes the queries for the current version of Hive on the analytics clients (679490). I expect Reportupdater to backfill the reports from Feb 8 onward, since we're still within the 90-day retention window
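As I understand it, Reportupdater backfills by noticing which report dates have no output yet and re-running the query for them, which only works while the source data is still in Hive. The sketch below illustrates that check; it is not Reportupdater's implementation. The function names `within_retention` and `missing_dates` are hypothetical, the 90-day figure comes from the log above, and the sketch assumes GNU `date` (for `-d`) and a TSV whose first column is the date.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating a backfill check; requires GNU date.

# Succeed if DAY ($1, YYYY-MM-DD) is no more than RETENTION_DAYS ($3) before REF_DAY ($2).
within_retention() {
  local day_s ref_s
  day_s="$(date -u -d "$1" +%s)"
  ref_s="$(date -u -d "$2" +%s)"
  [ "$day_s" -ge $(( ref_s - $3 * 86400 )) ]
}

# Print each day from START ($2) to END ($3), inclusive, that is absent
# from the first tab-separated column of FILE ($1).
missing_dates() {
  local file="$1" day="$2" end_s
  end_s="$(date -u -d "$3" +%s)"
  while [ "$(date -u -d "$day" +%s)" -le "$end_s" ]; do
    if ! cut -f 1 "$file" | grep -qx "$day"; then
      echo "$day"
    fi
    day="$(date -u -d "$day + 1 day" +%F)"
  done
}
```

For example, `missing_dates /srv/published/datasets/discovery/metrics/wdqs/basic_usage.tsv 2021-02-08 "$(date -u -d yesterday +%F)"` would list the days still to be filled, and `within_retention 2021-02-08 "$(date -u +%F)" 90` checks that the earliest of them can still be recomputed.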