Hive code to count global unique devices per top domain (like *.wikipedia.org). Initial work will be just quality checking to make sure global counts and per-site counts are in agreement. Once that vetting is done, we will need to calculate global counts per domain and study what percentage of the total number the offset/estimate represents.
Things to do for deploy:
- Stop currently running oozie `unique_devices_project_wide` jobs (daily, monthly and daily-druid)
- Create `unique_devices_project_wide` hive tables (daily and monthly)
- Move already computed time-partitioned folder structure from `/user/joal/wmf/data/wmf/unique_devices/project_wide/` to `/wmf/data/wmf/unique_devices/project_wide/`
- Run MSCK repair on both daily and monthly prod tables
- Drop tables in the `joal` database
- Archives: Move existing project-wide archives to `/user/joal`, as they are not yet ready for external visibility
- Restart Oozie jobs with production settings and last-run dates, except for daily-druid, which needs to be fully re-run (bug in previous run)
- Don't forget to set the archive folder to `/user/joal`
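
The move, repair and drop steps above could be scripted roughly as follows. This is only a sketch under assumptions: the production database name (`wmf`) and the table names (`unique_devices_project_wide_daily`, `unique_devices_project_wide_monthly`) are guesses based on the list above, and the layout under the `project_wide` folder is not specified here, so the glob move may need adjusting. It prints the commands by default and only executes them when `DRY_RUN=0`. The Oozie stop/restart steps are left out because they depend on job ids and properties not given here.

```shell
#!/usr/bin/env bash
# Sketch of the move / MSCK repair / drop steps from the deploy list.
# Prints the commands by default; set DRY_RUN=0 to actually run them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

SRC=/user/joal/wmf/data/wmf/unique_devices/project_wide
DST=/wmf/data/wmf/unique_devices/project_wide

# Assumptions: database and table names are not stated in the deploy list.
PROD_DB=wmf
TABLES=(unique_devices_project_wide_daily unique_devices_project_wide_monthly)

# Print the command in dry-run mode, execute it otherwise.
# Note: the printed form loses shell quoting, so it is for review only.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

deploy_steps() {
  # Move the already computed partition folders to the production location.
  # The glob is quoted so that HDFS expands it, not the local shell.
  # Assumes the destination does not already contain folders of the same name.
  run hdfs dfs -mv "$SRC/*" "$DST/"

  # Register the moved partitions in the production tables.
  for t in "${TABLES[@]}"; do
    run hive -e "USE $PROD_DB; MSCK REPAIR TABLE $t;"
  done

  # Drop the temporary tables in the joal database.
  for t in "${TABLES[@]}"; do
    run hive -e "DROP TABLE IF EXISTS joal.$t;"
  done
}

deploy_steps
```

Running it once in dry-run mode and reviewing the printed commands before setting `DRY_RUN=0` keeps the HDFS move, which is the hard-to-undo step, under manual control.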