This task is to document the current infrastructure and assets of WMDE Analytics. The task text will be updated as the discussion below progresses.
## Tables
- presto_analytics_hive/goransm/wdcm_clients_wb_entity_usage
  - This table cannot be queried because it is in goransm's private repo
  - General schema is:
    - eu_row_id: BIGINT
    - eu_entity_id: VARCHAR
    - eu_aspect: VARCHAR
    - eu_page_id: BIGINT
    - wiki_db: VARCHAR
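Since the table itself cannot be queried, a typed record of its schema may be useful to anyone who needs to rebuild or consume an equivalent dataset. The sketch below mirrors the five columns listed above. The class name `EntityUsageRow`, the mapping of BIGINT to `int` and VARCHAR to `str`, and the sample values are assumptions for illustration only, not taken from the actual table.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EntityUsageRow:
    """One row of wdcm_clients_wb_entity_usage (hypothetical Python mirror)."""

    eu_row_id: int      # BIGINT
    eu_entity_id: str   # VARCHAR
    eu_aspect: str      # VARCHAR
    eu_page_id: int     # BIGINT
    wiki_db: str        # VARCHAR


# Illustrative values only; they are not read from the real table.
example_row = EntityUsageRow(
    eu_row_id=1,
    eu_entity_id="Q42",
    eu_aspect="S",
    eu_page_id=12345,
    wiki_db="enwiki",
)

# Column names in declaration order, e.g. for building a CSV header.
column_names = [f.name for f in fields(EntityUsageRow)]
```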
## Data dumps
- https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/
- https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wiktionary/
- https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/qurator/
- https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/wdcm/
## Servers
The following are found on Cloud VPS:
- wikidata-analytics-1
- wiktionary-cognate-1
## Code repos
- WikidataAnalytics (includes Wikidata Concepts Monitor)
  - [[ https://github.com/wikimedia/analytics-wmde-WD-WikidataAnalytics | GitHub ]]
- WiktionaryCognateDashboard
  - [[ https://github.com/wikimedia/analytics-wmde-WiktionaryCognateDashboard | GitHub ]]
## CRON jobs
The following is via https://phabricator.wikimedia.org/T334951#8980911:
- `WDCM_Sqoop_Clients` runs on `stat1004` weekly - It doesn't run Spark (it uses Sqoop)
- `2021_WMDE_Mitmachen_Bereich_2021_Campaign` runs on `stat1007` daily - It doesn't run Spark (it uses Hive)
- `WD_PageviewsPerType` runs on `stat1007` daily but has been failing since February 17th - It runs a Spark job
  - [[ https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/WD_PageviewsPerType/ | Link to data ]]
- `WD_UsageCoverage` runs on `stat1008` daily
  - [[ https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/WD_percentUsage/ | Link to data ]]
- `WD_languagesLandscape` runs on `stat1008` monthly (30th of the month)
  - [[ https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/WD_Languages_Landscape/ | Link to data ]]
- `Wiktionary_CognateDashboard` runs on `stat1008` daily
  - [[ https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wiktionary/ | Link to data ]]
- `WDCM_EngineBiases` runs on `stat1008` weekly
  - [[ https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/wdcm/biases/ | Link to data ]]
- `Qurator_CuriousFacts` runs on `stat1008` monthly (10th of the month)
  - [[ https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/qurator/curious_facts/ | Link to data ]]
- `WMDE_BannerImpressions` runs on `stat1008` hourly
  - [[ https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/WMDE_Banners/ | Link to data ]]
- `NewEditors_comprehensive_report` runs on `stat1008` daily
  - [[ https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/WMDE_NewEds_Comprehensive/ | Link to data ]]
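One thing to keep in mind when reading the monthly schedules above: a job pinned to a fixed day of the month only fires in months that contain that day. The sketch below assumes the monthly jobs are scheduled with a plain day-of-month cron field (the actual crontab entries are not shown in this task) and lists which months of a given year would trigger. The function name `months_with_day` is hypothetical.

```python
import calendar
from typing import List


def months_with_day(year: int, day: int) -> List[int]:
    """Return the months (1-12) of `year` that contain the given day of the month."""
    return [
        month
        for month in range(1, 13)
        # monthrange() returns (weekday of the 1st, number of days in the month)
        if calendar.monthrange(year, month)[1] >= day
    ]


# A job scheduled for the 30th never fires in February.
day_30_months = months_with_day(2023, 30)
print(day_30_months)  # [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# A job scheduled for the 10th fires in every month.
day_10_months = months_with_day(2023, 10)
print(day_10_months)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Under that assumption, `WD_languagesLandscape` would produce eleven runs per year rather than twelve, so a missing February update would not by itself indicate a failure.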