Checks of infrastructure related to the prior WMDE analytics setup. Note that for these checks I'm looking at both the edited versions and the originals from Gerrit. In general, the following consists of codebase searches for the given topics.
- [ ] Check [[ https://github.com/wikimedia/analytics-wmde-WD-WikidataAnalytics | WD Analytics code ]] for relations to other data services and a username
- [[ https://gerrit.wikimedia.org/r/admin/repos/q/filter:analytics%252Fwmde%252FWD | Gerrit code ]]
- [x] Graphite
- None (`libgraphite2` - for fonts)
- [x] Cloud VPS (references and two main endpoint URLs)
- References in HTML `<a>` tags of `ui.R` and `app_ui.R` files
- R and Docker based deployment files
- URL targets to https://wikidata-analytics.wmcloud.org
- [x] Published data export path
- R files for reading and writing data and HTML `<a>` tags
- [x] Cronjobs and commands
- R cron jobs, data lake ETL Python jobs, moving data to HDFS, cleaning up HDFS, settings configurations, copying update strings to published data folder, kerberos credential initialization, copying files to published data folder, logging runs, Spark 2 commands, Docker commands, dependency installation
- [x] PHP code
- No references to `analytics-wmde-scripts`
- No references to `statistics::wmde` (puppet)
- References to `php` in code are for the Wikidata action API, Commons API and Reference Hunt PHP stats report
- [x] User name
- IRC and Wikimedia nick references
- Code comments
- Personal DB used for some data sources (`wdcm_clients_wb_entity_usage`)
- Paths to config files and other setup files
- Paths for saved data
- [ ] Archive Gerrit repo
- [ ] Archive/delete GitHub repo (check if not automatic)
- [ ] Check [[ https://github.com/wikimedia/analytics-wmde-WDCM | WDCM code ]] for relations to other data services and a username (deprecated on GitHub because of deprecation on Gerrit)
- [[ https://gerrit.wikimedia.org/r/admin/repos/q/filter:analytics%252Fwmde%252FWDCM | Gerrit code ]]
- [x] Graphite
- None (`libgraphite2` - for fonts)
- [x] Cloud VPS (references and two main endpoint URLs)
- References in HTML `<a>` tags
- URL targets to http://wmdeanalytics.wmflabs.org/
- [x] Published data export path
- R files for reading and writing data
- [x] Cronjobs and commands
- R cron jobs, data lake ETL Python jobs, moving data to HDFS, copying update strings to published data folder
- [x] PHP code
- No references to `analytics-wmde-scripts`
- No references to `statistics::wmde` (puppet)
- References to `php` in code are for the Wikidata action API and Reference Hunt PHP stats report
- [x] User name
- IRC and Wikimedia nick references
- Code comments
- Personal DB used for some data sources (`wdcm_clients_wb_entity_usage`)
- Paths to config files and other setup files
- Paths for saved data
- [x] Archive Gerrit repo
- [x] Archive/delete GitHub repo (check if not automatic)
- [ ] Check [[ https://github.com/wikimedia/analytics-wmde-WiktionaryCognateDashboard | Wiktionary Cognates code ]] for relations to other data services and a username
- [[ https://gerrit.wikimedia.org/r/admin/repos/q/filter:analytics%252Fwmde%252FWiktionary | Gerrit code ]]
- [x] Graphite
- None (`libgraphite2` - for fonts)
- [x] Cloud VPS (references and two main endpoint URLs)
- None
- [x] Published data export path
- R files for reading and writing data
- [x] Cronjobs and commands
- Installing dependencies
- [x] PHP code
- No references to `analytics-wmde-scripts`
- No references to `statistics::wmde` (puppet)
- [x] User name
- IRC nick references
- Original .yml file references to local directories
- `.Rproj.user` paths to the local directory that holds the Wiktionary Cognate Dockerfiles, engines and dev start files (`_Img/Wiktionary-Cognate`)
- [ ] Archive Gerrit repo
- [ ] Archive/delete GitHub repo (check if not automatic)
- [ ] Check [[ https://gerrit.wikimedia.org/r/admin/repos/q/filter:analytics%252Fwmde%252FNewEditors | NewEditors ]] Gerrit code
- Included because there were cron jobs still running for this
- It looks like the repo contains only one file, which is a config file for Gerrit?
- [ ] Archive Gerrit repo?
- [x] Check of [[ https://github.com/wikimedia/analytics-wmde-scripts | PHP code ]] that could be related and is still in use
- Assumption at the time of writing: no connections
- [x] Check both Cloud VPS instances
- No references to `wmdeanalytics`
- No references to `wikidata-analytics`
- [x] Check published data export path
- No references to `wmde-analytics-engineering`
- [x] Check references to projects
- No reference to `wdcm`, `cognate`, `curator`, or `wd_analytics`
- [x] Checking GitHub for routes to [[ https://wikidata-analytics.wmcloud.org | dashboard endpoint ]] or the [[ http://wmdeanalytics.wmflabs.org/ | other dashboard endpoint ]]
- Endpoint: [[ https://github.com/wikimedia/wikidata-query-builder/blob/f4ee611f52bc80525f4be7ff34938f5888ee27d7/src/components/Footer.vue#L53 | Wikidata Query Builder footer code ]], [[ https://github.com/wmde/wikidata-mismatch-finder/blob/7ccb74874cdbb9160c1177a765dc53ec83a44f1a/resources/js/Pages/Layout.vue#L74 | Mismatch Finder Layout code ]], Wikidata contributor documentation issue, URLs in code for the projects themselves, comments in code for the projects themselves, similarly named Python scripts from WMF employees, related business forks of the projects
- Other endpoint: WMF repos for the projects themselves, WMDE employee fork, related business fork of the projects
- [x] Checking GitHub for routes to [[ https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/ | published data export paths ]]
- WMF repos for the projects themselves, [[ https://github.com/search?q=repo%3Awmde%2Fgrafana-dashboards+wmde-analytics-engineering&type=code | deprecated Grafana dashboard codebase ]] and forks of it
- [ ] Checking [[ https://graphite.wikimedia.org/ | Graphite ]] processes for references to WD, WDCM and Wiktionary Cognates processes
- [x] Checking [[ https://wikitech.wikimedia.org/wiki/Prometheus | Prometheus ]]
- Given when Prometheus was adopted (2021), it is fairly certain that no jobs are related
- [ ] Cronjobs
- [[ https://docs.google.com/spreadsheets/d/1w2f_ndQa6Lo2BBfPJ88sJLSg2RJeTQKFNOPd0zjiB4I/edit#gid=1625715087 | Sheet with PHP jobs overview ]]
- [[ https://docs.google.com/spreadsheets/d/1w2f_ndQa6Lo2BBfPJ88sJLSg2RJeTQKFNOPd0zjiB4I/edit?usp=sharing | Sheet with R jobs overview ]]
- [x] Misc cron jobs
- Via WMF: there's only one job still running on the given user account - `2021_WMDEMitmachenBereichCampaign_PRODUCTION.R`
- This box is checked once the job has been migrated from the current account or stopped
- [ ] Cloud VPS general check
- [ ] Are wikidata-analytics.wmcloud.org and wmdeanalytics.wmflabs.org the only ones we need to consider?
- [ ] HDFS
- Do we need to investigate this? There are references to cleaning a directory there.
See [Legacy Infrastructure Investigation](https://docs.google.com/document/d/1r0dLNPO_-JbXwstT9afxCvgVVmEjRR6B_WA-RmcLjDM/edit)
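
The codebase searches in this checklist could be repeated with a small script. What follows is a minimal sketch, assuming each repo has been cloned locally. `search_repo` and `SEARCH_TERMS` are hypothetical names that do not exist in any of the repos, and the search terms are simply copied from the checks above.

```python
import re
from pathlib import Path

# Search terms copied from the checklist above (assumption: extend as needed).
SEARCH_TERMS = [
    "graphite",
    "analytics-wmde-scripts",
    "statistics::wmde",
    "wikidata-analytics.wmcloud.org",
    "wmdeanalytics.wmflabs.org",
    "wmde-analytics-engineering",
]


def search_repo(repo_root, terms=SEARCH_TERMS):
    """Return {term: [(relative_path, line_number, line_text), ...]}.

    Matching is a case-insensitive literal search, line by line.
    """
    root = Path(repo_root)
    patterns = {t: re.compile(re.escape(t), re.IGNORECASE) for t in terms}
    hits = {t: [] for t in terms}
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories and anything inside the .git metadata folder.
        if not path.is_file() or ".git" in relative.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            for term, pattern in patterns.items():
                if pattern.search(line):
                    hits[term].append((str(relative), number, line.strip()))
    return hits
```

Calling `search_repo("path/to/checkout")` (a placeholder path) returns a dictionary keyed by search term, where an empty list corresponds to a "None" or "No references" entry in the checklist. A match still needs manual review: the `libgraphite2` hits noted above, for example, are a font dependency rather than a Graphite metrics reference.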