## User Story
As a data analyst working on content and contributor metrics using MediaWiki History, I need the most recently generated snapshot to include all production wikis that were open/active at the time of the snapshot, so that I'm working with data from all wikis (including languages/projects recently graduated from Incubator) and can paint a more complete picture of growth and productivity in the Wikimedia movement.
Context:
- {T349743}
- {T329119}
- {T299548}
- {T220456}
## Notes
- `canonical_data.wikis` (sourced from https://github.com/wikimedia-research/canonical-data/blob/master/wiki/wikis.tsv) is updated pretty frequently
- In addition to that:
  - There's @KCVelaga's [[ https://github.com/wikimedia-research/wikimedia_project_creation_closure_dts | structured list of when each Wikimedia project was created and, if applicable, when it was closed ]]
  - And @Hghani's **[[ https://gitlab.wikimedia.org/repos/research/incubator-data-exploration/-/blob/main/02_wrangling_scripts/site_creation_scraper.ipynb | site creation scraper ]]**, which scrapes the newest wikis from the [[ https://incubator.wikimedia.org/wiki/Incubator:Site_creation_log | Site Creation Log ]]
- Some checks are already implemented: {T354692}
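
One way to check a snapshot against the sources above is to diff the set of wikis it contains against the open wikis listed in `wikis.tsv`. The Python below is a minimal sketch, not an existing check. It assumes the TSV has `database_code`, `status`, and `visibility` columns, and that "expected" means `status == "open"` and `visibility == "public"`. The function names and the sample rows are made up for illustration, and the snapshot's wiki list is assumed to be gathered separately (for example, from the distinct `wiki_db` values in the snapshot).

```python
import csv
import io


def expected_wikis(wikis_tsv: str) -> set[str]:
    """Return the database codes of wikis we expect to find in a snapshot.

    Assumption: the TSV has `database_code`, `status`, and `visibility`
    columns, and only open, public wikis are expected in the snapshot.
    """
    reader = csv.DictReader(io.StringIO(wikis_tsv), delimiter="\t")
    return {
        row["database_code"]
        for row in reader
        if row["status"] == "open" and row["visibility"] == "public"
    }


def missing_wikis(wikis_tsv: str, snapshot_wiki_dbs: set[str]) -> list[str]:
    """Return expected wikis that are absent from the snapshot, sorted."""
    return sorted(expected_wikis(wikis_tsv) - set(snapshot_wiki_dbs))


# Made-up sample rows, for illustration only (not real wikis.tsv content).
SAMPLE_TSV = (
    "database_code\tstatus\tvisibility\n"
    "enwiki\topen\tpublic\n"
    "oldclosedwiki\tclosed\tpublic\n"
    "someprivatewiki\topen\tprivate\n"
    "newlangwiki\topen\tpublic\n"
)

# Hypothetical set of wikis present in the snapshot under test.
SNAPSHOT_WIKI_DBS = {"enwiki"}

if __name__ == "__main__":
    print(missing_wikis(SAMPLE_TSV, SNAPSHOT_WIKI_DBS))
```

This sketch compares against the current state of the list only. Checking which wikis were open "at the time of the snapshot" would also need creation and closure dates, such as those in the structured list mentioned above.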