The wmf_content.mediawiki_content_history_v1 Hive table is updated daily and could replace some critical dependencies that are currently refreshed only monthly or weekly.
Docs at https://wikitech.wikimedia.org/wiki/Data_Platform/Data_Lake/Content/Mediawiki_content_history_v1.
Tasks
- quickly explore the dataset
- check how many dependencies it could replace, most notably:
  - wmf.mediawiki_wikitext_current
  - wmf.wikidata_item_page_link
  - wmf.wikidata_entity
  - structured_data.commons_entity
- estimate the work needed to let data pipelines consume the dataset
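One question when checking replacements such as wmf.mediawiki_wikitext_current is whether a "current" view can be derived from a history table by keeping only the latest revision of each page. The sketch below illustrates that idea in memory. It is a hypothetical illustration, not the real schema: the `Revision` record and its fields are assumptions, and it ignores page deletions and any other filtering the real tables may apply. On the cluster, the same logic would more likely be a SQL window function (e.g. `ROW_NUMBER()` partitioned by wiki and page, ordered by revision recency).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Revision:
    # Hypothetical, simplified history row; the real table's columns may differ.
    wiki_id: str
    page_id: int
    revision_id: int
    revision_timestamp: str  # ISO 8601 in a fixed format, so strings sort chronologically
    wikitext: str


def latest_per_page(rows: Iterable[Revision]) -> List[Revision]:
    """Keep only the newest revision of each (wiki_id, page_id)."""
    latest: Dict[Tuple[str, int], Revision] = {}
    for row in rows:
        key = (row.wiki_id, row.page_id)
        seen = latest.get(key)
        # Newer timestamp wins; ties are broken by the higher revision_id.
        if seen is None or (row.revision_timestamp, row.revision_id) > (
            seen.revision_timestamp,
            seen.revision_id,
        ):
            latest[key] = row
    # Deterministic output order: by wiki, then page.
    return sorted(latest.values(), key=lambda r: (r.wiki_id, r.page_id))


# Usage with made-up rows: page 1 has two revisions, page 2 has one.
history = [
    Revision("enwiki", 1, 10, "2024-01-01T00:00:00Z", "first draft"),
    Revision("enwiki", 1, 12, "2024-02-01T00:00:00Z", "second draft"),
    Revision("enwiki", 2, 11, "2024-01-15T00:00:00Z", "other page"),
]
current = latest_per_page(history)
```

Comparing how such a derived view lines up with the existing table (row counts, missing or extra pages) would give a first estimate of how directly the daily dataset can stand in for it.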