## Scraping data dumps
The scraper helps us gain insights into how references/citations are used in wikis. We're also building some of our success metrics on the aggregated data the scraper creates from the enterprise data dumps (which have not been publicly available since March 25). The data is now only available via the [[ https://enterprise.wikimedia.com/api/ | Enterprise API ]].
As a first step, we want to enhance the scraper to collect sub-ref-specific data. We will then run it on a regular cadence that allows us to monitor and learn fast. The enterprise dumps are available on the 1st and 20th of each month, and this schedule is currently sufficient. The enterprise team has offered to discuss an alternative cadence, should we need one.
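To make the cadence concrete, here is a minimal sketch of how a scheduled run could work out the next dump date from the 1st/20th schedule described above. The names `DUMP_DAYS` and `next_dump_date` are hypothetical and are not part of the existing scraper; the sketch only does the date arithmetic and does not talk to the Enterprise API.

```python
from datetime import date

# Days of the month on which dumps become available (from the schedule above).
DUMP_DAYS = (1, 20)


def next_dump_date(today: date) -> date:
    """Return the first dump date on or after `today`."""
    for day in DUMP_DAYS:
        if today.day <= day:
            return today.replace(day=day)
    # Past the 20th: roll over to the 1st of the following month.
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)
```

If a different cadence is agreed with the enterprise team, only `DUMP_DAYS` would need to change, provided the new schedule is still expressed as fixed days of the month.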
**Links & resources**
* [[ https://docs.google.com/spreadsheets/d/1w1WE8sGfZfIt6gJEY_9wAoxJoYl7-NnCWrSion_CMSs/edit?gid=137825452#gid=137825452 | old scraper aggregated data in spreadsheet ]]
* https://enterprise.wikimedia.com/api/
**Steps**
# {icon arrow-down} Get access to data: {T396720}
# {icon wrench} Enhance the scraper: {T396729}
# {icon running} Run the scraper: {T396730}
# {icon eye} Analyze sub-ref contents: {T397124}
# {icon eye} Build dashboards: {T396731}