
[Epic] Scraping data dumps and scraper enhancements
Open, Needs Triage, Public

Description

Scraping data dumps

The scraper helps us gain insight into how references/citations are used in wikis. We're also building some of our success metrics on the aggregated data the scraper creates from the Enterprise data dumps. These dumps have not been publicly available since March 25; the data is now only available via the Enterprise API.

As a first step, we want to enhance the scraper to collect sub-ref-specific data. We will then run it on a regular cadence that allows us to monitor and learn fast. Since the Enterprise dumps are available on the 1st and 20th of each month, this schedule is currently sufficient. The Enterprise team has offered to discuss an alternative cadence, should we need one.
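To make the cadence concrete, here is a minimal sketch of how a scheduled scraper run could work out the upcoming dump dates. It assumes only what is stated above (dumps on the 1st and 20th of each month). The names `DUMP_DAYS`, `next_dump_date`, and `dump_dates` are hypothetical and not part of the scraper.

```python
from datetime import date, timedelta

# Days of the month on which dumps become available (taken from the description).
DUMP_DAYS = (1, 20)


def next_dump_date(today: date) -> date:
    """Return the first dump date on or after `today`."""
    for day in DUMP_DAYS:
        if today.day <= day:
            return today.replace(day=day)
    # Past the 20th: roll over to the 1st of the following month.
    if today.month == 12:
        return date(today.year + 1, 1, DUMP_DAYS[0])
    return date(today.year, today.month + 1, DUMP_DAYS[0])


def dump_dates(start: date, count: int) -> list[date]:
    """Return the next `count` dump dates on or after `start`."""
    dates: list[date] = []
    current = start
    while len(dates) < count:
        upcoming = next_dump_date(current)
        dates.append(upcoming)
        # Continue searching from the day after the dump we just found.
        current = upcoming + timedelta(days=1)
    return dates
```

For example, `dump_dates(date(2025, 7, 2), 3)` yields July 20, August 1, and August 20, 2025. If a different cadence is agreed with the Enterprise team, only `DUMP_DAYS` would need to change in this sketch.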

Links & resources

Steps

  1. Get access to data: T396720: Scraper: Use Enterprise API to retrieve dumps
  2. Enhance the scraper: T396729: Scraper: Add new metrics for sub-ref data
  3. Run the scraper: T396730: Scraper: Run the scraper with the new features regularly
  4. Analyze subref contents: T397124: Try to categorize different subref usage types
  5. Build dashboards: T396731: Scraper: Build a new dashboard for the updated scraper data

Related Objects

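| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | |
| Open | | None | |
| Open | | None | |
| Resolved | | None | |
| Resolved | | None | |
| Resolved | | None | |
| Resolved | Spike | None | |
| Resolved | | awight | |
| Open | Spike | None | |
| Open | | None | |
| Resolved | | None | |
| Open | | None | |
| Open | | None | |
| Resolved | | None | |
| Open | | None | |
| Open | | None | |
| Resolved | | awight | |
| Resolved | | awight | |
| Resolved | | Gehel | |
| Resolved | | awight | |
| Invalid | | None | |
| Resolved | | Tobi_WMDE_SW | |
| Open | | None | |
| Resolved | | awight | |
| Declined | | None | |
| Resolved | | None | |
| Invalid | | awight | |
| Open | | None | |
| Open | | None | |
| Duplicate | | WMDE-Fisch | |
| Resolved | | awight | |
| Resolved | | None | |
| Declined | | None | |
| Resolved | | Tobi_WMDE_SW | |
| Resolved | | Tobi_WMDE_SW | |
| Resolved | | Tobi_WMDE_SW | |
| Resolved | | None | |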

Event Timeline

Jul 2 2025, 12:15 PM: Lina_Farid_WMDE renamed this task from "Story for scraper" to "Scraping data dumps and scraper enhancements".
Nov 26 2025, 7:10 AM: WMDE-Fisch renamed this task from "Scraping data dumps and scraper enhancements" to "[Epic] Scraping data dumps and scraper enhancements".
WMDE-Fisch edited projects, added Epic; removed Story.