
Re-evaluate plan for reference usage baseline statistics
Open, Needs Triage, Public

Description

Currently our main approach to measuring success in the WMDE-References-FocusArea is mostly quantitative. We scrape the rendered HTML and look at how references are used and re-used. See T332032: Create baseline statistics for reference usage (2023)

The underlying Wikimedia Enterprise HTML dumps are unstable, so we need to re-evaluate that approach. We may want to look into alternatives, or we may decide to accept the uncertainties in the scraper results. We'll do another small run specifically to validate the upstream data.

Our scraper produces the following metrics: https://gitlab.com/wmde/technical-wishes/scrape-wiki-html-dump/-/blob/main/metrics.md
These metrics should be reviewed for applicability. We found that some of them will be difficult to use directly: they may have to be changed or aggregated differently, and we may want an additional set of metrics to measure other characteristics of pages. We should also consider how we will analyze changes over time.

Meeting notes related to this ticket: https://docs.google.com/document/d/1oFiDqWfdI4TO69TSv9jFgeFCXu6ZV-t377PRx3Y3UCo
List of current assumptions about user behavior and success metrics that we potentially want to measure with statistics: https://docs.google.com/document/d/1isEazL9rgR7k0rUELRCyY6pJgTflX4oDoj9DxQdp_G4
This list is not final and not prioritized yet.

  • Prioritize and complete the list of assumptions about Reference Reuse (and the success metrics for improving it) -> UX-led
  • Research and decide on statistics to measure the prioritized success metrics, probably on a case-by-case basis. We will create individual tickets to implement these.
  • T357611: Run the scraper on a few wikis and verify some of the information (at least the total article count) using other data sources.

Implementation tickets
T357613: Measure the reference use and re-use in VE

Event Timeline

thiemowmde renamed this task from Re-evaluate basline statistics for reference usage to Re-evaluate baseline statistics for reference usage. Feb 15 2024, 10:31 AM
awight renamed this task from Re-evaluate baseline statistics for reference usage to Re-evaluate plan for reference usage baseline statistics. Feb 23 2024, 8:10 AM
awight updated the task description.
awight updated the task description.