
[Session] Commons Impact Metrics BETA Data Dumps available today!
Closed, Resolved, Public

Description

  • Title of session (the more detailed, the better!): Commons Impact Metrics BETA Data Dumps available today!
  • Session description: We will walk through the Commons Impact Metrics project goals, learnings and progress. We will share why we are creating this data product for the GLAM community, how we modeled the data, our dumps and API Design, and where we are on our roadmap. There will also be an invitation to collaborate on our hackathon project: "Establish a process for setting up local environments to explore the Commons Impact Metrics data dumps".
  • Username for contact: @mforns
  • Session duration (25 or 50 min): 25
  • Session type (presentation, workshop, discussion, etc.): presentation
  • Language of session (English, Arabic, etc.): English
  • Prerequisites (some Python, etc.): Familiarity with Commons data in general would be helpful, but is not required
  • Any other details to share?: Commons Impact Metrics project
  • Interested?

Add your username below:

Notes from the session

Commons Impact Metrics BETA Data Dumps available today

Date and time:

Relevant links

Presenter

@mforns: Data Products team at the Wikimedia Foundation

Participants

AnotherJensen
SocialKnowledge

Notes

Long term request from the Commons community, especially GLAM
Way to measure the impact of their contributions
Evaluate what the ROI is, especially for institutions
Community built tools like BaGLAMa, GLAMorous
These tools started to fail
Value proposition: Central precomputed data for the impact metrics for media contributed to Wikimedia Commons
Goals: standard, easy to query, robust
Started with GLAMWiki dashboard, BaGLAMa2, GLAMorgan
https://dumps.wikimedia.org/other/commons_impact_metrics/readme.html
Category metrics
Snapshot based
Based on category
Media file metrics
Get the titles of all used images from a given category tree
Pageviews by category
Pageviews by mediafile
Edits
End goal is to have REST API
14 endpoints
Media requests vs. pageviews
considering pageviews
counts per wiki and page
filters bots and automated traffic
scroll not counted
monthly scope
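
A minimal sketch of the pageview counting described above: counts per wiki and page, automated traffic filtered out, monthly scope. The record layout and the field names (`wiki`, `page`, `date`, `agent_type`, `views`) are assumptions made for illustration, not the actual schema of the dumps or the pipeline.

```python
from collections import Counter


def monthly_pageviews(records, month):
    """Sum pageviews per (wiki, page) for one month, skipping non-human traffic.

    records: iterable of dicts with hypothetical keys
             'wiki', 'page', 'date' (YYYY-MM-DD), 'agent_type', 'views'.
    month:   string of the form 'YYYY-MM'.
    """
    totals = Counter()
    for record in records:
        # Assumption: anything not labelled 'user' is bot or automated traffic.
        if record["agent_type"] != "user":
            continue
        # Monthly scope: keep only records dated within the requested month.
        if not record["date"].startswith(month):
            continue
        totals[(record["wiki"], record["page"])] += record["views"]
    return totals
```

With this layout, two human-traffic records for the same wiki and page in the same month collapse into a single monthly count, and bot records are ignored.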
Learnings
Category
started with primary categories such as museums
parent and children, continue until the last point
Reduce the scope
category allow list
max tree depth (7)
monthly aggregation
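
The scope reduction above (category allow list plus a maximum tree depth of 7) can be sketched as a depth-limited breadth-first walk. This is an illustration only, not the project's implementation: the `subcategories` mapping, the function name, and the convention that allow-listed categories sit at depth 0 are all assumptions.

```python
from collections import deque

MAX_TREE_DEPTH = 7  # depth limit mentioned in the notes


def collect_category_tree(subcategories, allow_list, max_depth=MAX_TREE_DEPTH):
    """Return {category: depth} for categories reachable from the allow list.

    subcategories: hypothetical dict mapping a category to its child categories.
    allow_list:    iterable of primary categories, treated as depth 0.
    """
    depths = {}
    queue = deque((root, 0) for root in allow_list)
    while queue:
        category, depth = queue.popleft()
        if category in depths:
            # Already reached at an equal or shallower depth; this also
            # protects against cycles in the category graph.
            continue
        depths[category] = depth
        if depth < max_depth:
            for child in subcategories.get(category, ()):
                queue.append((child, depth + 1))
    return depths
```

Breadth-first order means each category is recorded at its shallowest depth, and the depth check stops the walk from continuing "until the last point" of a very deep tree.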
Roadmap

Questions

Search has done something similar to navigate through categories
Question: choice of using pageviews vs. webrequests; pageviews are inflated
counts the pageviews for the whole month even if the image is added at the end of the month
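
To make the inflation concern concrete, here is a small arithmetic sketch. The daily view numbers and the function name are invented for illustration; it simply compares a whole-month count with the views received on or after the day the image was added.

```python
def attributed_views(daily_views, added_day):
    """Compare whole-month attribution with views since the image was added.

    daily_views: list of page view counts, index 0 is day 1 of the month.
    added_day:   1-based day of the month on which the image was added.
    """
    monthly_total = sum(daily_views)                    # credited under a monthly scope
    after_addition = sum(daily_views[added_day - 1:])   # views on or after the addition
    return monthly_total, after_addition
```

For a page with a steady 100 views per day over a 30-day month and an image added on day 28, the monthly scope credits 3000 views, while only 300 occurred once the image was on the page.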

Event Timeline

Hello! 👋 The 2024 Hackathon Program is open for scheduling! If you are still interested in organizing a session, you can claim a slot on a first-come, first-serve basis by adding your session to the daily program, following these instructions. We look forward to hearing your presentation!

VirginiaPoundstone renamed this task from "[Session] Commons Impact Metrics Data Dumps available today!" to "[Session] Commons Impact Metrics BETA Data Dumps available today!". Apr 29 2024, 5:04 PM
VirginiaPoundstone updated the task description.
debt triaged this task as Medium priority. Apr 29 2024, 5:45 PM