- Title of session (the more detailed, the better!): Commons Impact Metrics BETA Data Dumps available today!
- Session description: We will walk through the Commons Impact Metrics project goals, learnings and progress. We will share why we are creating this data product for the GLAM community, how we modeled the data, our dumps and API design, and where we are on our roadmap. There will also be an invitation to collaborate on our hackathon project: "Establish a process for setting up local environments to explore the Commons Impact Metrics data dumps".
- Username for contact: @mforns
- Session duration (25 or 50 min): 25
- Session type (presentation, workshop, discussion, etc.): presentation
- Language of session (English, Arabic, etc.): English
- Prerequisites (some Python, etc.): Familiarity with Commons data in general would be helpful but is not required
- Any other details to share?: Commons Impact Metrics project
- Interested?
Add your username below:
Notes from the session
Commons Impact Metrics BETA Data Dumps available today
Date and time:
Relevant links
- Phabricator task: https://phabricator.wikimedia.org/T362890
- Session slides:
- (any more?)
Presenter
@mforns: Data Products team at the Wikimedia Foundation
Participants
AnotherJensen
SocialKnowledge
Notes
Long-standing request from the Commons community, especially GLAM institutions
A way to measure the impact of their contributions
Evaluate the return on investment (ROI), especially for institutions
Community-built tools such as BaGLAMa and GLAMorous
These tools started to fail
Value proposition: centralized, precomputed impact metrics for media contributed to Wikimedia Commons
Goals: standard, easy to query, robust
Started with GLAMWiki dashboard, BaGLAMa2, GLAMorgan
https://dumps.wikimedia.org/other/commons_impact_metrics/readme.html
Category metrics
Snapshot-based
Based on category
Media file metrics
Get the titles of all used images from a given category tree
Pageviews by category
Pageviews by media file
Edits
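To illustrate how the per-media-file and per-category pageview datasets relate, the sketch below rolls monthly pageviews up from media files to the categories that contain them. The input shapes (a dict mapping media file to pageviews and a dict mapping media file to categories) are hypothetical and do not reflect the actual dump schema. The roll-up is also a naive sum, which is an assumption made here and not necessarily how the project computes it.

```python
from collections import defaultdict


def pageviews_by_category(media_pageviews, media_categories):
    """Roll per-media-file pageviews up to categories (naive sum).

    media_pageviews: {media_file: monthly pageviews}   (hypothetical shape)
    media_categories: {media_file: set of categories}  (hypothetical shape)
    """
    totals = defaultdict(int)
    for media_file, views in media_pageviews.items():
        # A file listed in several categories contributes to each of them.
        for category in media_categories.get(media_file, ()):
            totals[category] += views
    return dict(totals)


if __name__ == "__main__":
    views = {"Vase.jpg": 120, "Statue.jpg": 80}
    cats = {"Vase.jpg": {"Museum_A", "Ceramics"}, "Statue.jpg": {"Museum_A"}}
    print(pageviews_by_category(views, cats))
```

A naive sum like this counts a page twice if it uses two files from the same category. Deduplicating by page would need page-level usage data, which this sketch does not model.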
The end goal is to have a REST API
with 14 endpoints
Media requests vs. pageviews
considers pageviews rather than media requests
counts per wiki and page
filters out bots and automated traffic
does not take scrolling into account (whether the reader actually scrolled to the media)
monthly scope
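The counting rules above can be approximated by a simplified sketch. It keeps only pageviews from human readers and then aggregates per wiki, page and month. The record layout is hypothetical. The `agent_type` field mirrors the user/spider/automated classification in Wikimedia's pageview data, but the real pipeline's filtering is more involved than a single field check.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, NamedTuple


class PageviewRecord(NamedTuple):
    """One pageview event (hypothetical, simplified layout)."""
    wiki: str
    page_title: str
    timestamp: datetime
    agent_type: str  # e.g. "user", "spider" or "automated"


def monthly_pageviews(records: Iterable[PageviewRecord]) -> Counter:
    """Count human pageviews per (wiki, page, month)."""
    counts = Counter()
    for rec in records:
        # Drop bots and automated traffic; only human readers are counted.
        if rec.agent_type != "user":
            continue
        month = rec.timestamp.strftime("%Y-%m")
        counts[(rec.wiki, rec.page_title, month)] += 1
    return counts


if __name__ == "__main__":
    records = [
        PageviewRecord("en.wikipedia", "Vase", datetime(2024, 4, 3), "user"),
        PageviewRecord("en.wikipedia", "Vase", datetime(2024, 4, 20), "user"),
        PageviewRecord("en.wikipedia", "Vase", datetime(2024, 4, 21), "spider"),
    ]
    print(monthly_pageviews(records))
```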
Learnings
Category
started with primary categories such as museums
walked from parent to child categories, continuing down to the last level
Had to reduce the scope:
category allow list
maximum tree depth of 7
monthly aggregation
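The scoping rules above (start from allow-listed primary categories, walk down to child categories, stop at a maximum depth of 7) can be sketched as a breadth-first traversal. The `children` mapping, the category names and the function name are hypothetical. Counting the root as depth 0 is an assumption, and the real pipeline may define depth differently. The visited check guards against category loops, which can occur on Commons.

```python
from collections import deque


def category_tree(roots, children, allow_list, max_depth=7):
    """Return {category: depth} for every category reachable from allow-listed roots.

    roots: starting (primary) categories, e.g. a museum's top category
    children: {category: list of child categories}  (hypothetical shape)
    allow_list: only roots in this set are expanded
    max_depth: roots are depth 0 (assumption); deeper categories are skipped
    """
    depths = {}
    queue = deque((root, 0) for root in roots if root in allow_list)
    while queue:
        category, depth = queue.popleft()
        # BFS reaches each category at its minimal depth first; skip repeats and loops.
        if category in depths:
            continue
        depths[category] = depth
        if depth == max_depth:
            continue  # do not descend past the depth limit
        for child in children.get(category, ()):
            if child not in depths:
                queue.append((child, depth + 1))
    return depths
```

Breadth-first order makes the depth limit straightforward: each category is recorded at the shallowest depth at which it is reachable. Descent stops once `max_depth` is reached.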
Roadmap
Questions
Search has done something similar for navigating through categories
Question: why use pageviews rather than web requests? Pageviews are inflated:
the pageviews for the whole month are counted even if the image was added only at the end of the month
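Here is a small worked example of the inflation raised in this question, using made-up numbers. Suppose a page receives a steady number of views per day and the image is added on day 30 of a 31-day month. The monthly metric then credits the image with the whole month's views, even though only the last two days' views happened with the image on the page. The day-prorated estimate below is a hypothetical illustration, not something the dumps provide.

```python
def prorated_views(monthly_views: int, days_in_month: int, day_added: int) -> float:
    """Estimate the views that happened on or after the day the image was added.

    Assumes (hypothetically) that views are spread evenly across the month.
    day_added is 1-based; the image counts as present for the whole of that day.
    """
    days_with_image = days_in_month - day_added + 1
    return monthly_views * days_with_image / days_in_month


if __name__ == "__main__":
    monthly = 3100  # made-up monthly pageviews of the page using the image
    print(monthly, prorated_views(monthly, days_in_month=31, day_added=30))
    # prints: 3100 200.0
```

In this example the monthly count is 3100, while only about 200 views fall within the period when the image was actually on the page.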