Data Platform Request Form
Is this a request for a:
- Data Pipeline
- Data Feature
Please provide the description of your request:
A dataset that can be used to determine how many and which wikis exist - and whether they're active - over time. The use cases I'm aware of have centered around Wikipedias, although I can imagine questions around other wikis as well.
Currently, if we need to know the number of active Wikipedias, we reference this data page: https://commons.wikimedia.org/wiki/Data:Wikipedia_statistics/meta.tab . However, that only goes back to 2020. The https://meta.wikimedia.org/wiki/List_of_Wikipedias page has more history, but getting historical data depends on going through edit revisions, and the page did not always differentiate between active and inactive languages.
Use Case: (Please explain what this feature will be used for):
External communications about the scale and scope of our work. The most recent use case is for a blog post being prepared for Diff - a technology spotlight about how we keep Wikimedia running (https://phabricator.wikimedia.org/T323230). We were asked, "can we get data for how the number of active wikipedias has changed over 10+ years?"
Based on the timeline and my team's bandwidth, I suggested focusing on our current number of languages. But it would be useful to have this data readily available for future public communications.
Ideal Delivery Date:
- Provide link to CSV/GSheet example data. Link: ____
- Provide link to the desired Table Schema. Link: ____
- Does this data contain anything that is sensitive, PII or Private?
- I don't know
- Who will own the data (Fix issues, update descriptions & metadata etc.)?
- Do you have the transformation you would like to be applied? Link: ____
- Does this data need to be linked to other data in the Data Lake?
Data Feature Checklist
Please link to the following if applicable.
|Related PHAB Tickets||Yes||<add link here>|
|Product One Pager||Yes||<add link here>|
|Product Requirements Document (PRD)||Yes||<add link here>|
|Product Roadmap||No||<add link here>|
|Product Planning/Business Case||No||<add link here>|
|Product Brief||No||<add link here>|
|Other Links||No||<add links here>|
For Data Engineering Team to fill out:
|Will this improve the efficiency of a team's workflow?||1-3|
|Does this have an effect on our Core Metrics?||1-3|
|Does this align with our strategic goals?||1-3|
|Is this a blocker for another team?||1-3|