Data Platform Request Form
Is this a request for a:
- Dataset
- Data Pipeline
- Data Feature
Please provide the description of your request:
A dataset that can be used to determine how many and which wikis exist - and whether they're active - over time. The use cases I'm aware of have centered around Wikipedias, although I can imagine questions around other wikis as well.
Currently, if we need to know the number of active Wikipedias, we reference this data page: https://commons.wikimedia.org/wiki/Data:Wikipedia_statistics/meta.tab . However, that only goes back to 2020. The https://meta.wikimedia.org/wiki/List_of_Wikipedias page has more history, but getting historical data depends on going through edit revisions - and the page did not always differentiate between active and non-active languages.
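As a rough illustration of the shape of data that would answer these questions, here is a minimal sketch in Python. It assumes a hypothetical table with one row per wiki per status period. The field names, the "active"/"inactive" labels, and the sample rows are placeholders for discussion, not an existing schema or real wiki data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WikiStatusPeriod:
    """One row of a hypothetical wiki-status history table."""
    wiki: str                 # wiki identifier (placeholder names below)
    project: str              # project family, e.g. "wikipedia"
    status: str               # "active" or "inactive" (assumed labels)
    valid_from: date          # first day this status applied
    valid_to: Optional[date]  # first day it no longer applied; None if current


def active_wikis(rows, project, on):
    """Return the sorted names of wikis in `project` that were active on `on`."""
    return sorted(
        r.wiki
        for r in rows
        if r.project == project
        and r.status == "active"
        and r.valid_from <= on
        and (r.valid_to is None or on < r.valid_to)
    )


# Placeholder rows for illustration only; not real wiki data.
rows = [
    WikiStatusPeriod("alpha_wiki", "wikipedia", "active", date(2010, 1, 1), None),
    WikiStatusPeriod("beta_wiki", "wikipedia", "active", date(2012, 6, 1), date(2018, 3, 1)),
    WikiStatusPeriod("beta_wiki", "wikipedia", "inactive", date(2018, 3, 1), None),
    WikiStatusPeriod("gamma_wiktionary", "wiktionary", "active", date(2011, 1, 1), None),
]

# Count active Wikipedias at the start of a few sample years.
for year in (2011, 2015, 2020):
    names = active_wikis(rows, "wikipedia", date(year, 1, 1))
    print(year, len(names), names)
```

Storing status as dated periods, rather than a single current flag, is what would make "how has the number of active Wikipedias changed over 10+ years?" answerable without walking page revisions.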
Use Case (please explain what this feature will be used for):
External communications about the scale and scope of our work. The most recent use case is a blog post being prepared for Diff: a technology spotlight about how we keep Wikimedia running (https://phabricator.wikimedia.org/T323230). We were asked, "can we get data for how the number of active wikipedias has changed over 10+ years?"
Based on the timeline and my team's bandwidth, I suggested focusing on our current number of languages. But it would be useful to have this data readily available for future public communications.
Priority:
Low
Ideal Delivery Date:
Dataset Checklist
- Provide link to CSV/GSheet example data. Link: ____
- Provide link to the desired Table Schema. Link: ____
- Does this data contain anything that is sensitive, PII or Private?
- Yes
- No
- I don't know
- Who will own the data (Fix issues, update descriptions & metadata etc.)?
Data Pipeline Checklist
- Do you have the transformation you would like to be applied? Link: ____
- Does this data need to be linked to other data in the Data Lake?
- Yes
- No
Data Feature Checklist
Please link to the following if applicable.
Document Type | Required? | Document/Link |
---|---|---|
Related PHAB Tickets | Yes | <add link here> |
Product One Pager | Yes | <add link here> |
Product Requirements Document (PRD) | Yes | <add link here> |
Product Roadmap | No | <add link here> |
Product Planning/Business Case | No | <add link here> |
Product Brief | No | <add link here> |
Other Links | No | <add links here> |
For Data Engineering Team to fill out:
Value Calculator | Rank |
---|---|
Will this improve the efficiency of a team's workflow? | 1-3 |
Does this have an effect on our Core Metrics? | 1-3 |
Does this align with our strategic goals? | 1-3 |
Is this a blocker for another team? | 1-3 |