As a product manager, I need a dashboard where I can see the growth of structured data over time so that I can report on the success of the project and know what improvements to prioritize.
The success of Structured Data on Commons, along with information about its weaknesses and areas of biggest growth, will inform the SD team's decisions as we expand Structured Data from Commons to the Wikipedias. We want to ensure that we learn and improve as we dramatically expand the use of Structured Data across our projects. Additionally, the Image Recommendations project depends on the richness of Structured Data on Commons to provide high-quality image matches for articles, and this type of data will help us learn where we need to invest in improving SDC for that purpose.
The goal of this ticket is to create a regularly updated visual dashboard (in Superset or something like it?) that shows the growth of structured data on Commons over time. I should be able to easily see the number of files with at least one structured data element, as well as the other data in the notebooks listed below, e.g.:
- Media files containing structured fields in non-English languages
- Number of files with non-English captions
- Number of files with English captions
- Number of files that had captions added
- How quickly after creation do non-English captions get added?
- Time to edit for SDC
- Comparison of number of SDC and non-SDC edits
This could be based on the analytics work that was done for the grant:
https://github.com/wikimedia-research/SDC-metrics-2019/blob/master/T231952-part-1.ipynb
https://github.com/wikimedia-research/SDC-metrics-2019/blob/master/T231952-part-2.ipynb
https://github.com/wikimedia-research/SDC-metrics-2019/blob/master/T231952-part-3.ipynb
(DRAFT) dashboard in Superset: https://superset.wikimedia.org/superset/dashboard/310/
The metrics include:
- Overview
  - Number (%) of files with at least one structured data element
  - Median number of structured data elements per file
  - Number of files with license
  - Number of files with depicts
  - Number of files with captions (en vs. non-en vs. both)
- Captions
  - Number of files with captions added monthly
  - Number of files with non-English/English captions added monthly
  - Number of files with both captions added monthly
  - How quickly after creation do non-English captions get added?
- SDC edits
  - Number of SDC and non-SDC edits monthly
  - Time to edit for SDC
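
To make a few of the metric definitions above concrete, here is a minimal Python sketch of how they might be computed. Everything in it is an assumption for illustration: the record layout (`created`, `statements`, `captions`), the sample values, and the choice to count a "structured data element" as any statement or caption are hypothetical and are not taken from the notebooks or from the real Commons schema.

```python
from datetime import date
from statistics import median

# Hypothetical file records; field names and values are illustrative only.
# "statements" = number of structured statements (e.g. depicts, license),
# "captions"   = language code -> date the caption was added.
files = [
    {"created": date(2019, 1, 10), "statements": 2,
     "captions": {"en": date(2019, 1, 12)}},
    {"created": date(2019, 2, 1), "statements": 0,
     "captions": {"fr": date(2019, 2, 11), "en": date(2019, 3, 1)}},
    {"created": date(2019, 3, 5), "statements": 0, "captions": {}},
    {"created": date(2019, 4, 1), "statements": 1,
     "captions": {"de": date(2019, 4, 3)}},
]


def element_count(f):
    """Assumed definition: every statement and every caption is one element."""
    return f["statements"] + len(f["captions"])


def overview(files):
    """Number and share of files with at least one element, plus the median."""
    counts = [element_count(f) for f in files]
    with_sd = sum(1 for c in counts if c > 0)
    return {
        "files_with_sd": with_sd,
        "pct_with_sd": 100.0 * with_sd / len(files) if files else 0.0,
        "median_elements": median(counts) if counts else 0,
    }


def caption_language_split(files):
    """Count files with English-only, non-English-only, or both captions."""
    split = {"en_only": 0, "non_en_only": 0, "both": 0}
    for f in files:
        langs = set(f["captions"])
        has_en = "en" in langs
        has_other = bool(langs - {"en"})
        if has_en and has_other:
            split["both"] += 1
        elif has_en:
            split["en_only"] += 1
        elif has_other:
            split["non_en_only"] += 1
    return split


def median_days_to_first_non_english_caption(files):
    """Median days from file creation to its earliest non-English caption."""
    delays = []
    for f in files:
        dates = [d for lang, d in f["captions"].items() if lang != "en"]
        if dates:
            delays.append((min(dates) - f["created"]).days)
    return median(delays) if delays else None


print(overview(files))
print(caption_language_split(files))
print(median_days_to_first_non_english_caption(files))
```

In a dashboard these aggregates would more likely be expressed as queries over the underlying tables; the sketch is only meant to pin down what each number means, in particular the open question of what counts as one "structured data element".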