The Structured Data team would like to create a dashboard of multimedia usage on the Wikipedias. It will inform the Structured Data team's work on media matching, starting in Q3 of fiscal year 2020-2021, as well as the related work the Android and Growth teams are doing this fiscal year. The data will also help us further explore a feature that would suggest multimedia files based on topics identified in an article.
We'd like to answer the following questions:
- Through which channels does multimedia content get added to Wikipedia articles: VisualEditor, direct wikitext editing, or bots? Knowing this will show us where work on media matching is likely to have the most impact. We have dashboards with this information for edits in general, but they are not granular enough to distinguish multimedia additions. Subtask: T265771
- Commons isn't the only place where WMF projects host multimedia files. Many Wikipedias also host their own files, generally for Fair Use purposes (English Wikipedia alone has almost 890,000 files). We'd like a data view that compares, per wiki, usage of these "off-Commons" files with usage of Commons files. This will help us understand whether focusing on Commons files is sufficient, or whether we need to expand our scope to locally hosted files. Subtask: T265768
- We'd like to know whether we can measure dwell time (and link clicks within an article) and plot it against the number and type of media files, possibly controlling for article length. For example, this could let us determine that a reader is x% more likely to click links or files in an article that has more than 3 images. This would help us understand the ideal number of images for an article. Subtask: T265772
- Additionally, we'd like to know whether data is available on play rates for audio and video files. This will help us understand how valuable these files are on the Wikipedias and whether to focus on them. Subtask: T265773
- Finally, last month the tech department analyzed Wikipedia content and announced: "By analyzing webrequest logs for english, spanish, arabic and french Wikipedia, we found that when readers visit a page on Wikipedia, around 3% of the time they also click on an image." We would love to see a consolidated dashboard of image clicks per wiki over time. Subtask: T265774
We've also raised this with the team working on the Content Dataset, but we are not sure whether these requests will be prioritized in that project or what the timeline would be.