
Dashboard of multimedia usage on the Wikipedias
Closed, Resolved | Public

Description

The Structured Data team is interested in creating a dashboard of multimedia usage on the Wikipedias. This will inform the work that the Structured Data team is doing on media matching starting in Q3 of 2020-2021, as well as the work that the Android and Growth teams are doing on the same this fiscal year. The data will assist with further exploration of a feature that would suggest multimedia files based on topics identified in an article.


We'd like to answer the following questions:

  1. Through which vectors does multimedia content get added to Wikipedia articles: VisualEditor, direct wikitext editing, or bots? This will help us understand where our work on media matching would have the most impact. We have dashboards with this information for edits in general, but they are not granular enough to distinguish multimedia additions. Subtask: T265771
  2. Commons isn't the only place where WMF projects host multimedia files. Many of the Wikipedias host their own files too, generally for Fair Use purposes (English Wikipedia alone has almost 890,000 files). We'd love a data view that allows us to compare usage of those "off-Commons" files vs. Commons files, per wiki. This will help us understand whether focusing on Commons files is sufficient, or if we need to expand. Subtask: T265768
  3. We're curious about whether we could measure dwell time (and link clicking within an article) and plot it against the number and type of media files, and maybe control for article length, e.g. so we can determine that a person is x% more likely to click links or files in an article if it has >3 images. This would help us understand the ideal number of images to add to an article. Subtask: T265772
  4. Additionally, we're very curious whether there's available data about play rates for audio and video files. This will help us understand how valuable these files are on the Wikipedias and whether to focus on them. Subtask: T265773
  5. And finally, last month, the tech department did some analysis on Wikipedia content and announced this: "By analyzing webrequest logs for english, spanish, arabic and french Wikipedia, we found that when readers visit a page on Wikipedia, around 3% of the time they also click on an image." We would love to see a consolidated dashboard of image clicks on a per-wiki basis over time. Subtask: T265774

We've also brought this up to the team working on the Content Dataset, but we are not sure if these requests will get prioritized in that project or what the timeline will be.

Event Timeline

Hi @CBogen! We'll review and triage this at the next board review meeting (Tuesday, September 1st).

kzimmerman moved this task from Triage to Needs Investigation on the Product-Analytics board.
kzimmerman subscribed.

@CBogen This is a big request, and we'll need to follow up with your team on how to break it down and which parts will be included in the Content Data work vs. the work Morten does with Structured Data.

I'm going to assign this to Morten to follow up on potential next steps with both Structured Data & Product Analytics when he returns from vacation, and move this task to "Needs Investigation" for now.

LGoto triaged this task as Medium priority. Sep 8 2020, 5:09 PM

@nettrom_WMF to split this up into subtasks based on his conversation with SD last week

nettrom_WMF added a project: Epic.
nettrom_WMF moved this task from Needs Investigation to Epics on the Product-Analytics board.

Created subtasks for all five points, changing this to an epic and moving it to the Epics column on the Product Analytics board.