Measure how multimedia content is added to Wikipedia articles
Closed, Declined
Public

Description

We are looking for data that tracks how much multimedia content gets added to Wikipedia articles from different vectors (Visual Editor, direct Wikitext editing, or bots). We'd like to use the existing edit tags for each to differentiate, and create a dashboard that visualizes this data.

This will help inform decision making around what avenues to pursue for adding more media to articles in order to fulfill the requirements of the SDAW grant. Specifically, it will also help us track our progress towards the "media added to 5 million content pages" requirement in the grant.

The first grant report is due June 1, 2021, so we'd like to have these measurements in advance of that.

Original Task:

From the parent task: What are the vectors for how multimedia content gets added to Wikipedia articles? Is it Visual Editor, direct Wikitext editing, or bots? We have dashboards for this information for edits in general but it's not granular enough for us to distinguish multimedia additions.

We'd also like to explore whether we can differentiate media that was uploaded as part of the edit versus media that is searched for via image search in VE. See @nettrom_WMF's comment in T266067#6634887:

There's an open question about whether we should differentiate between uploads and adding media. I think that depends on how uploads are logged by MediaWiki. If a user uploads two files through VE and adds them into an article, does that show up as two file uploads and one edit in the system? Are those uploads tagged in a way that makes them easy to separate from uploads outside of VE? Depending on how easy they are to identify, we might not need to do something specific about them.

Details

Other Assignee
nettrom_WMF

Event Timeline

Based on my conversations with @cchen and @mpopov it looks like this will not be straightforward to do any time soon. If we're interested in understanding this based on existing edits we'll need to extract and process diffs between revisions.
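To illustrate what extracting and processing diffs between revisions could involve, here is a minimal Python sketch that compares the file links in a parent revision and its child revision. It is an assumption-laden illustration, not the method anyone on this task used: the helper names `media_in` and `media_added` are hypothetical, and the regex only catches `[[File:...]]` and `[[Image:...]]` links with English namespace names. Media added through templates, galleries, or infobox parameters would be missed.

```python
import re

# Matches [[File:Name.ext|...]] and [[Image:Name.ext|...]] links.
# Assumption: English namespace names only; localized aliases are not handled.
FILE_LINK = re.compile(r"\[\[\s*(?:File|Image)\s*:\s*([^|\]]+)", re.IGNORECASE)


def media_in(wikitext):
    """Return the set of normalized file names linked in the wikitext."""
    names = set()
    for raw in FILE_LINK.findall(wikitext):
        # Underscores and spaces are equivalent in titles.
        name = raw.strip().replace("_", " ")
        if name:
            # The first letter of a title is case-insensitive on Wikipedia.
            names.add(name[0].upper() + name[1:])
    return names


def media_added(parent_text, child_text):
    """Return file names present in the child revision but not in its parent."""
    return media_in(child_text) - media_in(parent_text)
```

Running this over historical edits would require fetching the wikitext of every revision pair, which is where the time investment mentioned in this thread comes from.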

If the Structured Data team is interested in tracking this moving forward, implementing an edit tag for edits that add multimedia content should be considered as that allows easy visualization in Superset & Turnilo, and analysis through various tables in the Data Lake.

@Ramsey-WMF @CBogen based on the high time investment needed to get historical data we're going to decline this. If this would be important going forward, your team should discuss possible instrumentation to make it more feasible.

@kzimmerman, do you mind if I reopen? I'd like to explore adding the possible instrumentation (perhaps the edit tag as @nettrom_WMF mentioned above). I don't have a timeline on it, but I'd like to at least keep this open. Maybe you can put it in tracking like you did with T265772? Otherwise feel free to take it off your boards and we'll just keep it on ours for now.

kzimmerman moved this task from Triage to Tracking on the Product-Analytics board.

@CBogen no problem! I'm going to reassign it to you and move it to tracking.

@nettrom_WMF T266067 is now in progress and soon we will have edit tags that indicate media edits. I'd love it if we could prioritize this ticket so that we can measure the media edits that our bot partners will do once they're in production (aiming to start working in production around the end of March 2021).

As per my discussion today with @nettrom_WMF, this task is no longer needed for March 2021 because we can already get the data out of Superset. We will instead need it by the end of May 2021 for the grant report.

Moving this out of Product Analytics' "tracking" column as the edit tags have been deployed. We'd now like to get some example charts in Superset based on edits_hourly.
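As a rough sketch of the aggregation such a chart would show, the Python below counts media-adding edits per editing vector from rows shaped like edits_hourly records. Everything specific here is an assumption for illustration: the field names `revision_tags`, `user_is_bot`, and `edit_count`, the tag names `mw-add-media` and `visualeditor`, and the helper names `vector` and `media_edits_by_vector` should all be checked against the actual table and tag list before use.

```python
from collections import Counter

# Assumed tag names; verify against the deployed change tags.
MEDIA_TAG = "mw-add-media"
VE_TAG = "visualeditor"


def vector(row):
    """Classify a row as bot, visualeditor, or wikitext (hypothetical scheme)."""
    if row["user_is_bot"]:
        return "bot"
    if VE_TAG in row["revision_tags"]:
        return "visualeditor"
    return "wikitext"


def media_edits_by_vector(rows):
    """Sum edit counts per vector for rows carrying the media tag."""
    counts = Counter()
    for row in rows:
        if MEDIA_TAG in row["revision_tags"]:
            counts[vector(row)] += row["edit_count"]
    return counts
```

Checking bots first means a bot edit made through the VisualEditor API would be counted as a bot edit, which seems closest to the three vectors named in the task description.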

Other than T286362, which is a bug, I would like to mention one more potential problem in the mw-*-media tags - namely, adding/removing maintenance templates, such as Template:Unreferenced, triggers the tag.
Since the template contains an image (the book icon), this is not a bug. However, it may hamper the usefulness of the tags for the purpose of this item.

nshahquinn-wmf subscribed.

Moving this out of Product Analytics' "tracking" column as the edit tags have been deployed. We'd now like to get some example charts in Superset based on edits_hourly.

Sounds like you are planning to do this 😊

nettrom_WMF changed the task status from Open to Stalled. Oct 25 2021, 4:40 PM
nettrom_WMF reassigned this task from nettrom_WMF to cchen.

@nshahquinn-wmf : Yeah but no. As far as I know the edit tags are still buggy as the child task of this is not resolved. Once/if that's resolved, Connie would be the one analyzing this 📈

Blocked on T299667. If that method works, we will try to use it for this ticket as well.

Focusing on T299667 instead