
Establish baseline for media additions
Closed, Resolved · Public

Description

To measure whether image suggestions notifications lead to media additions, we need to establish baselines for the following metrics mentioned in the parent ticket.

  • how many images on average are added by each user in our target wikis (currently pt, ru, id)
  • how many images on average are added by each user with >500 edits in our target wikis (currently pt, ru, id)

Acceptance Criteria:

  • Media added to infoboxes are included in the total
  • All available media types are included in the total (video, audio, images, pdfs, etc)
  • Icons and other unwanted media types are filtered out of the total
  • Additions that have been reverted within 48 hours are filtered out of the total
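Below is a minimal sketch of how the last two acceptance criteria could be applied to a list of media additions. The `MediaAddition` record, the `ICON_MARKERS` substring list and both helper functions are hypothetical. The ticket does not describe how icons were actually filtered, so the substring check here is only a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical record of one media addition; field names are illustrative only.
@dataclass
class MediaAddition:
    user: str
    file_name: str
    added_at: datetime
    reverted_at: Optional[datetime] = None  # None if the addition was never reverted


REVERT_WINDOW = timedelta(hours=48)

# Hypothetical, deliberately crude icon filter; a real one would need a curated list.
ICON_MARKERS = ("icon", "flag of", "symbol")


def is_icon(file_name: str) -> bool:
    """Treat a file as an icon if its name contains one of the marker substrings."""
    name = file_name.lower()
    return any(marker in name for marker in ICON_MARKERS)


def counts_toward_total(addition: MediaAddition) -> bool:
    """Keep additions that are not icons and were not reverted within 48 hours."""
    if is_icon(addition.file_name):
        return False
    if addition.reverted_at is not None and addition.reverted_at - addition.added_at <= REVERT_WINDOW:
        return False
    return True


t0 = datetime(2021, 12, 1, 12, 0)
additions = [
    MediaAddition("A", "Lisbon.jpg", t0),
    MediaAddition("B", "Flag of Brazil.svg", t0),
    MediaAddition("C", "Moscow.png", t0, reverted_at=t0 + timedelta(hours=3)),
    MediaAddition("D", "Jakarta.webm", t0, reverted_at=t0 + timedelta(days=5)),
]
kept = [a.file_name for a in additions if counts_toward_total(a)]
print(kept)  # ['Lisbon.jpg', 'Jakarta.webm']
```

The reverted-after-five-days addition is still counted, because only reverts inside the 48-hour window are excluded.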

Event Timeline

cchen triaged this task as Medium priority.
cchen created this task.
cchen moved this task from Triage to Kanban on the Product-Analytics board.
cchen edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
cchen moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Aklapper renamed this task from "Establish baseline for media addtions" to "Establish baseline for media additions". (Feb 2 2022, 10:10 AM)

UPDATED: removed edits that were reverts

A first pass at calculating these baselines is now done. The numbers and the calculations can be found in this Jupyter notebook. The data source is the monthly edits for December 2021 on ptwiki and ruwiki.

For quick reference, the results are as follows:

Russian Wikipedia:

  • Average number of images added by each user in our target wikis: 2
    • Among all the editors and edits: 5.1% of edits are image edits, 23.9% of editors have image edits, 18,801 images are added.
  • Average number of images added by each user with >500 edits in our target wikis: 6
    • Among all the edits made by editors with > 500 edits: 4.8% of edits are image edits, 42.2% of editors have image edits, 15,081 images are added.

Portuguese Wikipedia:

  • Average number of images added by each user in our target wikis: 2
    • Among all the editors and edits: 6.1% of edits are image edits, 20.1% of editors have image edits, 9,718 images are added.
  • Average number of images added by each user with >500 edits in our target wikis: 7
    • Among all the edits made by editors with > 500 edits: 5.4% of edits are image edits, 48.6% of editors have image edits, 6,705 images are added.
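The sketch below makes the four per-wiki numbers concrete. It computes them from edit-level records. It assumes each record already carries the number of non-reverted media additions and the editor's total edit count.

The `Edit` record and the `baseline` function are hypothetical. Following the note that only users with media additions are counted, "average" is taken over users who added at least one media file. Treating ">500 edits" as a cumulative edit count is also an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Edit:
    user: str
    user_edit_count: int  # hypothetical: the editor's cumulative edit count
    images_added: int     # non-reverted media additions in this edit (0 if none)


def baseline(edits, min_user_edits: Optional[int] = None):
    """Baseline metrics, optionally restricted to editors with more than min_user_edits edits."""
    if min_user_edits is not None:
        edits = [e for e in edits if e.user_edit_count > min_user_edits]
    editors = {e.user for e in edits}
    image_edits = [e for e in edits if e.images_added > 0]
    images_per_user = {}
    for e in image_edits:
        images_per_user[e.user] = images_per_user.get(e.user, 0) + e.images_added
    counts = list(images_per_user.values())
    return {
        # Averaged over users who added at least one media file.
        "avg_images_per_user": mean(counts) if counts else 0,
        "pct_image_edits": 100 * len(image_edits) / len(edits) if edits else 0.0,
        "pct_editors_with_image_edits": 100 * len(counts) / len(editors) if editors else 0.0,
        "images_added": sum(counts),
    }


edits = [
    Edit("A", 1200, 3),
    Edit("A", 1200, 0),
    Edit("B", 40, 1),
    Edit("C", 700, 0),
    Edit("D", 15, 0),
]
all_users = baseline(edits)
experienced = baseline(edits, min_user_edits=500)
```

On this toy data, 40% of edits are image edits, half of all editors have image edits, and the average is 2 images per media-adding user. Restricted to editors with more than 500 edits, the average becomes 3.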

Notes:

  • The method for finding media additions and calculating the metrics comes from T299712. Here's the list of media file extensions we include:
IMAGE_EXTENSIONS = ['.jpg', '.png', '.svg', '.gif', '.jpeg', '.tif', '.bmp', '.webp', '.xcf', '.pdf']
VIDEO_EXTENSIONS = ['.ogv', '.webm', '.mpg', '.mpeg']
AUDIO_EXTENSIONS = ['.ogg', '.mp3', '.mid', '.webm', '.flac', '.wav', '.oga']
  • We only count users who made media additions that were not reverted within 48 hours. (Let me know if we want to include all users who made media edits.)
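Here is a minimal sketch of classifying a file by extension, using the lists above. The `media_type` helper is hypothetical. `.webm` appears in both the video and the audio lists, and checking video first is an assumption here, not something the notebook states.

```python
import os

IMAGE_EXTENSIONS = ['.jpg', '.png', '.svg', '.gif', '.jpeg', '.tif', '.bmp', '.webp', '.xcf', '.pdf']
VIDEO_EXTENSIONS = ['.ogv', '.webm', '.mpg', '.mpeg']
AUDIO_EXTENSIONS = ['.ogg', '.mp3', '.mid', '.webm', '.flac', '.wav', '.oga']


def media_type(file_name: str):
    """Return 'image', 'video', 'audio', or None for a file name, based on its extension."""
    ext = os.path.splitext(file_name.strip())[1].lower()
    if ext in IMAGE_EXTENSIONS:
        return "image"
    # '.webm' is in both the video and audio lists; checking video first is an assumption.
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    return None


print(media_type("Example.JPG"), media_type("Anthem.webm"), media_type("Song.ogg"))
```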

Will run one more time before releasing.

Thanks for this analysis @cchen and for linking to the notebook! I just wanted to point out a small piece of context for others: those median statistics only cover users who added at least one piece of media to an article that month. If you include all the folks who didn't add any media, I assume the median values are 0 (because most users never edit media in their editing work).
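The toy example below shows why the population matters. It uses made-up per-editor counts, not real data. The median over media adders only is positive, but including everyone who added nothing pulls it to 0.

```python
from statistics import median

# Hypothetical counts of media added per editor in one month (0 = added no media).
images_per_editor = {"A": 4, "B": 1, "C": 2, "D": 0, "E": 0, "F": 0, "G": 0}

adder_counts = [n for n in images_per_editor.values() if n > 0]
median_adders_only = median(adder_counts)                       # 2
median_all_editors = median(list(images_per_editor.values()))   # 0
```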

So (unless we target only users who were already adding media) we could even end up seeing a drop in "median # media added" if we manage to engage a lot of users who did not upload before (and most of them don't upload more than the current median), right?

Thanks for pointing this out!
@matthiasmullie yes, it's possible that the median # of media added drops. In that case, it makes more sense to measure the total # of media added, the % of media editors out of all editors, and the % of media edits out of all edits.
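The scenario discussed here can be illustrated with a small sketch. All numbers and the `media_metrics` helper are hypothetical. Two editors who previously added nothing each add one image. The median per adder falls from 4 to 2, while total media added, % of media editors and % of media edits all rise.

```python
from statistics import median


def media_metrics(editors):
    """editors maps user -> (edits, image_edits, images_added); all values are illustrative."""
    adders = [images for (_, _, images) in editors.values() if images > 0]
    total_edits = sum(edits for (edits, _, _) in editors.values())
    total_image_edits = sum(ie for (_, ie, _) in editors.values())
    return {
        "median_images_per_adder": median(adders) if adders else 0,
        "total_images": sum(adders),
        "pct_media_editors": 100 * len(adders) / len(editors),
        "pct_media_edits": 100 * total_image_edits / total_edits,
    }


# Hypothetical month before and after two previous non-adders each add one image.
before = {"A": (20, 4, 5), "B": (10, 3, 3), "C": (8, 0, 0), "D": (2, 0, 0)}
after = {"A": (20, 4, 5), "B": (10, 3, 3), "C": (8, 1, 1), "D": (2, 1, 1)}

m_before = media_metrics(before)
m_after = media_metrics(after)
```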

UPDATES: added idwiki; removed edits that were reverts.

The baselines for ruwiki and ptwiki have been updated with May 2022 data, and idwiki has been added. The numbers and the calculations can be found in this Jupyter notebook.

For quick reference, the results are as follows:

Russian Wikipedia:

  • Average number of images added by each user: 3
    • Among all the editors and edits: 5.6% of edits are image edits, 23.9% of editors have image edits, 23,093 images are added.
    • 1.1% of image edits used the visual editor; 4,172 images were added through the visual editor.
  • Average number of images added by each user with >500 edits: 8
    • Among all the edits made by editors with > 500 edits: 5.2% of edits are image edits, 42.2% of editors have image edits, 18,897 images are added.
    • 0.8% of image edits used the visual editor; 2,322 images were added through the visual editor.

Portuguese Wikipedia:

  • Average number of images added by each user: 2
    • Among all the editors and edits: 6.7% of edits are image edits, 22.3% of editors have image edits, 10,805 images are added.
    • 2.1% of image edits used the visual editor; 3,544 images were added through the visual editor.
    • 195 images were added through the newcomer task.
  • Average number of images added by each user with >500 edits: 8
    • Among all the edits made by editors with > 500 edits: 5.7% of edits are image edits, 48.5% of editors have image edits, 7,230 images are added.
    • 1.1% of image edits used the visual editor; 1,499 images were added through the visual editor.
    • 44 images were added through the newcomer task.

Indonesian Wikipedia:

  • Average number of images added by each user: 3
    • Among all the editors and edits: 8.5% of edits are image edits, 28.9% of editors have image edits, 6,296 images are added.
    • 1.9% of image edits used the visual editor; 1,947 images were added through the visual editor.
  • Average number of images added by each user with >500 edits: 11
    • Among all the edits made by editors with > 500 edits: 7.7% of edits are image edits, 57.6% of editors have image edits, 4,036 images are added.
    • 1.1% of image edits used the visual editor; 784 images were added through the visual editor.
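The sketch below shows how the visual editor and newcomer task breakdowns could be derived from change tags on image edits. It reads "% of image edits" literally, with image edits as the denominator. `visualeditor` is the change tag the VisualEditor extension applies. The `newcomer-task` tag name, the `ImageEdit` record and the helper are hypothetical.

```python
from dataclasses import dataclass

VE_TAG = "visualeditor"         # change tag applied to visual editor edits
NEWCOMER_TAG = "newcomer-task"  # hypothetical tag name, for illustration only


@dataclass
class ImageEdit:
    images_added: int
    tags: frozenset = frozenset()


def interface_breakdown(image_edits):
    """Share of image edits made with the visual editor, plus images added per interface."""
    ve = [e for e in image_edits if VE_TAG in e.tags]
    newcomer = [e for e in image_edits if NEWCOMER_TAG in e.tags]
    return {
        "pct_image_edits_with_ve": 100 * len(ve) / len(image_edits) if image_edits else 0.0,
        "images_added_with_ve": sum(e.images_added for e in ve),
        "images_added_via_newcomer_task": sum(e.images_added for e in newcomer),
    }


sample = [
    ImageEdit(2, frozenset({"visualeditor"})),
    ImageEdit(1, frozenset({"visualeditor", "newcomer-task"})),
    ImageEdit(3),
    ImageEdit(1, frozenset({"mobile edit"})),
]
breakdown = interface_breakdown(sample)
```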

Thanks for this analysis @cchen! I'm sorry that I forgot to add our third target wiki (id) to the description, but I've now updated it. Can we run it for idwiki as well?

Sure! I will add idwiki.

@cchen a thought: you might want to filter out edits that were reverts (WHERE NOT revision_is_identity_revert), because any image edits in them are incidental. Hopefully it won't change the results much, because things like section blanking are often blocked by AbuseFilter for new editors (enwiki example), but it's still probably a cleaner baseline.
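Here is a minimal sketch of that filter applied in Python rather than SQL. Only the `revision_is_identity_revert` flag comes from the comment. The row shape, `images_added` and the helper name are hypothetical. An undo that restores a blanked section re-adds images that already existed, so it would inflate a naive count.

```python
# Hypothetical rows; only revision_is_identity_revert comes from the comment.
rows = [
    {"rev_id": 1, "revision_is_identity_revert": False, "images_added": 1},
    # e.g. an undo of a section blanking restores images that were already there
    {"rev_id": 2, "revision_is_identity_revert": True, "images_added": 2},
    {"rev_id": 3, "revision_is_identity_revert": False, "images_added": 0},
]


def countable_image_additions(rows):
    """Sum image additions, skipping edits that are themselves identity reverts."""
    return sum(r["images_added"] for r in rows if not r["revision_is_identity_revert"])


naive_total = sum(r["images_added"] for r in rows)
filtered_total = countable_image_additions(rows)
print(naive_total, filtered_total)  # 3 1
```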

@Isaac thanks for pointing this out. I reran the numbers excluding the reverts, and over 10% of image edits turned out to be edits that were reverts. I looked at the edit comments, and most of them are "undo" and "rollback". It makes sense to exclude these edits.
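Reading edit comments works, but automatic edit summaries are usually in the wiki's content language. A language-independent alternative is to count MediaWiki's revert change tags. The sketch below does that, assuming each image-edit record carries `revision_is_identity_revert` and a list of change tags. The record shape and the helper are hypothetical.

```python
from collections import Counter

# MediaWiki change tags that mark reverts.
REVERT_TAGS = ("mw-undo", "mw-rollback", "mw-manual-revert")


def revert_breakdown(image_edits):
    """Return the % of image edits that are reverts, plus counts of revert tags among them."""
    reverts = [e for e in image_edits if e["revision_is_identity_revert"]]
    share = 100 * len(reverts) / len(image_edits) if image_edits else 0.0
    tag_counts = Counter(t for e in reverts for t in e["tags"] if t in REVERT_TAGS)
    return share, tag_counts


# Hypothetical sample: 3 reverts among 10 image edits.
image_edits = [
    {"revision_is_identity_revert": True, "tags": ["mw-undo"]},
    {"revision_is_identity_revert": True, "tags": ["mw-rollback"]},
    {"revision_is_identity_revert": True, "tags": ["mw-undo", "mobile edit"]},
] + [{"revision_is_identity_revert": False, "tags": ["visualeditor"]}] * 7

share, tag_counts = revert_breakdown(image_edits)
print(share, tag_counts.most_common())  # 30.0 [('mw-undo', 2), ('mw-rollback', 1)]
```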

Cool, thanks for the update! So not huge, but definitely significant.

Added idwiki data and removed edits that were reverts.

Thanks @cchen! I'm going to close this ticket now that we have baselines, and we can continue with analysis of the new feature in T292316. Please re-open if there's ongoing work here that I missed!