Data for identifying users on Commons for research
Closed, Resolved · Public

Description

We want to target new active users on Commons from the Spanish community who use the upload wizard on Commons to upload images. This is still quite broad, and there are some open questions we need to answer to refine the audience further:

  • Where are the majority of uploads (excluding bulk uploads) coming from? Is it upload wizard on Commons or other tools? Is it from desktop or mobile?
  • How do we define new users on Commons? On Wikipedia, it is typically someone who joined a month ago and made an edit within a month. Is that true for Commons users? We want to focus on users who made an "upload edit" rather than any "content edit."
  • Between the Spanish and Arabic communities, which one is more active on Commons? (Optional: we don't know if this data is easily available.)

Event Timeline

cchen triaged this task as Medium priority. May 24 2023, 4:40 PM
cchen created this task.
cchen edited projects, added Product-Analytics (Kanban); removed Product-Analytics.

Where are the majority of uploads coming from?
The Commons revision data we used here is from April 2023, excluding bots and bulk uploads.

  • Most uploads are from desktops: 4.4% are from mobile web/app, and 95.6% are from desktops.
  • Of all the uploads, 53.5% (318K out of 594K) are using the upload wizard.
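As a quick sanity check on the figures above, the upload wizard share can be recomputed from the rounded counts. This is only a sketch: the `share` helper and the variable names are illustrative, and the inputs are the rounded 318K and 594K values quoted in the list, not the underlying query results.

```python
def share(part, total):
    """Return part/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


# Rounded counts quoted above (April 2023, excluding bots and bulk uploads).
uploads_total = 594_000
uploads_wizard = 318_000

wizard_share = share(uploads_wizard, uploads_total)
other_share = round(100 - wizard_share, 1)
print(f"upload wizard: {wizard_share}%, other tools: {other_share}%")
```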

How do we define new users on Commons?
In April 2023, ~10.9k editors made at least 1 upload within 7 days of registration, and ~12k editors made at least 1 upload within 30 days of registration.
From the data, we can define new users as users who joined Wikipedia within the last 30 days and made at least 1 upload within 30 days of joining.
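The registration-based definition can be written down as a small predicate. This is a minimal sketch, not the query used for the numbers above: the function name, its arguments, and the choice to treat the window as inclusive at both ends are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def is_new_uploader(registered_at, upload_times, window_days=30):
    """True if at least one upload falls within window_days of registration.

    registered_at: datetime of account registration (hypothetical input).
    upload_times: iterable of upload datetimes for that account.
    """
    window_end = registered_at + timedelta(days=window_days)
    return any(registered_at <= t <= window_end for t in upload_times)


registered = datetime(2023, 4, 1)
print(is_new_uploader(registered, [datetime(2023, 4, 20)]))  # upload inside the window
print(is_new_uploader(registered, [datetime(2023, 6, 1)]))   # upload after the window
```

Setting `window_days=7` gives the stricter 7-day variant mentioned above.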

Questions:

For new users, it might also make sense to include users who joined Wikipedia a long time ago but only started to edit Commons 30 days ago, and made at least 1 upload within 30 days.
Another question: should we look at new users who are more active? For example, new editors who made at least 5 uploads within 30 days.
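To make the two alternatives concrete, here is a rough sketch that anchors the 30-day window on a user's first Commons edit instead of registration and adds a minimum upload count. The function name, parameters, and inclusive window boundaries are hypothetical; the thresholds (1 or 5 uploads, 30 days) are the ones raised in the questions.

```python
from datetime import datetime, timedelta


def is_new_active_uploader(first_commons_edit_at, upload_times,
                           min_uploads=1, window_days=30):
    """True if the user made at least min_uploads uploads within
    window_days of their first Commons edit (assumed anchor date)."""
    window_end = first_commons_edit_at + timedelta(days=window_days)
    in_window = sum(
        1 for t in upload_times
        if first_commons_edit_at <= t <= window_end
    )
    return in_window >= min_uploads


first_edit = datetime(2023, 4, 1)
uploads = [datetime(2023, 4, day) for day in range(2, 7)]  # 5 uploads, April 2 to 6

print(is_new_active_uploader(first_edit, uploads))                     # at least 1 upload
print(is_new_active_uploader(first_edit, uploads, min_uploads=5))      # at least 5 uploads
print(is_new_active_uploader(first_edit, uploads[:4], min_uploads=5))  # only 4 uploads
```

With `min_uploads=1` this covers long-standing Wikipedia accounts that only recently started on Commons; raising `min_uploads` narrows the audience to the more active newcomers.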

Thanks @cchen!

So it's safe to say that, with around half of uploads done with the upload wizard, our APP hypothesis of improving it has grounds. And with 90%+ of uploads coming from desktop, we should focus on desktop. @Sneha

@cchen, re questions:

For new users, it might also make sense to include users who joined Wikipedia a long time ago but only started to edit Commons 30 days ago, and made at least 1 upload within 30 days.

Yes, I think we should consider such users new as well, since we are looking specifically at when people start contributing to the Commons project. So a new Commons user would be a user, new or old to the wiki ecosystem, who made at least 1 upload to Commons (through the upload wizard, visual editor, or any other way) within 30 days. @Sneha what do you think?

Another question: should we look at new users who are more active? For example, new editors who made at least 5 uploads within 30 days.

It would be interesting to see the % of users who are active right away. My hypothesis is that it's somewhere around 10%. But I don't know if that would help us define the user audience.

Between Spanish and Arabic communities, which one is more active on Commons?
There's no direct way to determine what language users are using while doing the uploads. Per discussion with Sneha, I took a look at the number of files with captions in Spanish and Arabic. In Commons, there are 373K files with Spanish captions and 42K files with Arabic captions.
I also compared user counts. For those who made uploads in April, ~27K of them have accounts in Spanish language projects and 8.7K of them have accounts in Arabic language projects.
By both measures, Spanish has a larger user and content base on Commons.

@AUgolnikova-WMF I added my thoughts in this comment. Let me know if they make sense.