
End-of-year pageview statistics for 2022
Closed, Resolved (Public)

Description

Per email from @EdErhart-WMF, it's that time of year where we'd like pageview statistics to be used in the annual most-popular English Wikipedia articles of the year blog post.

As we did in 2021, they are asking for Product Analytics to grab this data twice:

  • At the end of November/beginning of December, which allows us to get an idea of what the list will look like + pre-write much of the post
  • In the middle of December (the 15th?) so we have the most up-to-date numbers possible ahead of publishing

The phab task for the 2021 edition of this was T295943, and the code used for gathering that data is in this Python notebook in GitHub.

Event Timeline

mpopov edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
mpopov added a subscriber: kzimmerman.

@kzimmerman and I will discuss this sometime this week after reviewing the team members' bandwidth and availability.

mpopov triaged this task as High priority. Nov 29 2022, 5:18 PM
mpopov added subscribers: SNowick_WMF, mpopov.

@SNowick_WMF will be working on this and can do the first data pull this week. Thank you, Shay!

I have pulled data from the most recent snapshot (2022-10), but the numbers will be more current once I re-run this after the snapshot for 2022-11 is ready (usually done by the 5th day of the next month, so 2022-12-05). I will set a reminder to re-run and replace this data, so please consider this a temporary list. I will email @EdErhart-WMF directly when it is ready.

Data

I wanted to verify the results I got so I pulled together data from Topviews - collected monthly Data through 2022-10. Will update this early December as well.

Thanks so much, @SNowick_WMF! I know that last year we removed articles like YouTube, Google, and Wikipedia that had no referral data to back up their pageview count. Can you confirm that that's still the case? And would you agree that we should remove Bible, Skathi (moon), and Chief executive officer as false positives?

Hi @EdErhart-WMF - I did not filter for those values. I will look over last year's list again to validate, but I do agree we should leave those out. I can filter them out of my final results when I re-pull data next week. I'll document what I leave out so we have that recorded, but of course if there's anything else you want to take out, please do so.

@SNowick_WMF That sounds great! Would you also be able to show the mobile view percentages for each article in the table? We've previously used that as another way of screening for false positives. Never mind, you've already done that and I forgot over the weekend. :-)

Thanks again—I appreciate your help!

Hi @EdErhart-WMF - I've updated pageview data to include 2022-11 results - 2022 Pageviews. On this sheet I noted a few items we may want to consider removing, and I added a separate sheet indicating what I took out.

I also updated Topviews data to include 2022-11 results.

Thanks, @SNowick_WMF! This is great stuff. I'm fascinated that Deaths in 2022 hasn't yet passed QEII. Some other notes:

  1. This is more of a note for myself next year than an action item, but this list makes clear that the old rule of thumb for removing articles—under 10% or over 90% mobile views—needs to be tweaked. Dahmer + KGF + a few others get awfully close to 90%, and ICC Men's T20 World Cup actually breaks it.
  2. I think we'll remove Cleopatra as well, given the mobile percentage/referral data and this piece from Input Mag.
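
The rule of thumb mentioned above (flag an article when under 10% or over 90% of its views come from mobile) can be sketched as follows. This is only an illustration, not the code used for the actual data pull: the function names and the split into desktop and mobile counts are assumptions, and the thresholds are left as parameters because the comment above suggests they need tweaking.

```python
def mobile_share(desktop_views: int, mobile_views: int) -> float:
    """Return the fraction of views that came from mobile (0.0 if no views)."""
    total = desktop_views + mobile_views
    return mobile_views / total if total else 0.0


def is_suspect(desktop_views: int, mobile_views: int,
               low: float = 0.10, high: float = 0.90) -> bool:
    """Flag an article whose mobile share falls outside [low, high].

    The 10% / 90% defaults are the old rule of thumb from the thread;
    they are hypothetical defaults here, not an agreed-upon standard.
    Note that an article with zero views has a share of 0.0 and is flagged.
    """
    share = mobile_share(desktop_views, mobile_views)
    return share < low or share > high
```

A flagged article is only a candidate for manual review; as the thread notes, legitimate articles can land close to, or past, the 90% line.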

In case you have plans to share the longer list, I suspect a few more could be removed in addition to the highlighted ones you've identified.

  1. Anything with XXX, which always gets a ton of views without explanation (the desktop/mobile split makes it clear it's something automated). XNXX too, for the same reason.
  2. Ansel Adams, Susan Wojcicki, and F5, Inc.? Their traffic has over 95% no referrals, which is weird.
  3. IOS has a very low mobile view percentage.
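
The two screens in this list (an unusually high share of views with no referrer, and a very low mobile share) could be combined into one check along these lines. This is a rough sketch under stated assumptions: the `ArticleTraffic` record, its field names, and the `screening_flags` helper are hypothetical, and the 95% and 10% cutoffs are taken loosely from the comments in this thread rather than from any documented rule.

```python
from dataclasses import dataclass


@dataclass
class ArticleTraffic:
    """Hypothetical per-article summary; field names are assumptions."""
    title: str
    total_views: int
    no_referrer_views: int
    mobile_views: int


def screening_flags(article: ArticleTraffic,
                    max_no_referrer: float = 0.95,
                    min_mobile: float = 0.10) -> list:
    """Return the names of the screens this article trips, if any."""
    flags = []
    if article.total_views == 0:
        return flags  # nothing to screen without any traffic
    if article.no_referrer_views / article.total_views > max_no_referrer:
        flags.append("no_referrer")
    if article.mobile_views / article.total_views < min_mobile:
        flags.append("low_mobile")
    return flags
```

Returning the list of tripped screens, rather than a single yes/no, keeps a record of why each article was left out, which matches the intent to document omissions.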

@SNowick_WMF In talking within Comms, our press-focused folks believe that we'd get more attention if we published the list earlier in December. (It gives us more runway before the holidays, particularly as the 23rd falls on a Saturday this year.) Would it be possible to do the final data pull a bit earlier, on 12 or 13 Dec? Whichever works best in your schedule, although please feel free to push back if it's not possible.

@EdErhart-WMF I can re-pull the data on 2022-12-12 and will post the latest results here. For that version I'll make the same omissions as in the last version, plus the articles you mention in your last comment, with the understanding that Comms should have final say in what we include.

2022-12-12 Data is ready. I added a note regarding the 2 FIFA results in the Top 25: if those pageviews were combined, it would make FIFA the top viewed page for 2022. They are clearly 2 separate pages, FIFA World Cup and 2022 FIFA World Cup, but the topic is the same.
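
To illustrate the note about combining the two FIFA pages, here is a minimal sketch of summing pageviews by topic and re-ranking. The `combine_by_topic` helper is hypothetical and the view counts are made-up placeholders, not figures from the 2022 data.

```python
from collections import defaultdict


def combine_by_topic(views: dict, topic_of: dict) -> list:
    """Sum views for titles mapped to the same topic, then rank descending.

    Titles missing from `topic_of` are kept as their own topic.
    """
    combined = defaultdict(int)
    for title, count in views.items():
        combined[topic_of.get(title, title)] += count
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Placeholder numbers for illustration only.
views = {"Article A": 100, "FIFA World Cup": 70, "2022 FIFA World Cup": 60}
topic_of = {
    "FIFA World Cup": "FIFA World Cup (combined)",
    "2022 FIFA World Cup": "FIFA World Cup (combined)",
}
ranking = combine_by_topic(views, topic_of)
```

With these placeholder counts, neither FIFA page tops the list on its own, but the combined topic does.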

Thanks Shay! I've edited our blog draft to call that out. Do you want to have a look at that draft?

Sure @EdErhart-WMF, if you want to send me a link on Slack I will go over it and let you know if I have any feedback.