
Develop a consistent rule for which special pages count as pageviews
Open, Medium, Public

Description

Historically, there have been several approaches to including special pages as pageviews:

  • Erik Zachte's initial definition included all special pages
  • The "new" pageviews definition introduced around 2015 blacklisted a specific set of pages used for programmatic activity and "user actions" such as logins (BannerRandom, HideBanners, CentralAutoLogin, MobileEditor, Undefined, MobileMenu, BlankPage, UserLogin, ZeroRatedMobileAccess)
  • Since July 2019 (T239672), only three whitelisted special pages have been counted (Search, RecentChanges, Version)

However, none of these has been completely satisfactory, and we should come up with a consistent, stable rule. Some considerations:

  • If a special page is excluded from pageviews, that means its view counts will not be retained past 90 days or available through the pageviews API. This can hamper analysis; for example, during the Recent Changes filters project, the Collaboration team looked at pageview counts for RecentChanges.
  • In some cases, including special pages has had undesirable privacy implications (T239672) or caused spikes in pageviews unrelated to actual content consumption.
  • Some special pages are automatically called by MediaWiki when certain actions are taken (e.g. CentralAutoLogin), and these should obviously be excluded. However, the large majority are only accessed through explicit user action in order to change settings or access information.
  • Viewing a special page with information about site content, like Watchlist, MediaStatistics, or Redirects, doesn't seem fundamentally different from viewing an article talk page, which is counted as a pageview. The same goes for special pages with user information, like UserGroupRights or AutoblockList, compared with user pages.
  • add yours!

You can find a full (?) list of special pages at Special:SpecialPages.

Event Timeline

nshahquinn-wmf added a subscriber: Nuria.

I think @Nuria is happy for Product Analytics to propose a rule here (T239672#5733146). It sounds like her preferred implementation would be a whitelist (although I'm not sure how firm that preference is), so we would probably assemble that list as part of this task.

LGoto triaged this task as High priority.
LGoto edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
kzimmerman added a subscriber: MNeisler.
kzimmerman subscribed.

This will need input from multiple team members (to be resumed post-holiday); Megan is working on listing the pages she's aware of in a subtask (T241003) before going on leave.

As an ArbCom, we need to see the pageviews for Special:Contributions/USERNAME, for example when we receive hounding claims. That's why we opened https://phabricator.wikimedia.org/T244639

kzimmerman lowered the priority of this task from High to Medium.

The scope of this is (increasingly) extensive and requires changing existing definitions; currently the pain points are not enough to make it urgent. Unassigning and moving to the backlog for now.

kzimmerman moved this task from Backlog to Icebox on the Product-Analytics board.
kzimmerman added a subscriber: Iflorez.

We discussed this again, and think it should be considered if/when we revisit how we measure pageviews. But again, it is not urgent enough to take on now. Moving this to the icebox.

Back in March I asked why we were seeing views for special pages that aren't allowlisted (as mentioned above).

@Milimetric posted in Slack:

The rule currently in existence around special pages is:
If X-Analytics has "special" in it, and the page_title is in (recentchanges, version, search), then is_pageview is true
Else, if X-Analytics has "special" in it, then is_pageview is false
Else, is_pageview is true

and then followed up with:

we figured out when this happens more precisely: when users are logged out, x_analytics header is not being set correctly
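
For readers who want to reason about the quoted rule, here is a minimal Python sketch of it. It is not the production pageview code: the function name `is_pageview`, the constant name, the lowercasing of `page_title`, and the sample header strings are all assumptions made for illustration. The check for "special" is a plain substring test, taken literally from the wording above.

```python
# Hypothetical sketch of the rule quoted above; names and header values are
# illustrative assumptions, not the actual pipeline implementation.

# The three allowlisted special pages named in the rule.
ALLOWLISTED_SPECIAL_PAGES = {"recentchanges", "version", "search"}


def is_pageview(x_analytics: str, page_title: str) -> bool:
    """Apply the three-branch special-page rule to one request."""
    if "special" in x_analytics:
        # Special page: counted only if its title is allowlisted.
        # Lowercasing the title is an assumption about normalization.
        return page_title.lower() in ALLOWLISTED_SPECIAL_PAGES
    # No "special" marker in the header: the rule treats it as a pageview.
    return True


# Made-up header strings, for illustration only.
print(is_pageview("ns=-1;special=Search", "search"))
print(is_pageview("ns=-1;special=Contributions", "contributions"))
print(is_pageview("", "contributions"))
```

Under these assumptions the last call returns true: if the header lacks the "special" marker, the final branch counts the request even though the page is not allowlisted. That is consistent with the logged-out behaviour described in the follow-up message.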

Thanks @mpopov, we're tracking that issue here: T304362