
Measuring reader visits to core project pages (on Vector 2022)
Closed, Declined (Public)

Description

Similar to T327440, but focused on accessibility / visibility / use of core project pages:

On most projects, readers no longer see the main sidebar, including core pages that are a fun or instructive introduction to the site (e.g., on en:wp, Help/Learn to edit, Random Page, and Recent Changes). How does this impact how often logged-out readers visit those pages?

It would be good to see stats on the # of daily visits by logged-out readers to:

  • Contents, Current events, Random, About, Donate
  • Help, Learn to edit, Community portal, Recent Changes, Related Changes
  • Download as PDF, Printable version

(excluding spiders, bots, &c)

Event Timeline

Jdlrobson added a project: Product-Analytics.
Jdlrobson subscribed.

Tagging product analytics since this is a community request for information.

Per the web team's quarterly grooming, these tasks are being removed from the team's backlog.

Thanks, @Jdlrobson. I would love to see a quick analytics eval (or just an explanation of how I could find that data myself).

@Jdlrobson When a logged-out user (i.e., not authenticated) visits any of the target pages (e.g., Contents, Current Events, Random, About, Donate, Help, Learn to Edit, Community Portal, Recent Changes, Related Changes, Download as PDF, Printable Version), we increment a counter for that URL.
We store this count temporarily for each day and reset it at the start of the next day.

Is this the expected approach for addressing this issue?
If so, I would appreciate your guidance!
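
For illustration, the counting scheme proposed above could be sketched roughly as follows. This is a minimal in-memory sketch, not an existing MediaWiki feature: the class name `DailyVisitCounter`, the `TRACKED_PAGES` set, and the `record` signature are all hypothetical, and the sketch does not address where the counter lives or whether it is sent anywhere.

```python
from collections import Counter
from datetime import date

# Hypothetical set of tracked page titles; the names are illustrative only.
TRACKED_PAGES = {"Special:RecentChanges", "Special:Random", "Help:Contents"}


class DailyVisitCounter:
    """Counts logged-out visits per tracked page; resets when the day changes."""

    def __init__(self):
        self.day = None
        self.counts = Counter()

    def record(self, page, logged_in, today):
        # Start a fresh tally whenever the calendar day changes.
        if today != self.day:
            self.day = today
            self.counts = Counter()
        # Only logged-out visits to tracked pages are counted.
        if not logged_in and page in TRACKED_PAGES:
            self.counts[page] += 1


counter = DailyVisitCounter()
counter.record("Special:Random", logged_in=False, today=date(2025, 3, 1))
counter.record("Special:Random", logged_in=True, today=date(2025, 3, 1))
print(dict(counter.counts))
```

Passing the date in explicitly keeps the reset behaviour easy to test; a real implementation would also need to exclude spiders and bots, which this sketch does not attempt.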

Hi SJ!

Some of what you want might be available in the Analytics API, and maybe even in the Pageviews tool? E.g. https://pageviews.wmcloud.org/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&range=latest-20&pages=Special:RecentChanges

See also https://wikitech.wikimedia.org/wiki/Data_Platform/Data_Lake/Traffic/Pageviews
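
As a rough sketch of how the same data might be requested from the Analytics API, the helper below builds a per-article pageviews URL. The route layout reflects my understanding of the public Wikimedia REST API and should be checked against its documentation; the function name `pageviews_url` and its defaults are made up for this example, and the date range is arbitrary.

```python
from urllib.parse import quote

# Assumed base route of the public per-article pageviews endpoint.
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, article, start, end,
                  access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    # Titles use underscores for spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{API_ROOT}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


print(pageviews_url("en.wikipedia.org", "Special:RecentChanges",
                    "20250201", "20250228"))
```

The `agent="user"` value mirrors the `agent=user` parameter in the Pageviews link above. Whether a given special page has data at all depends on what the pageview definition counts.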

However, raw reader visit metrics on their own often contain PII, so we can't expose them as public datasets. The public aggregated pageview metrics are identified by a centralized 'needle in a haystack' algorithm: the entire firehose of webrequests (upwards of 200K per second) is filtered and analyzed to decide what counts as a pageview.

This means that visits to pages that are not traditionally counted as 'pageviews' are not counted or available for public querying.

Related: T371321: [Idea] Collect pageview data using client-side instrumentation

IMO: counting something as a 'visit' or 'pageview' should be instrumented by the thing being visited! So, if we want to count visits to some of the pages you mentioned, then MediaWiki should count them somehow.

For internal WMF usages, ideally this would be as a pageview event, probably via https://wikitech.wikimedia.org/wiki/Metrics_Platform.

Ideally we'd have an automated way to make these kinds of metrics available publicly too.

We don't have this currently, so instead MediaWiki could instrument this via Statslib, and then the metrics would be automatically available in Prometheus & Grafana for public dashboarding.

Hi! Product Analytics team manager here.

> @Jdlrobson When a logged-out user (i.e., not authenticated) visits any of the target pages (e.g., Contents, Current Events, Random, About, Donate, Help, Learn to Edit, Community Portal, Recent Changes, Related Changes, Download as PDF, Printable Version), we increment a counter for that URL.
> We store this count temporarily for each day and reset it at the start of the next day.

Can you please clarify who "we" is? That is, who owns that instrumentation? And what happens to that counter – is it sent anywhere (and thus would need to follow the data collection guidelines) or does it remain in the browser's storage? (And does that negatively affect performance?)

Hi @Sj ! I'm declining the analysis request on behalf of Product Analytics, as the work for that team is set based on priorities from Product Managers who work more directly with the community, and this work was not prioritized.

The Research team has public office hours if you would like to consult them on measurement and using publicly available datasets (such as the ones @Ottomata suggested). Information about those office hours and how to book them is available here: https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours#Schedule. (The office hours are for brainstorming and consultation only; the Research team does not offer on-demand analyses unless they are explicitly prioritized as part of the team's work.)

The pageviews tool does not separate out logged-in vs logged-out users, but it is helpful to get a general overview of traffic.

@Ottomata thank you kindly; I could just have looked at Pageviews. I wasn't sure how to refine that. Presumably it's almost all logged-out but includes some spiders?
Some special pages / views like DownloadAsPdf don't exist in that dataset.

Looks like there was a roughly 50% drop in visits to some pages (Contents, Community Portal, File upload) when the skin changed over, while those two Help: page views weren't affected at all.

[Attached screenshot: Screen Shot 2025-03-01 at 14.48.00.png]
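
A drop like the one described above can be quantified by comparing mean daily views before and after the skin change. The sketch below assumes daily counts have already been collected for each window; the function name `percent_change` is hypothetical and the numbers are made-up placeholders, not the figures from the screenshot.

```python
from statistics import mean


def percent_change(before, after):
    """Percent change in mean daily views from one window to the next."""
    base = mean(before)
    if base == 0:
        raise ValueError("mean of the 'before' window is zero")
    return (mean(after) - base) / base * 100.0


# Placeholder daily counts for illustration only (not real data).
before_change = [200, 220, 180]
after_change = [100, 110, 90]
print(percent_change(before_change, after_change))
```

Equal-length windows on either side of the changeover keep the comparison simple, though they do not account for seasonal or weekday effects.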

@kzimmerman thanks. I prefer public threads to office hours because they leave behind information for others with the same question.

(By the way, this was not originally meant as a request for on-demand analysis, but as an FR for a way to understand the activity of logged-out readers, at a time when we were all navigating decisions about how our work is presented to them. As editors have put a lot of time into the pages we chose to put in the sidebar, it's worth having some quantified understanding of how default skin decisions affect how many people end up seeing those pages -- that helps inform what editors work on or advocate for, incl. when talking to the PMs you reference.)