
[MI 3] Investigate trends in movement metrics
Open, High, Public

Description

This pillar of work for the Movement Insights team has the following mission:

Foundation staff gain deeper insight into important trends in high-level metrics, informing their strategic thinking. These insights are documented in a way accessible to non-data specialists and disseminated widely. The primary audience is Foundation staff, but the results will be made public and disseminated to other Wikimedians as a secondary priority.

Key documents

Related Objects

Event Timeline

nshahquinn-wmf triaged this task as High priority.

I've been doing some initial exploratory work:

  • conversations with members of Movement Insights and Morten Warncke-Wang
  • looking through recent movement metrics output and brainstorming ideas
  • scheduling stakeholder conversations (I have six scheduled for the coming week)

Weekly updates:

  • Had exploratory conversations with Irene Florez, Leila Zia, Kate Zimmerman, Maryana Pinchuk, Sam Patton, Jaime Anstee, and Zack McCune

This week, I:

  • Had exploratory conversations with Jan Eissfeldt, Becky Maung, Isaac Johnson, Nino Hemmer, Kinneret Gordon, Marshall Miller, and Sonja Perry

This week, I:

  • Tentatively chose the first two topics: Latin America unique device decline and new registration decline (T369327)
  • Had an exploratory conversation with Erica Roden

This week, I:

  • Discussed possible trend investigation projects with Pablo Aragón, Maryana Pinchuk, Erica Roden, and Eli Asikin-Garmager
  • Chose the registration decline as the topic of my first trend investigation

@nshahquinn-wmf - I saw your other updates on technical work (wmfdata, feedback, and code reviews). Let's chat next week in our 1:1 about how to unblock this pillar's investigation work.

This week, I:

I will start work on this project in earnest on Fri, 22 Nov, after the upcoming offsite and time off.

nshahquinn-wmf renamed this task from [MI 3] Investigate trends in movement metrics to [MI-3] Investigate trends in movement metrics. Nov 13 2024, 3:04 PM

Last week, I:

For more details, see the project document [WMF only].

This week, I:

  • processed and responded to lots of ideas and feedback
  • investigated how restrictions on registration are implemented (e.g. TitleBlacklist, AbuseFilter)
  • revised my hypotheses (see project document for the latest)
  • pulled initial data from mediawiki_history and started to compare the global and wiki-specific trends

This week, I:

  • analyzed the time trend in registrations, globally and at selected wikis (see the attached graphs for an example)

global_weekly_registrations_lowess.png (211 KB)

top_10_wikis_weekly_registrations_lowess.png (972 KB)
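For illustration, a minimal sketch of how a LOWESS-smoothed weekly trend like the attached graphs could be produced. The synthetic series below is a stand-in for real registration counts, which would come from aggregating mediawiki_history:

```python
# Hypothetical sketch: LOWESS-smoothed weekly registration trend.
# The synthetic data stands in for weekly counts aggregated from
# mediawiki_history; it is not real registration data.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
weeks = np.arange(520)  # ~10 years of weekly data points
# Synthetic slowly declining trend plus noise
registrations = 50_000 - 20 * weeks + rng.normal(0, 2_000, weeks.size)

# frac controls the smoothing window (fraction of points per local fit)
smoothed = lowess(registrations, weeks, frac=0.1)
trend = smoothed[:, 1]  # smoothed counts, ordered by week
```

Plotting `trend` against `weeks` alongside the raw series gives the kind of smoothed-trend graph attached above.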

This week, I:

  • dug into serversideaccountcreation data and analyzed trends when slicing by both wiki and interface (notebook)

Last week, I:

  • Analyzed registrations with confirmed email addresses
  • Analyzed welcome survey responses

I am now working on writing up the report since I am finished with the data analysis and visualization (unless I run into any burning questions as I write).

This week, I:

  • Began writing the registration decline report (learning Quarto in the process)
  • Analyzed and visualized the correlation between registrations, new active editors, and IP edits
  • Added annotations about known configuration and interface changes to global and wiki-specific registration graphs

I'm continuing to work on writing the report and expect to finish on Tuesday next week.

This week, I:

  • Kicked off work on the readership–revenue investigation (T384762)
  • Responded to comments about the registration decline report
  • Made minor updates to the registration decline report

This week, I:

  • Met with Joseph from Fundraising Analytics and Elliot from Fundraising Tech to understand their data
  • Worked with readership–revenue stakeholders to clarify needs and desired outputs
  • Made and followed up on requests for Google Search Console access and ComScore data

This week, I:

  • Got access to Google Search Console data
  • Discussed ComScore data with Nino Hemmer
  • Discussed fundraising data and project goals with Joseph Mando and Runjini Murthy
  • Continued to clarify research questions and to brainstorm approaches for an impression-prediction model

This week, I:

  • Met with Joseph and Runjini to clarify their revenue prediction workflows and learn where my model should fit in
  • Learned more about and got access to ComScore data
  • Got access to banner impressions data (thanks to Joseph!)
  • Continued to iterate on a plan for the impression-prediction model, with very helpful input from Mikhail Popov and Rae Adimer

This week, I:

  • Drafted and shared a solid project plan for the banner fundraising prediction investigation

Here's a copy of the plan and schedule from the project doc:

Research goals

  • RG1: Make an educated guess at the true trend in human traffic by comparing different traffic metrics
    • This will provide Finance with general evidence about what to expect in the next 2-3 years and what traffic metrics to keep an eye on in the future
    • Candidates: user page views, page previews (desktop only), ComScore views (US only), Google Search Console clickthroughs (only 16 months of data)
    • I will write up a proper report about this, since the results will be of wide interest
  • RG2: Identify the traffic metric most closely correlated with impressions
    • Candidates: user page views, page previews (desktop only)
      • ComScore and Google Search Console data are useful as evidence of general trends, but don’t have enough coverage for predicting impressions
    • The chosen metric won’t necessarily be the most accurate metric, because impressions itself is not perfectly accurate
  • RG3: develop a model that forecasts the chosen traffic metric for each fundraising country
    • Fundraising can use the predicted change in this metric in a given country/country group (e.g. EN6C) as the predicted change in impressions for the next campaign there
    • This avoids the complexity of trying to *also* make a specific model for impressions which incorporates all of the factors that modulate the connection between pageviews and impressions (campaign length, traffic percentage, impression cap, and banner count resets). Notably, unlike the underlying traffic, these are all factors that Fundraising controls and has a strong understanding of, so it’s not particularly useful to model them.
    • The deliverable will be the model code; I won’t write a report
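As a rough illustration of the RG3 idea, a sketch that fits a trend to one country's traffic series and reports the predicted fractional change, which Fundraising could apply to impressions. The linear-trend fit and the numbers here are illustrative assumptions, not the actual model or data:

```python
# Hypothetical sketch of RG3: predict the change in a traffic metric
# for one fundraising country. The simple linear-trend fit is
# illustrative only; the real model would be more sophisticated.
import numpy as np

def predicted_change(daily_views: np.ndarray, horizon: int = 365) -> float:
    """Fit a linear trend and return the predicted fractional change
    in daily views `horizon` days past the end of the series."""
    days = np.arange(daily_views.size)
    slope, intercept = np.polyfit(days, daily_views, 1)
    current = slope * days[-1] + intercept
    future = slope * (days[-1] + horizon) + intercept
    return (future - current) / current

# Illustrative input: two years of slowly declining daily views
views = 1_000_000 - 100 * np.arange(730)
change = predicted_change(views)  # negative → expect fewer impressions
```

Fundraising would then apply `change` for a country (or country group like EN6C) as the predicted change in impressions for the next campaign there.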

Schedule

| phase | projected finish date |
|---|---|
| planning and data collection | Thu, 20 Feb |
| traffic metric comparison (RG1) | Fri, 28 Feb |
| traffic metric comparison write-up (RG1) | Wed, 5 Mar |
| traffic forecasting (RG2 and RG3) | Fri, 14 Mar |

Last week, I:

  • Kicked off the editor retention metric investigation (T392302)
  • Presented the page view forecasts to a broad audience from Fundraising

This week, I:

  • Developed a distinction between two basic types of retention metrics (“repeated action” and “group membership”)
  • Discussed the two basic types and the contributors metric model with Jaime and Hamid
  • Tried to re-run the fundraising page view forecasts and investigated a blocking bug that appeared
  • Planned a meeting with Fundraising and Finance stakeholders to explore potential work related to movement metrics

This week, I:

  • Chose and wrote up 2 straw dog metrics (second-month active editor retention and second-week edit retention)
  • Identified key decision points for metric construction
  • Had a lively check-in meeting with key stakeholders
  • Decided that the metric should be a count of retained editors rather than a retention rate
  • Planned the metric prototyping with Hamid

Last week, I:

  • Calculated and analyzed the second-week edit repeaters candidate metric

This week, I:

  • Calculated the full history of the second-month retained active editors candidate metric and compared it with active editors
  • Had a second check-in meeting and several 1:1 discussions with contributors
  • Facilitated a decision to move forward with second-month retained active editors
  • Facilitated a follow-up decision to lower the qualification to 1 edit, resulting in second-month active editors
  • Calculated and compared second-month retained editors
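For illustration, a minimal sketch of a "second-month retained editors" count as described above (a count, not a rate, per the earlier decision). The data shape, helper name, and 1-edit qualification threshold are assumptions for the example:

```python
# Hypothetical sketch of a "second-month retained editors" count:
# editors who edited in their first month after registration (month 0)
# and again in their second month (month 1). Field names and data
# shape are illustrative, not the actual pipeline.

def second_month_retained(edit_months: dict[str, set[int]]) -> int:
    """edit_months maps editor -> set of months-since-registration
    in which they made at least one edit."""
    return sum(
        1 for months in edit_months.values()
        if 0 in months and 1 in months
    )

example = {
    "alice": {0, 1, 5},  # retained: edited in first and second month
    "bob": {0},          # not retained: only edited in first month
    "carol": {1},        # not counted: no first-month edit
}
# second_month_retained(example) → 1
```

Raising the qualification from 1 edit to a higher threshold (as in the "retained active editors" variant) would just change the membership test from "any edit" to "enough edits" per month.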

This week, I:

  • Drafted the decision brief (WMF only) and shared it with project contributors for feedback
  • Rewrote queries for the prototype metrics to group data based on the end of the retention window (which will be helpful for the expanded retention framework)
  • Created the first set of polished charts for the decision brief

Last week, I:

  • Wrote the decision brief section on segmentations and indicator metrics
  • Finished creating polished graphs for the decision brief

The brief is now under review by the decision-makers.

This week, I:

  • Responded to lots of comments on the brief

Since the feedback on the recommendation has been generally positive, my work here seems to be done and I expect to close this task next week.

This week, I:

  • Regenerated the forecast models to address a dependency problem
  • Updated the forecasts based on the most recent data
  • Made the raw history and forecast data available as a TSV file (T392395)

This week, I:

  • Did initial first-half planning for this pillar

This week, I:

  • Met with Runjini to discuss how she’ll use the forecasts and how I can improve the deliverables
  • Added the monthly page view total to the raw data to complement daily averages (which are good for forecasting but can be confusing when compared with some other data sources)
  • Set up a Google sheet which provides:
    • Nice formatting and filtering of the raw data
    • Page views as a percent of the last campaign month using native formulas, so the last campaign month can be changed without having to re-run the notebook
  • Updated the forecasts with June data
  • Added to the forecast notebook an introduction with some info on the methodology, last-update details, and key links (including the Google sheet)

Last week, I:

  • Finished updating the brief in response to Kate and Marshall’s concerns around indicator metrics, segmentation, and the retention window
  • Met with Kate and Marshall for a detailed discussion of the responses, which went very well.

I think we're on track to have the decision approved by the end of this week!

This week, I:

  • Did (hopefully) final cleanup and tweaks on the decision brief
  • Had another review meeting with Kate and Marshall

Kate has signed off, but Marshall will review it with Selena before he does so. That puts us on track for approval in the next week or two.

This week, I:

  • Met with Runjini from Fundraising to work on incorporating the pageview forecasts into her revenue forecasts

nshahquinn-wmf renamed this task from [MI-3] Investigate trends in movement metrics to [MI 3] Investigate trends in movement metrics. Aug 23 2025, 10:33 PM
nshahquinn-wmf updated the task description. (Show Details)

This week, I:

  • Had exploratory conversations with Sonja Perry, Nino Hemmer, and Maryana Pinchuk
  • Chose pageview and external referral trends as the topic for the next investigation

Last week, I:

  • Drafted initial investigation goals and plan
  • Did background reading about third-party traffic data, the effect of chatbots on referrer traffic, and market research findings

Last week, I:

  • Worked on getting access to data (e.g. Google Search Console data in BigQuery, latest Comscore data) and connecting with people with relevant expertise (Isaac in Research, Maryana in Future Audiences, Nino in Comms)
  • Collected sources and did background reading
  • Worked on scoping the investigation and figuring out what the output should be

Last week, I:

  • Continued pursuing access to SimilarWeb data and Google Search Console data in BigQuery
  • Started analyzing Comscore data
  • Dug through instrumentation data streams in search of one that can proxy for mobile web pageviews
  • Got a private GitLab repo to store the analysis (T404533)

Since my last update, I:

  • Came up with a shortlist of instrumentation data streams I can use
  • Started analysis template for Google Search Console data
  • Evaluated a sample of SimilarWeb traffic data shared by Nino in Comms
  • Got access to Google Search Console data in BigQuery and figured out it doesn’t go back far enough to be useful
  • Picked 8 wikis to focus on and got access to them all in Google Search Console

This week, I:

  • Analyzed Comscore and unique device data
    • Lots of contradictory signals, plus noise due to the ongoing traffic data backfill, which should finish by Tue, Oct 7.
  • Analyzed Google-reported clickthrough and Google-referred pageview data

Since my last update, I:

  • Did a ton of analysis and data visualization in preparation for public communications about recent declines in pageviews
  • Dug into trends in referrers
  • Tested and found support for the hypothesis that iOS traffic declined less than Android traffic (suggesting that pageviews from people with high socio-economic status declined less)

Since my last update, I:

  • Produced lots more visuals and analysis

I'm on track to have this analysis and visualization work largely completed by the end of the day Tue, 28 Oct (since I'll be on vacation Wed-Fri).

Last week, I:

  • Wrapped up small wiki investigation and chose proposed example wikis
  • Started work on slides for the board meeting presentation

Last week, I:

  • Investigated the correlation between referral traffic from Google and from other external referrers
  • Finished slides for the board meeting presentation

Barring any last-minute requests for changes to the slides, this work is done and the hypothesis will be closed shortly.