
Implementation and dissemination of the reader research direction
Closed, Resolved, Public

Description

In T400030: Draft first version of research direction on readers, I wrote up a draft for a new direction for research on readers.
As a next step, we want to figure out how to implement and disseminate the direction. This will include the following topics:

Implementation:

  • prioritization of RQs
  • defining projects incl. timelines and staffing
  • ...

Dissemination:

  • Approval by Head of Research
  • Coordination with Comms
  • etc.

Event Timeline

weekly update:

  • Set up a meeting with @Miriam and @DKumar-WMF to have a first round of discussion on prioritization.

weekly update:

  • Had a meeting with Miriam and Debra; we reached consensus on a shortlist of potential research projects on readers.
  • We identified one top contender around understanding the factors that affect reader retention. I will try to sketch an outline of what this project could look like.
  • Set up a follow-up meeting next week to identify/discuss one other focus area.

weekly update:

  • Started to focus in more detail on one project aimed at better understanding retention.
  • One potential approach would be to combine a reader survey via QuickSurveys with measurement of retention in Test Kitchen. The first feedback I got was that this is likely possible, but my goal next week is to spend some time figuring out the technical options/limitations in more detail.

weekly update:

  • From discussions in #talk-to-experiment-platform (thread), my understanding is that, in principle, it is possible to combine QuickSurveys with Test Kitchen. However, some engineering work still needs to be done before this is ready, so we might not be able to start working on this immediately.
  • I started exploring other opportunities. One potential direction is to understand in more detail the traffic from external search; i.e. what are the factors that lead to more/less traffic from external search engines and potentially identifying quasi-causal relationships via natural experiments.

weekly update:

  • Still figuring out the feasibility of work on better understanding reader retention. So far, we are considering two threads:
    • retention survey: this would require some engineering work on T417185: Migrate QuickSurveys data collection to Test Kitchen. There are ongoing discussions around prioritization of this work.
    • retention action: this work would try to identify which reader actions are correlated with high or low reader retention. This question might be possible to address with the data/analysis from the attribution research.
  • I started to look in more detail at traffic from external search engines over time. Specifically, I stratified by article topic (aggregating the timeseries of all articles in English Wikipedia belonging to the same topic). While the analysis is very exploratory, some interesting patterns emerge: for some topics (such as biographies or films) traffic from external search engines has been stable or even increasing, while for other topics (most notably STEM) it has been decreasing. https://gitlab.wikimedia.org/mgerlach/external-traffic/-/blob/main/trend-external-traffic_timeseries-topics.ipynb?ref_type=heads
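A minimal sketch of the topic stratification described above, assuming monthly external-search pageview counts per article and a mapping from each article to one or more topic labels. The function names, input shapes, and toy numbers are hypothetical, and the linked notebook may well do this differently. Each topic's series is the sum over its articles, and the trend is summarized as an ordinary least-squares slope against the time index.

```python
def aggregate_by_topic(pageviews, article_topics):
    """Sum per-article pageview series into one series per topic.

    pageviews: dict mapping article -> list of monthly pageview counts
               (all series assumed to cover the same months).
    article_topics: dict mapping article -> list of topic labels; an article
                    with several topics contributes to each (assumption).
    """
    totals = {}
    for article, series in pageviews.items():
        for topic in article_topics.get(article, []):
            if topic not in totals:
                totals[topic] = [0] * len(series)
            for i, count in enumerate(series):
                totals[topic][i] += count
    return totals


def linear_trend(series):
    """Ordinary least-squares slope of the series against its time index."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two time points to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Toy example with made-up numbers (not real pageview data).
pageviews = {"A": [10, 12, 14], "B": [5, 5, 5], "C": [20, 18, 16]}
article_topics = {"A": ["Biography"], "B": ["Biography"], "C": ["STEM"]}
by_topic = aggregate_by_topic(pageviews, article_topics)
for topic, series in sorted(by_topic.items()):
    print(topic, series, linear_trend(series))
```

Since topics differ a lot in total traffic, one option to consider is dividing the slope by the topic's mean pageviews so that trends are compared as relative rather than absolute changes.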

weekly update:

  • Met with Debra to discuss: the focus should be on identifying causal relationships that affect reader metrics. We think that the retention survey would be restricted to correlational insights. In addition, some of the ongoing work by others (attribution research and comparative reader research) will already provide some insights into retention. We would like to first assess how well these insights answer product teams' questions about retention before investing more.
  • We also discussed an alternative. The hypothesis is that making additional content available to internet users leads to additional traffic from external search engines. Content translations could serve as individual natural experiments in which one group of internet users gains access to new content (e.g. when an article is translated into ptwiki, it becomes accessible to speakers of Portuguese) while another group does not (e.g. the same translation makes no difference for non-Portuguese speakers). These two groups can be considered treatment and control groups, and we can do a diff-in-diff comparison of pageviews with a search-engine referer between the two groups. I will sketch an outline of this analysis next week.
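The diff-in-diff comparison above can be sketched as a classic 2x2 estimator on group means. This is a minimal sketch, assuming we already have search-engine-referred pageview counts for a treatment group and a control group before and after each translation; how the two groups are actually derived from the data is left open here, and the function names, input format, and toy numbers are hypothetical. The estimate only has a causal reading under the usual parallel-trends assumption (without the translation, both groups would have followed the same trend).

```python
from statistics import mean


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """2x2 difference-in-differences estimate from group means.

    Each argument is a list of pageview counts (e.g. daily views with a
    search-engine referer) for one group in one period.
    """
    treat_change = mean(treat_post) - mean(treat_pre)
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)
    return treat_change - ctrl_change


def pooled_effect(events):
    """Average the per-event estimates over many translation events.

    events: list of (treat_pre, treat_post, ctrl_pre, ctrl_post) tuples,
    one per translated article (hypothetical input format).
    """
    return mean(diff_in_diff(*event) for event in events)


# Toy example with made-up numbers (not real pageview data).
event_1 = ([100, 102], [130, 132], [50, 52], [60, 62])
event_2 = ([40, 40], [45, 45], [80, 80], [81, 81])
print(diff_in_diff(*event_1))
print(pooled_effect([event_1, event_2]))
```

Because articles vary widely in traffic, working with log pageviews or relative changes may be preferable, so that a few high-traffic articles do not dominate the pooled estimate.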

weekly update:

  • Went back to the original draft of the reader direction and started revising it heavily based on the feedback I received. No major new items have been brought up so far, so I plan to re-organize and shorten the content into at most 5 major research directions. I will then try to get a new round of feedback.
  • I wrote up different options for a research project on better understanding retention of readers (googledoc) and shared them with Sherry and Hsuanwei to get feedback on what would be most useful for Product.
  • I wrote up a research plan for running an analysis using content translation as a natural experiment (googledoc) and shared it with Debra as a potential project for the reader growth bucket. Waiting for feedback on prioritization.

weekly update:

  • Reached out to Rita, Maryana, and Suman and discussed relevant research questions on readers. Started revising the reader direction to incorporate their feedback.
  • Got feedback on the project on understanding reader retention: this is a question that is relevant to folks in the readers teams. My recommendation would be to start by looking at existing data being collected in ongoing projects (attribution and comparative reader research) to get some initial insights. Based on what we learn, we can set up a dedicated experiment in Test Kitchen in the next FY, if needed.

Final update:

  • Heavily revised the reader research direction based on feedback (latest version here)
    • Mainly re-organized existing content to make it shorter/more compact and to align it better with objectives for the next FY
    • Shared the latest version with Leila; I will pick this up in Q4 based on the feedback I receive and cover that work in a separate task
  • Had a more detailed discussion about two proposed projects for Q4: understanding reader retention, and a natural experiment to estimate the causal effect of new content on pageviews
    • My current understanding is that I will start working on both in Q4.
    • The work on these projects will be covered in separate tasks.