
Analysis of Article as a Living Doc iOS variant test run 1
Closed, Resolved · Public

Description

Let's look at the data we've collected so far for AaLD. I'm open to any results or insights, but here are some specific questions to see if we can answer them yet:

  • Were there any meaningful differences in trust scores between the experimental and control group? Between users who saw the preview card and the full timeline?
  • Were there any meaningful differences in trust scores between the "baseline" users from the earlier fall survey and the experiment groups?
  • What parts of the feature were clicked on or interacted with the most?
  • How far down did people scroll in the timeline?
  • How many thanks were generated?
  • Any data indications of bugs or major technical issues (i.e. missing data, an event firing far less often than expected, etc.)?
  • Did the type of article (evergreen vs controversial) impact user interaction or survey results?
  • Was there any change in the scores over time (i.e. did reported trust go down over the course of the experiment)?
  • Which articles generated the most views? Interactions?

Event Timeline

SNowick_WMF moved this task from Triage to Kanban on the Product-Analytics board.
SNowick_WMF moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

Not sure if we have the data for this, but if we do I'd be interested: Does the type of events shown in the card affect whether users open the full timeline?

Were there any meaningful differences in trust scores between the experimental and control group? Between users who saw the preview card and the full timeline?

Control users (Not 'In Experiment') gave a slightly higher percentage of positive (Very/1) Trustworthy and Reliability scores.

'Full Timeline' viewers gave positive (Very/1) Trustworthy and Reliability scores similar to those of non-Full-Timeline viewers. The most noticeable difference was that the percentage of Full Timeline viewers who gave 'Not at all/5' ratings was higher for both Trustworthy and Reliability.

| | 1 (Very) | 2 | 3 | 4 | 5 (Not at all) |
| Trustworthy - Control | 47.11% | 18.28% | 8.17% | 12.00% | 14.44% |
| Trustworthy - Cond=TRUE / Full=FALSE | 41.55% | 22.79% | 8.31% | 13.94% | 13.40% |
| Trustworthy - Cond=TRUE / Full=TRUE | 45.08% | 11.92% | 7.77% | 12.95% | 22.28% |
| Reliable - Control | 49.48% | 16.09% | 8.10% | 10.54% | 15.78% |
| Reliable - Cond=TRUE / Full=FALSE | 45.58% | 18.36% | 8.58% | 10.32% | 17.16% |
| Reliable - Cond=TRUE / Full=TRUE | 46.63% | 15.03% | 3.63% | 13.47% | 21.24% |
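
One way to check whether these rating distributions differ more than chance would predict is a chi-square test of independence. A minimal sketch follows; since the table above only reports percentages, the per-group respondent counts in it are hypothetical placeholders.

```python
# Sketch: chi-square test of independence on the Trustworthy ratings above.
# NOTE: the per-group respondent counts below are HYPOTHETICAL placeholders;
# the real Ns are needed before reading anything into the p-value.
from scipy.stats import chi2_contingency

# Percentage breakdowns (ratings 1-5) from the table above.
pct = {
    "control":       [47.11, 18.28, 8.17, 12.00, 14.44],
    "cond_not_full": [41.55, 22.79, 8.31, 13.94, 13.40],
    "cond_full":     [45.08, 11.92, 7.77, 12.95, 22.28],
}

n = {"control": 1000, "cond_not_full": 800, "cond_full": 400}  # placeholders

# Convert percentages back into approximate response counts per rating.
observed = [
    [round(p / 100 * n[group]) for p in pcts]
    for group, pcts in pct.items()
]

chi2, p_value, dof, _expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
```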

Were there any meaningful differences in trust scores between the "baseline" users from the earlier fall survey and the experiment groups?

The Trustworthy 'Very/1' score for the prior survey respondents is noticeably lower, at 32%, than all of the Very/1 scores for the current survey, which average ~45%; however, the 2 score for the prior survey is higher (average 33% prior vs 16% current) for both Reliability and Trustworthy. The percentage of negative 'Not at all/5' scores for the prior survey was also much lower than for the current one (average 6.8% vs average 16.8%).
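
Whether the prior-vs-current gap in the 'Very/1' share is statistically meaningful could be checked with a two-proportion z-test; in this sketch the sample sizes are hypothetical placeholders, as they are not stated above.

```python
# Sketch: two-proportion z-test on the Trustworthy 'Very/1' share,
# prior survey (32%) vs current survey (~45%).
# NOTE: n_prior and n_current are HYPOTHETICAL placeholders; substitute
# the actual respondent counts from each survey.
from statsmodels.stats.proportion import proportions_ztest

n_prior, n_current = 500, 2000
successes = [round(0.32 * n_prior), round(0.45 * n_current)]
observations = [n_prior, n_current]

z_stat, p_value = proportions_ztest(successes, observations)
print(f"z={z_stat:.2f}, p={p_value:.4f}")
```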