
[SPIKE] Review approach to improving mobile talk pages
Closed, ResolvedPublic

Description

Introducing the Reply and New Discussion Tools on mobile talk pages (T282638) sits within a larger plan to bring the suite of DiscussionTools features [i] to mobile.

This task represents the work of reviewing that plan and proposing potential changes to it.

Open questions

  • 1. What, if any, meta concerns or questions does the plan bring to mind?
  • 2. What analyses do you think we ought to run to decide the following?
    • Question: Can we be confident the mobile New Discussion and Reply Tools are having a net positive impact on the partner wikis where they are available and on the people at these wikis who are using them? Decision to be made: Do any underlying issues need to be addressed before the team shifts its focus to introducing Usability Improvements on mobile?
    • Question: Can we be confident the suite of DiscussionTools features is having a net positive impact on the partner wikis where they are available and on the people at these wikis who are using them? Decision to be made: Can we move forward with offering all mobile DiscussionTools features at all wikis to everyone by default?

Plan for improving mobile talk pages

See: Editing Scratch/Offer Mobile DiscussionTools at Initial Wikis.

Done

  • @MNeisler: Document answers to all of the Open questions above
  • @ppelberg: File "shell" tickets for the analyses we will plan to complete in order to evaluate the impact of the changes we are making to mobile talk pages

i. Reply Tool, New Discussion Tool, Topic Subscriptions, Usability Improvements

Event Timeline

MNeisler moved this task from Doing to Needs Sign-off on the Product-Analytics (Kanban) board.
MNeisler added a subscriber: MNeisler.

@ppelberg - I've completed my review of the approach. I've documented answers to the Open Questions below. Let me know if you have any follow-up questions or need any additional detail to create the subsequent analysis tickets.

  1. What, if any, meta concerns or questions does the plan bring to mind?

No significant meta concerns. Overall, the incremental approach described in the plan makes sense to me. I just have a couple of clarifying questions about points that were not clear to me after reading the plan:

  • What's the rationale for deploying the Reply Tool and New Discussion Tool together vs separately at the partner wikis? I'm primarily asking just to understand the context as they were deployed separately on the desktop.
  • What's the deployment plan for Topic Subscriptions (Manual + Automatic)? Will these be deployed and evaluated at the same time as Reply Tool and New Discussion Tool?
  2. What analyses do you think we ought to run to decide the following?

Question: Can we be confident the mobile New Discussion and Reply Tools are having a net positive impact on the partner wikis where they are available and on the people at these wikis who are using them? Decision to be made: Do any underlying issues need to be addressed before the team shifts its focus to introducing Usability Improvements on mobile?

Proposed Analysis: Pre- and post-deployment analysis of the Reply Tool and New Discussion Tool on partner wikis. I plan to compare changes in the identified metrics from two weeks prior and two weeks following the deployment.
Rationale: This proposed analysis is based on the wording provided in the plan, which indicates that the purpose of this analysis is to
"Evaluate whether the introduction of the Reply Tool and New Discussion Tool did NOT contribute to regressions in peoples' likelihood of using mobile talk pages successfully or disrupting others' experiences"

Considerations:

  • I'm comfortable with a pre- and post-deployment analysis to identify any regressions from this change on a limited set of wikis. If the goal is instead to identify a specific net positive impact (e.g. a 2% increase in edit completion rate), it will be difficult to obtain statistically significant results from a pre- and post-deployment analysis, since it is susceptible to external factors (seasonality, other changes deployed during the same period).
  • This deployment introduces several changes at once (disabling the talk page Overlay, introducing mobile DiscussionTools, and changing the default mobile talk page experience), so it may be difficult to attribute a regression to a specific component using pre- and post-deployment quantitative data alone. Since we are not deploying incrementally, I recommend using qualitative feedback, along with any additional insights we can obtain from the quantitative data, to help identify possible sources of an issue.
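As a rough illustration of the pre- and post-deployment comparison described above (not the analysis we will actually run), here is a minimal Python sketch. It assumes a hypothetical event log of mobile talk page edit attempts, each recorded as (day, completed), and uses a two-proportion z-test; the real metrics are still to be defined, and the z-test treats attempts as independent, which will not hold exactly when the same editor makes several attempts.

```python
from datetime import date, timedelta
from math import erf, sqrt

# Assumption: each event is (day, completed), where `completed` is True when a
# hypothetical talk page edit attempt was saved. Illustration only.
WINDOW_DAYS = 14  # two weeks prior and two weeks following the deployment


def split_windows(events, deploy_day, window_days=WINDOW_DAYS):
    """Return (pre, post) lists of completion flags around deploy_day."""
    pre_start = deploy_day - timedelta(days=window_days)
    post_end = deploy_day + timedelta(days=window_days)
    pre = [done for day, done in events if pre_start <= day < deploy_day]
    post = [done for day, done in events if deploy_day <= day < post_end]
    return pre, post


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (success_b / n_b - success_a / n_a) / se
    # Standard normal CDF of |z| via erf, then a two-sided p-value.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)


def compare_pre_post(events, deploy_day):
    """Completion rate before and after deploy_day, plus the z-test result."""
    pre, post = split_windows(events, deploy_day)
    z, p = two_proportion_z(sum(pre), len(pre), sum(post), len(post))
    return sum(pre) / len(pre), sum(post) / len(post), z, p


# Synthetic data: 10 attempts per day; 8 saved per day before deployment
# and 7 saved per day after it.
deploy = date(2022, 3, 1)  # hypothetical deployment date
events = []
for offset in range(1, WINDOW_DAYS + 1):
    day = deploy - timedelta(days=offset)
    events += [(day, i < 8) for i in range(10)]
for offset in range(WINDOW_DAYS):
    day = deploy + timedelta(days=offset)
    events += [(day, i < 7) for i in range(10)]

rate_pre, rate_post, z, p = compare_pre_post(events, deploy)
print(f"pre={rate_pre:.2f} post={rate_post:.2f} z={z:.2f} p={p:.3f}")
```

With this synthetic data, a 10-point drop in completion rate (0.80 to 0.70 over 140 attempts per window) still gives p of roughly 0.053, which illustrates why a small change such as 2% is hard to confirm with this design. The comparison also cannot separate the deployment from seasonality or other changes landing in the same window.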

Question: Can we be confident the suite of DiscussionTools features is having a net positive impact on the partner wikis where they are available and on the people at these wikis who are using them? Decision to be made: Can we move forward with offering all mobile DiscussionTools features at all wikis to everyone by default?

The plan proposes two separate analyses to answer this question. I've provided comments on each below:
(1) "We'll then evaluate – by way of a second impact analysis – whether the introduction of the Usability Improvements contributed to regressions in peoples' likelihood of using mobile talk pages successfully or disrupting others' experiences."

Proposed Analysis: Pre- and post-deployment analysis of the Usability Improvement features on partner wikis. I plan to compare changes in the identified metrics from two weeks prior and two weeks following the deployment.
Rationale: In line with my comment above, the pre- and post-deployment analysis may not be able to confirm small positive impacts from the deployment of these features, but it should tell us whether the introduction of the Usability Improvements contributed to any regressions. I recommend running it at least a month after the prior analysis so it's easier to distinguish the impact of the Reply Tool/New Discussion Tool deployment from the impact of the Usability Improvements deployment.

(2) "The result of this impact analysis will help us decide whether the suite of mobile talk page improvements we'll have made are ready to be evaluated through an A/B test. In this A/B test, we'll deploy the mobile talk page improvements to a set of wikis that have not had access to any mobile talk page improvements to-date"
Proposed Analysis: A/B test with the following requirements: run only on wikis that have not had access to any mobile talk page improvements to date. We will also need to decide whether to include or exclude people who have used DiscussionTools on desktop. For reference, we limited the mobile VisualEditor-as-default A/B test to people who had not edited on any platform before. We will need to consider the implications for sample size, as I expect there are fewer editors who edit mobile talk pages. The analysis in T295180 should help inform this question once complete.
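To get a feel for the sample size implications mentioned above, here is a hedged Python sketch using the standard normal-approximation formula for comparing two proportions (two-sided, no continuity correction). The 70% baseline and the target rates are hypothetical placeholders, not figures from this plan; T295180 should provide real numbers. The unit counted is whatever we randomize on (for example editors), and counting repeated attempts by the same editor as independent would overstate the effective sample size.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate units needed per arm to detect p_control -> p_treatment.

    Normal-approximation formula for a two-sided comparison of two
    proportions, without continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    spread = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(p_control * (1 - p_control)
                              + p_treatment * (1 - p_treatment)))
    return ceil(spread ** 2 / (p_treatment - p_control) ** 2)


# Hypothetical baseline: 70% of mobile talk page edit attempts are saved.
for target in (0.72, 0.75):
    print(target, sample_size_per_arm(0.70, target))
```

Under these assumptions, detecting a 2 percentage point lift needs roughly 8,000 units per arm, versus roughly 1,250 for a 5 point lift, so the number of mobile talk page editors available during the test window will largely determine which effect sizes we can detect.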

Rationale: Since the results of the analysis will be used to inform deploying these features as opt-out across all wikis, it's more important that we run a test that provides statistically significant data so we can confidently determine the impact. Note that since the A/B test is being conducted on a suite of features (not just the Reply Tool or New Discussion Tool), we will not have data on how each feature contributes individually or how the features interact with each other. For example, if we see an increase in the number of talk page replies, we cannot say whether that is due more to increased awareness from the Usability Improvements or to the Reply Tool. However, I don't believe this is a concern in this case, because we are evaluating the features together as a set that will be deployed together (not individually).

@ppelberg - I've completed my review of the approach. I've documented answers to the Open Questions below. Let me know if you have any follow-up questions or need any additional detail to create the subsequent analysis tickets.

This is wonderful, @MNeisler. Comments/responses in-line below from the conversation we had today...

  1. What, if any, meta concerns or questions does the plan bring to mind?

No significant meta concerns. Overall, the incremental approach described in the plan makes sense to me. I just have a couple of clarifying questions about points that were not clear to me after reading the plan:

  • What's the rationale for deploying the Reply Tool and New Discussion Tool together vs separately at the partner wikis? I'm primarily asking just to understand the context as they were deployed separately on the desktop.

The proposal to deploy the Reply and New Discussion Tools simultaneously is driven largely by:

  • Both tools being technically ready to be made available at the same time
  • Us being somewhat confident people are able to use the two tools successfully as evidenced by: T252057, T243249, T246190, and T246191.
  • The functionality the two tools offer is complementary

Note: when we were ready to deploy the Reply Tool to an initial set of wikis on desktop, the New Discussion Tool had not yet been implemented or its usability tested with Junior and Senior Contributors.

  • What's the deployment plan for Topic Subscriptions (Manual + Automatic)? Will these be deployed and evaluated at the same time as Reply Tool and New Discussion Tool?

The plan is to deploy mobile Topic Subscriptions (Manual + Automatic) after the mobile Reply and New Discussion Tools are deployed, alongside mobile Usability Improvements. More info in T298055.

Note: we're proposing to couple the deployment of Topic Subscriptions and Usability Improvements on mobile because introducing Topic Subscriptions on mobile depends on us first defining what the icon for subscribing and unsubscribing will look like (the text-based [subscribe] affordance doesn't behave well on mobile; see T292241). We plan to define that icon alongside the design of Topic Containers (T269950).

  2. What analyses do you think we ought to run to decide the following?

Question: Can we be confident the mobile New Discussion and Reply Tools are having a net positive impact on the partner wikis where they are available and on the people at these wikis who are using them? Decision to be made: Do any underlying issues need to be addressed before the team shifts its focus to introducing Usability Improvements on mobile?

Proposed Analysis: Pre- and post-deployment analysis of the Reply Tool and New Discussion Tool on partner wikis. I plan to compare changes in the identified metrics from two weeks prior and two weeks following the deployment.

Sounds good. Here is a ticket where we can work to define the metrics you referenced above: T298058.

Note: I've also added an Analysis Scope section within the task description to document the "...two weeks prior and two weeks following..." requirement you mentioned above. See: T298058#7586811.

Rationale: This proposed analysis is based on the wording provided in the plan, which indicates that the purpose of this analysis is to
"Evaluate whether the introduction of the Reply Tool and New Discussion Tool did NOT contribute to regressions in peoples' likelihood of using mobile talk pages successfully or disrupting others' experiences"

Considerations:

  • I'm comfortable with a pre- and post-deployment analysis to identify any regressions from this change on a limited set of wikis. If the goal is instead to identify a specific net positive impact (e.g. a 2% increase in edit completion rate), it will be difficult to obtain statistically significant results from a pre- and post-deployment analysis, since it is susceptible to external factors (seasonality, other changes deployed during the same period).
  • This deployment introduces several changes at once (disabling the talk page Overlay, introducing mobile DiscussionTools, and changing the default mobile talk page experience), so it may be difficult to attribute a regression to a specific component using pre- and post-deployment quantitative data alone. Since we are not deploying incrementally, I recommend using qualitative feedback, along with any additional insights we can obtain from the quantitative data, to help identify possible sources of an issue.

Today, @MNeisler and I confirmed the objective of the impact analysis of the mobile Reply and New Discussion Tools (T298058) is as Megan described above, "...to identify any regressions from this change on a limited set of wikis."

As such, we are comfortable moving forward with T298058, knowing that doing so will preclude us from identifying causal links between the individual changes we're introducing (removing the MobileFrontend talk page overlay, introducing the Reply Tool, and introducing the New Discussion Tool) and the resulting impact we see on the metrics we have yet to define.

Question: Can we be confident the suite of DiscussionTools features is having a net positive impact on the partner wikis where they are available and on the people at these wikis who are using them? Decision to be made: Can we move forward with offering all mobile DiscussionTools features at all wikis to everyone by default?

The plan proposes two separate analyses to answer this question. I've provided comments on each below:
(1) "We'll then evaluate – by way of a second impact analysis – whether the introduction of the Usability Improvements contributed to regressions in peoples' likelihood of using mobile talk pages successfully or disrupting others' experiences."

Proposed Analysis: Pre- and post-deployment analysis of the Usability Improvement features on partner wikis. I plan to compare changes in the identified metrics from two weeks prior and two weeks following the deployment.

Sounds good.

Rationale: In line with my comment above, the pre- and post-deployment analysis may not be able to confirm small positive impacts from the deployment of these features, but it should tell us whether the introduction of the Usability Improvements contributed to any regressions. I recommend running it at least a month after the prior analysis so it's easier to distinguish the impact of the Reply Tool/New Discussion Tool deployment from the impact of the Usability Improvements deployment.

Noted. I've added an Analysis Scope section within T298065 to document the analysis timing requirements you mentioned above. See: T298065#7586841.

(2) "The result of this impact analysis will help us decide whether the suite of mobile talk page improvements we'll have made are ready to be evaluated through an A/B test. In this A/B test, we'll deploy the mobile talk page improvements to a set of wikis that have not had access to any mobile talk page improvements to-date"
Proposed Analysis: A/B test with the following requirements: run only on wikis that have not had access to any mobile talk page improvements to date. We will also need to decide whether to include or exclude people who have used DiscussionTools on desktop. For reference, we limited the mobile VisualEditor-as-default A/B test to people who had not edited on any platform before. We will need to consider the implications for sample size, as I expect there are fewer editors who edit mobile talk pages. The analysis in T295180 should help inform this question once complete.

Good spot. I've added to T298062 the question of whether the A/B test we'll run there should be limited to wikis that have NOT had access to mobile DiscussionTools features. See T298062#7586868.

Rationale: Since the results of the analysis will be used to inform deploying these features as opt-out across all wikis, it's more important that we run a test that provides statistically significant data so we can confidently determine the impact. Note that since the A/B test is being conducted on a suite of features (not just the Reply Tool or New Discussion Tool), we will not have data on how each feature contributes individually or how the features interact with each other. For example, if we see an increase in the number of talk page replies, we cannot say whether that is due more to increased awareness from the Usability Improvements or to the Reply Tool. However, I don't believe this is a concern in this case, because we are evaluating the features together as a set that will be deployed together (not individually).

Today, @MNeisler and I confirmed we are NOT concerned with attributing changes in the metrics we will be tracking to a particular intervention/change because, as Megan mentioned above, "...we are evaluating the features together as a set that will be deployed together (not individually)."