
Welcome survey: investigate Vietnamese abandonment rate
Closed, Declined (Public)


@nettrom_WMF's initial statistics show that Vietnamese Wikipedia has exceptionally high abandonment rates on the welcome survey, but only for accounts created on desktop.

  • Starting on Jan 24, half of all new accounts have been receiving Variation A and half have been receiving Variation C of the welcome survey.
  • We have been watching the results to make sure that the survey is not causing newcomers to leave Wikipedia right away. The percentage of newcomers who leave Wikipedia right away is called the "abandonment rate".
  • We know from our experiments in Czech and Korean Wikipedias that having some abandonment is normal. Overall, abandonment in those wikis is about 10% when newcomers do not receive the survey -- in other words, 10% of newcomers leave the website immediately after creating their account, without doing anything else.
  • We also saw in those experiments that receiving the survey did not increase that rate -- about the same number of newcomers left the website when they received the survey.
  • But we're seeing something very different in Vietnamese Wikipedia. Among users who create their accounts on mobile, we see the same abandonment rates as in Czech and Korean: about 10%, for both Variation A and Variation C. But for users who create their accounts on desktop, we see very high abandonment rates: 46% for both Variation A and Variation C.
  • If the survey is causing that many newcomers to abandon, then we need to change something very quickly. But first we want to do two things:
    • Ask the community if they have any ideas about what could be going on.
    • Change the experiment to include a control group so we can see what the normal abandonment rate is on desktop without the survey. We may start this next week.
  • We have two hypotheses about what could be going on:
    • There could be a real cultural difference between users who create accounts on mobile vs. those who create accounts on desktop. Perhaps desktop users are much less interested in surveys or do not want to answer questions.
    • There could be something weird going on, in which many desktop accounts are not real accounts, perhaps created by vandals or bots. We think this is possible because of the clear difference between desktop and mobile.
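
To make the comparison above concrete, here is a minimal sketch of how an abandonment rate could be computed per platform and survey variation. The record layout `(platform, variation, abandoned)` and the sample rows are hypothetical and only illustrate the calculation; they are not the actual data or the query used for this analysis.

```python
from collections import defaultdict


def abandonment_rates(registrations):
    """Return {(platform, variation): share of registrations that abandoned}.

    `registrations` is an iterable of (platform, variation, abandoned)
    tuples. This layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    abandoned = defaultdict(int)
    for platform, variation, did_abandon in registrations:
        key = (platform, variation)
        totals[key] += 1
        if did_abandon:
            abandoned[key] += 1
    return {key: abandoned[key] / totals[key] for key in totals}


# Hypothetical sample rows, NOT real registration data.
sample = [
    ("desktop", "A", True),
    ("desktop", "A", False),
    ("mobile", "A", False),
    ("mobile", "A", False),
    ("mobile", "C", True),
    ("mobile", "C", False),
]
rates = abandonment_rates(sample)
```

Splitting by both platform and variation is what makes the desktop/mobile gap visible; a single overall rate would average it away.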

Below is a table showing the differences in abandonment rate in Czech, Korean, and Vietnamese Wikipedias for Var A, Var C, and Control (with the exception of Vietnamese Wikipedia, which does not yet have a control group).

Welcome_survey_abandonment_2019-02-20.png (222×541 px, 37 KB)

Event Timeline

Assigning to @Trizek-WMF because the next step is to discuss this issue with the Vietnamese community to see if they have any insight.

Trizek-WMF triaged this task as Medium priority. Feb 21 2019, 4:53 PM

@Tuanminh01 reports some spambots that have created a lot of accounts on Vietnamese Wikipedia. Maybe they have been trapped by the survey. He mentioned a ticket but I can't find it.

No further conclusive replies so far. Implementing the control group should let us investigate the spambot issue a bit more.

> @Tuanminh01 reports some spambots that have created a lot of accounts on Vietnamese Wikipedia. Maybe they have been trapped by the survey. He mentioned a ticket but I can't find it.

It might be private. I remember a spam ticket from the past, but I'm unsure which project it was :(

MMiller_WMF reassigned this task from Trizek-WMF to nettrom_WMF.

I am re-opening this task because, although we already asked the community why they think the abandonment rate might be behaving this way, and we've already run the Variation A experiment, we still want to figure out who those abandoning users are. This is important because we want to eventually turn the welcome survey back on for Vietnamese (it was turned off in T218920).

The previous results of analyzing the abandonment rate between the survey and control groups can be found in T206380#5045440. That analysis includes a split between desktop and mobile registrations, and we found that on the mobile side there was no significant difference in abandonment, while on the desktop side the difference was statistically significant. Because of this difference in effect I've also chosen to split further analysis in the same way.

My first approach is to investigate if there is a difference in activation rate (proportion of registrations who edited within 24 hours after registration) between the survey and control groups on the desktop side. The results were as follows:

Group | Not activated | % | Activated | %

As we can see, there is a 0.8pp difference between the two. This difference is not statistically significant (X^2 = 0.35, df = 1, p = 0.55).

The same table for mobile registrations is as follows:

Group | Not activated | % | Activated | %

The difference here is 1.0pp, and it is also not statistically significant (X^2 = 0.11, df = 1, p = 0.74).
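
For readers who want to check the reported numbers, the sketch below converts a chi-square statistic with one degree of freedom into a p-value, and also shows a plain Pearson chi-square computation for a 2x2 table (group by activated / not activated). It is an illustration only: the function names are made up here, the sketch applies no continuity correction, and whether the original analysis used one is not stated in this task.

```python
from math import erfc, sqrt


def chi2_p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with df = 1.

    With one degree of freedom the statistic is a squared standard
    normal, so P(X > s) = erfc(sqrt(s / 2)).
    """
    return erfc(sqrt(statistic / 2))


def pearson_chi2_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    `table` is ((a, b), (c, d)), e.g. rows = survey / control and
    columns = not activated / activated. No continuity correction
    is applied (an assumption of this sketch).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic, chi2_p_value_df1(statistic)


# Statistics reported in this task.
print(round(chi2_p_value_df1(0.35), 2))  # desktop
print(round(chi2_p_value_df1(0.11), 2))  # mobile
```

Run on the two reported statistics, this reproduces the stated p-values of 0.55 and 0.74 to two decimals.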

I think this is one data point that suggests that the welcome survey is not significantly hindering legitimate users from registering. Another data point to consider is the number of registrations per month (see link below), which was very high in March. Our dataset for this analysis mainly consists of registrations in March, meaning it'll be affected by that. One of the open questions is what might be driving that strong increase in registrations.

Number of registrations per month on Vietnamese Wikipedia (bar chart, 2-year range, total)

We've so far not found any significant differences and at the moment other things take priority. Moving this to Q1 so that we can pick it up again then.

Any update concerning the investigation? We will probably turn the survey on again since we are going to resume our active work with Vietnamese Wikipedia.

BTW, I came back to this because of T252391, and noticed that when looking at the two-year registration rate on Vietnamese[1] it looks like the time period where we ran our Welcome Survey A/B test had substantially higher registration rates than expected. If we decide to run another experiment, we should consider fitting a time-series model to the data and using it to predict the number of registrations, in order to understand whether registrations are outside what's expected.
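
As a rough illustration of that idea, the sketch below fits a straight-line trend to past monthly registration counts and flags a new month whose count falls far from the trend's prediction. This is a deliberately simple stand-in for a proper time-series model (it ignores seasonality and does not widen the band for extrapolation), and the function names, the threshold `z`, and the monthly counts are all hypothetical.

```python
from math import sqrt


def fit_linear_trend(counts):
    """Least-squares line through (0, counts[0]), (1, counts[1]), ...

    Returns (slope, intercept, residual standard deviation).
    """
    n = len(counts)
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in enumerate(counts)]
    resid_sd = sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, resid_sd


def is_outside_expected(history, observed, z=2.0):
    """Return (flag, predicted) for the month that follows `history`.

    The flag is True when `observed` is more than `z` residual
    standard deviations away from the trend's prediction.
    """
    slope, intercept, resid_sd = fit_linear_trend(history)
    predicted = intercept + slope * len(history)
    return abs(observed - predicted) > z * resid_sd, predicted


# Hypothetical monthly registration counts, NOT real viwiki data.
history = [1000, 1040, 1010, 1060, 1050, 1090]
spike_flag, predicted = is_outside_expected(history, 2000)
normal_flag, _ = is_outside_expected(history, 1100)
```

A model with seasonal terms and a proper prediction interval would be more appropriate for real registration data; the point here is only the workflow of predicting first and then comparing the observed count against the expected range.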


nettrom_WMF claimed this task.
nettrom_WMF added a subscriber: KStoller-WMF.

I'm closing this task as declined, because we have several reasons for not prioritizing this work. First of all, as mentioned above in T216668#6533364, we see indications that our experiment ran during a time with high registration counts on viwiki and these might have been artificially inflated. Secondly, while we found differences in abandonment, we did not find differences in activation. Lastly, the Welcome survey has since been deployed to every Wikipedia without any suggestions that we should be concerned. @KStoller-WMF will follow up on determining if we'll deploy the survey to viwiki.