
Growth: how many users sign up with email addresses?
Closed, Resolved (Public)

Description

Several of our Growth team ideas, such as "Email new editors their impact" and "Email notification replies", rely on having email addresses for editors. We don't currently know how many editors have email addresses associated with their accounts. Answering this by 2018-09-21 is a high priority, because it will materially affect our roadmapping.

The task is to find answers to these questions:

  • Over time, what percentage of new account signups give an email address? (TOP PRIORITY)
  • How common is it for editors to add an email address to their account at some point after they sign up?
  • What is the bounce rate for the email addresses we have for editors who have signed up in the last year? In other words, how reliable are those email addresses?

We definitely need this for Czech and Korean Wikipedias, but if it's easy to produce, it would be good to have the same numbers for several other Wikipedias for comparison, like English, German, Arabic, and Ukrainian.

Event Timeline

It looks like Q2 above will require EventLogging if we want data, since email addresses are stored in the user table, which is edited directly when a user sets or changes their email address, so no history of changes is kept.

What is readily available in the user table is registration date, whether an account currently has an email address set, whether the email has been verified, and a rough estimate of their total number of edits. I used that to create a set of four plots, on a monthly basis for each of cswiki and kowiki, for accounts that are not autocreated.

  1. The proportion of registered accounts that have an email address set.
  2. The proportion that have verified their email address (meaning at some point it did not bounce).
  3. The proportion of registered accounts that have an email address set, but never made an edit.
  4. The proportion that have verified their email address but never made an edit.

I thought the accounts that have a verified email address but never made an edit are interesting, because you'd think that someone who verifies their address is also likely to edit. Turns out that about half of them don't.

Let me know if those plots are in the ballpark of what we're looking for, and I can get them done for enwiki, dewiki, arwiki, and ukwiki as well.

I'm not sure how to answer Q3, do we want to get ops in on this to see if addresses registered in the last year were emailed but bounced? Not sure if there are other ways to figure that one out.

Korean Wikipedia:

[Figure: kowiki_email_proportions.png]

Czech Wikipedia:

[Figure: cswiki_email_proportions.png]

Per discussion in Product Analytics today, adding the SQL query used to generate the data so that history is preserved.

-- Monthly account registration counts, excluding autocreated accounts.
-- user_registration is a YYYYMMDDHHMMSS timestamp, so its first six
-- characters give the registration month (YYYYMM).
SELECT SUBSTRING(user_registration, 1, 6) AS reg_month,
  COUNT(*) AS num_registrations,
  SUM(IF(user_editcount > 0, 1, 0)) AS num_edited,
  SUM(IF(user_email != '', 1, 0)) AS num_with_email,
  SUM(IF(user_email_authenticated IS NOT NULL, 1, 0)) AS num_auth_email,
  SUM(IF(user_email != '' AND user_editcount = 0, 1, 0)) AS num_email_noedits,
  SUM(IF(user_email_authenticated IS NOT NULL AND user_editcount = 0, 1, 0)) AS num_auth_noedits
FROM user u
-- Anti-join: match each account to its autocreate log entry, if any.
LEFT JOIN (SELECT log_user
           FROM logging
           WHERE log_type = 'newusers'
           AND log_action = 'autocreate') AS a
ON u.user_id = a.log_user
WHERE user_registration IS NOT NULL
AND log_user IS NULL  -- keep only accounts with no autocreate entry
GROUP BY reg_month;

Thanks, @nettrom_WMF. This is helpful. I have a couple questions, and then yes, I think we should do these for the other four wikis listed:

  • Is the denominator the same in all four plots? Is the denominator for each "of all those who created an account"? If so, I think it would be useful to see the bottom two plots (about editing) with the denominator being "of all those with email (or verified email)".
  • Which one is Czech and which is Korean? Could you add that label somewhere in the images?

I suppose since we don't know when someone added their email, it's hard to tell whether the rate of registering with an email is decreasing, or whether it's always been the same, but the longer you've had your account the more time you've had to add and verify your email.

I will say that it is disappointing how low the share of "verified email" people is. Maybe we'll have to think about encouraging people to add their emails, or about improving the verification flow, which could probably use some work.

I edited the original comment to make it clear which Wikipedia a plot belonged to. The denominator in all plots was the same: the number of registered accounts (for a given month).

I have created new plots for all six Wikipedias and uploaded them to Commons. In these new plots, the denominator changes depending on what we're looking at. For the plot of the proportion of accounts with an email address, the denominator is the number of registered accounts. For the plots of accounts with a verified email address (top right) and accounts with no edits (bottom left), the denominator is the number of accounts with an email address. For the plot of accounts with a verified email address and no edits (bottom right), the denominator is the number of accounts with a verified email address.
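
To make the four denominators concrete, here is a minimal Python sketch of how the proportions could be derived from one month of output from the SQL query above. The dictionary keys are the column aliases from that query; the function name `email_proportions` and the counts in `example` are made up for illustration and are not the code or data behind the actual plots.

```python
def email_proportions(row):
    """Compute the four plotted proportions for one month of query output.

    `row` is a dict keyed by the column aliases of the SQL query.
    A proportion is None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        # Of all registered accounts, the share with an email address set.
        "with_email": ratio(row["num_with_email"], row["num_registrations"]),
        # Of accounts with an email address, the share that verified it.
        "verified": ratio(row["num_auth_email"], row["num_with_email"]),
        # Of accounts with an email address, the share that never edited.
        "email_noedits": ratio(row["num_email_noedits"], row["num_with_email"]),
        # Of accounts with a verified address, the share that never edited.
        "verified_noedits": ratio(row["num_auth_noedits"], row["num_auth_email"]),
    }


# Made-up counts for a single month, for illustration only.
example = {
    "reg_month": "201801",
    "num_registrations": 1000,
    "num_with_email": 400,
    "num_auth_email": 100,
    "num_email_noedits": 300,
    "num_auth_noedits": 50,
}
print(email_proportions(example))
```

Returning None for an empty denominator keeps months with no email signups from raising a division error; a plotting step would need to skip or mask those values.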

Links to all the plots:

One question that has come up during this work, and that I hope @Catrope can answer is: does verifying one's email address change anything in the business logic in Wikipedia/MediaWiki?

@nettrom_WMF -- it looks like the bottom two graphs are identical across all six images. Could you please fix this?

@MMiller_WMF : You are right, thanks for spotting that! Have uploaded new and improved graphs for all wikis to Commons.

One question that has come up during this work, and that I hope @Catrope can answer is: does verifying one's email address change anything in the business logic in Wikipedia/MediaWiki?

Slightly but not significantly. Here's what I found. If you do *not* have a confirmed email address, your experience is different in the following ways:

  • You can't choose to receive emails (obviously), so you can't opt into watchlist emails or Echo emails
  • You can't receive emails through Special:Emailuser; other users who try to email you that way will be told so, and they won't see an "email user" link when viewing your user page
  • You can't send emails through Special:Emailuser (because the recipient wouldn't be able to respond); if you try to use it, you'll get an error, and you won't see an "email user" link when viewing other users' user pages
  • You will have to do CAPTCHAs for some actions; users with confirmed email addresses appear to be exempt from some (all?) CAPTCHAs

I'm not sure how to answer Q3, do we want to get ops in on this to see if addresses registered in the last year were emailed but bounced? Not sure if there are other ways to figure that one out.

There's an obscure table in the wikishared database called bounce_records that might have what you need.

I looked into the bounce_records table that @Catrope mentioned. It appears to only contain two months of data. Not sure how useful that is to us, but it does appear to be in a format that would allow us to cross-reference it against the user table for the six wikis we're interested in.
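
As a rough illustration of that cross-reference, the Python sketch below computes a bounce rate from two inputs that would first have to be extracted from the databases. Everything here is an assumption for illustration: the function name `bounce_rate`, the shape of `users` (pairs of registration timestamp and email address taken from the user table), and `bounced_emails` (a plain set of addresses taken from bounce_records, whose column layout is not described in this task). With only two months of bounce data, a number computed this way would at best be a lower bound on the true bounce rate.

```python
from datetime import datetime, timedelta


def bounce_rate(users, bounced_emails, now, days=365):
    """Share of recently registered accounts whose email has a recorded bounce.

    users: iterable of (registration, email) pairs, where registration is a
        'YYYYMMDDHHMMSS' string and email may be empty (hypothetical shape).
    bounced_emails: set of addresses with at least one recorded bounce.
    Returns None if no recent account has an email address.
    """
    cutoff = (now - timedelta(days=days)).strftime("%Y%m%d%H%M%S")
    # Fixed-width timestamps compare correctly as plain strings.
    recent = [email for registration, email in users
              if email and registration >= cutoff]
    if not recent:
        return None
    return sum(1 for email in recent if email in bounced_emails) / len(recent)


# Made-up data, for illustration only.
users = [
    ("20180301120000", "a@example.org"),
    ("20180401120000", "b@example.org"),
    ("20180501120000", ""),               # no email address set
    ("20160101000000", "c@example.org"),  # registered too long ago
]
bounced = {"b@example.org", "c@example.org"}
print(bounce_rate(users, bounced, now=datetime(2018, 9, 1)))
```

The comparison is an exact string match on the address; whether addresses need normalising (for example by case) before matching is left open here.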

@nettrom_WMF -- thank you for doing this analysis. It looks like the main takeaway is that email signup and verification rates are lower than we would want -- down to a minority of registered users having verified addresses. This is surprising and we may want to do more research on this in the future, and we should keep an eye on it as we go through the "Understanding first day" and "Personalized first day" projects.

This analysis immediately helped us plan our strategy for the next year (by avoiding the "Engagement emails" project for now), and it is helping inform our other work in "Personalized first day", by suggesting that we ask a second time for a user's email address to try to get those numbers up, so we have more users to work with when we want to communicate via email in the future.