Welcome emails: export opted-in users
Closed, Resolved (Public)

Description

After newcomers opt in to receiving emails while creating their accounts (T303240), the Growth team will regularly export lists of those newcomers to the Communications department so that emails can be sent to them via Mailchimp. These are the specifications:

  • Each export should include all users who...
    • ...created accounts on Spanish Wikipedia since the start date of the campaign AND
    • ...checked the box to opt in to emails on the welcome survey
  • We will export the list twice per week, on Mondays and Thursdays. The export contains users for the given period of time, e.g. from May 26 to May 30.
  • The exports should be CSVs with these columns:
    • Registration date
    • Username
    • Email address
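
The column layout above can be sketched as follows. This is a minimal illustration only, not the actual export script: the `OptedInUser` type, its field names, and the `export_csv` function are hypothetical, and the date format shown is an assumption.

```python
import csv
import io
from typing import Iterable, NamedTuple


class OptedInUser(NamedTuple):
    # Hypothetical record type; the field names are illustrative only.
    registration_date: str  # assumed format, e.g. "2022-05-12"
    username: str
    email: str


def export_csv(users: Iterable[OptedInUser]) -> str:
    """Return CSV text with the three columns listed in the task description."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row uses the column names from the specification.
    writer.writerow(["Registration date", "Username", "Email address"])
    for user in users:
        # csv.writer quotes fields that contain commas or quotes,
        # which matters because usernames may contain such characters.
        writer.writerow([user.registration_date, user.username, user.email])
    return buffer.getvalue()
```

Using the `csv` module rather than joining strings by hand keeps usernames with commas or quotation marks from breaking the file on import.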

Directions from the security team:

  • data should be shared in the format of CSV files and uploaded to the private Google Drive folder
  • we need to dispose of the CSV files the moment we no longer need the data

Event Timeline

On T303240: Welcome emails: opt-in checkbox, I asked @EdErhart-WMF these questions:

  • Which users will you want to receive your emails? All new accounts? Only those that have edits?
  • Should we export all users with email addresses? Or will they need to have confirmed email addresses, meaning they received the confirmation email and clicked the link?
  • When we export lists for MailChimp, what format will you need? What fields will you need?

He responded:

  • We'd like to send emails to all new accounts that have addresses regardless of edits, and we are writing our copy in a way to account for both kinds of users.
  • Great question. I would prefer to send an email to all of them regardless of confirmation; we can accept that some of those addresses will be invalid. However, I'm open to changing course if this is a bad idea for reasons I haven't seen!
  • We need the email addresses in a CSV file organized into a single column. MailChimp has some info on this as well.

And I have this follow-up question:

For the format of the emails, you specified that you would need only a column of email addresses. Is there no other info you need for those records? Ideas include:

  • Registration date: if you want to send the email a specific amount of time after registration.
  • Username: if you are going to write, "Hello [username]!"
  • User ID: if you want to keep track in some kind of database.

Thanks for these ideas, Marshall! A username field is something we would appreciate having but isn't a need if it takes a non-trivial amount of time to build. Our current plan calls for a graphical intro banner in lieu of a personal hello, and it would be great to have the flexibility to make changes if the emails are less successful than we'd like.

Registration dates would be useful if we expand this pilot, at which time we'd ideally find an automated solution for sending the emails, but for this experiment we'll be sending them in batches whenever you export the addresses. We don't need user IDs as we'll be removing all personal data from MailChimp as soon as possible.

Thanks, @EdErhart-WMF. One more question: how often would you want to receive these exports? What is the most frequently you might send these emails, and what is the least frequently you would want to send them?

Given that we'd like to welcome the new editors, the emails need to get to them within a reasonable timeframe. We would prefer to send them twice a week, and we would need to send them at least once per week.

Could you also let us know how long these exports will continue? A couple of weeks, or several months?

@kostajh and @MMiller_WMF , I am posting here a comment from @EdErhart-WMF regarding the 'Confirmed Emails' requirement:

We have confirmed with Legal that we can export all users with email addresses, not just those that have been confirmed. I'm sorry I didn't pass this along sooner.

@kostajh , I removed the confirmed emails requirement from the task description. Will the script need to be modified to account for that?

The output from the script includes a column with a 1 or a 0 depending on whether the user's email is confirmed. So no modification needed for the script.

@MMiller_WMF I believe you commented somewhere about having a control group for the welcome emails. You suggested that 20% of the users who checked the box should not be exported by the script. Do you still want that to be done? If so, we should make a task for it. Alternatively, should we just hide the mailing list field from 20% of eswiki visitors to Special:WelcomeSurvey?

@kostajh -- yes, @EdErhart-WMF and I decided we want to do this. I made the task here: T305015: Welcome emails: reserve control group

As you can see in there, I specified that it needs to be a subset of people who opted in to receive the email, so that the treatment and control groups are statistically the same. That's why we can't just remove the opt-in checkbox for some users from the Welcome Survey -- opting in to receive the email may be predictive of activation or retention. Thank you!
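
For illustration, one way to hold back roughly 20% of opted-in users is a deterministic hash of the user ID. This is only a sketch under assumptions and not the approach taken in T305015: the function names, the use of SHA-256, and the bucketing scheme are all hypothetical.

```python
import hashlib


def in_control_group(user_id: int, control_percent: int = 20) -> bool:
    """Deterministically place roughly control_percent% of users in the control group."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < control_percent


def split_for_export(opted_in_user_ids):
    """Return (exported, held_back); only users who opted in are passed in."""
    exported, held_back = [], []
    for user_id in opted_in_user_ids:
        (held_back if in_control_group(user_id) else exported).append(user_id)
    return exported, held_back
```

Because the assignment depends only on the user ID, the same user lands in the same group on every export, and both groups are drawn from people who checked the box.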

@EdErhart-WMF @MMiller_WMF I wanted to clarify something from the task description:

Because we'll be exporting everyone each time, the exports will be cumulative.

We are using timestamps to keep track of the start/end period for each export. So today's export is inclusive up to 20220512093322 (see T307451#7923806). The next export would start with 20220512093322 and go to the current timestamp for Monday May 16. That should prevent you from needing to filter out duplicate emails from each CSV. Does that sound OK?
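
The windowing described above can be sketched like this. It is an illustration under assumptions, not the script's actual logic: the row shape and the `rows_for_export` function are hypothetical, and treating the start boundary as exclusive and the end boundary as inclusive is one reading of "inclusive up to". Fixed-width 14-digit timestamps like the one quoted above compare correctly as plain strings.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shape: (registration timestamp, username, email)
Row = Tuple[str, str, str]


def rows_for_export(rows: Iterable[Row], after: str, up_to: str) -> List[Row]:
    """Keep rows whose timestamp satisfies after < timestamp <= up_to."""
    return [row for row in rows if after < row[0] <= up_to]
```

With this boundary handling, passing the previous export's end timestamp as `after` means a user registered exactly at 20220512093322 appears in the earlier export only, so consecutive CSVs do not overlap.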

Hey @kostajh -- if I understand correctly, you're saying that you'll send all of the collected emails on Monday, but that I can easily delete the ones that have already received an email by sorting via timestamps? If that's correct, that works great! (More generally, as long as I can determine who hasn't received an email yet, it will work great.)

Hi @EdErhart-WMF,

Sorry, not exactly:

  1. On Thursday I uploaded a first export (T307451#7923806).
  2. Today we'll upload another export (T307452: May 16 – Export and upload welcome survey data), and that export does not contain any of the emails from the May 12 export.

Thanks!

Even better! Thanks, Kosta. I appreciate the work here!

There are a couple of export rows with an empty email address. (Oversighted users, maybe?) I assume the mailer tool will filter them out.

@Tgr it did! Thanks for the diligence here.

Change 797577 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] ExportWelcomeSurveyMailingListData: Use stderr for debug output

https://gerrit.wikimedia.org/r/797577

Change 797577 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] ExportWelcomeSurveyMailingListData: Use stderr for debug output

https://gerrit.wikimedia.org/r/797577
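
As a general illustration of why debug output belongs on stderr: when the CSV goes to stdout and diagnostics go to stderr, redirecting stdout to a file yields a clean CSV. The sketch below is not the GrowthExperiments script; `write_export` and the log message are hypothetical.

```python
import csv
import sys
from typing import Iterable, Sequence, TextIO


def write_export(rows: Iterable[Sequence[str]], out: TextIO, log: TextIO) -> None:
    """Write CSV rows to `out` and a diagnostic summary to `log`."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    # Diagnostics go to a separate stream so they never end up in the CSV.
    print(f"Exported {count} rows", file=log)
```

In a command-line script this would be called as `write_export(rows, sys.stdout, sys.stderr)`, so that `script > export.csv` captures only the data.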

@kostajh , should we close this ticket? It looks like we have a working mechanism in place for exports and we have individual tickets for future exports.

kostajh claimed this task.

Sounds good to me!

Sgs changed the status of subtask T312168: July 5 – Export and upload welcome survey data from In Progress to Open.