
Civi: is it possible to correct obvious email address typos before deduping exact matches?
Closed, Resolved | Public | 1 Estimated Story Points

Description

We've begun to see more Zendesk tickets that are bouncebacks from fundraising emails where the email address has an obvious typo. We fix these manually when possible, but it's labor-intensive. Can we batch find-and-replace the most obvious typos of the most common email domains, so that deduping exact matches becomes even more effective?

For example, in Civi there are:

309 donor email addresses end in @yaho.com
541 donor email addresses end in @gmial.com

If it's not technically a challenge, are there any reasons not to fix the really obvious typos? This would also increase the size of the email list and save @CCogdill_WMF and the DS team some time.

Update: after chatting with @DStrine, here are the top 20 varmints:

%@gmai.com = 1916 records
%@gamil.com = 999
%@gmal.com = 554
%@gmial.com = 541
%@gmil.com = 435
%@gmail.co = 427
%@gmail.om = 344
%@yaho.com = 309
%@gmail.cm = 293
%@homail.com = 190 * domain actually goes somewhere, to Microsoft
%@hotmal.com = 160 * domain actually goes somewhere, to Microsoft
%@yahoo.co = 123
%@hotmil.com = 118
%@hotmail.co = 115
%@yhaoo.com = 109
%@yahoo.om = 107
%@yahoo.cm = 107
%@yhoo.com = 100
%@gmail.vom = 97
%@hotmail.cm and also %@yahooo.com = 89

For AOL: @aol.cm = 35, @aol.om = 35, @aol.co = 48. Then there's a lower level of typos for lots of other domains.
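As a rough illustration of the batch fix, here is a minimal Python sketch that rewrites only the domains listed above. The names `TYPOS_BY_TARGET` and `correct_email` are hypothetical and not part of CiviCRM; the sketch also assumes every listed domain really is a typo, which is worth double-checking for the `.co`, `.cm` and `.om` variants since those are real country-code TLDs.

```python
# Hypothetical lookup built from the typo counts in this task.
# Assumption: each listed domain is a typo of the provider it is grouped under.
TYPOS_BY_TARGET = {
    "gmail.com": ["gmai.com", "gamil.com", "gmal.com", "gmial.com", "gmil.com",
                  "gmail.co", "gmail.om", "gmail.cm", "gmail.vom"],
    "yahoo.com": ["yaho.com", "yahoo.co", "yhaoo.com", "yahoo.om", "yahoo.cm",
                  "yhoo.com", "yahooo.com"],
    "hotmail.com": ["homail.com", "hotmal.com", "hotmil.com", "hotmail.co",
                    "hotmail.cm"],
    "aol.com": ["aol.cm", "aol.om", "aol.co"],
}

# Invert to typo -> intended domain for constant-time lookups.
TYPO_TO_TARGET = {
    typo: target
    for target, typos in TYPOS_BY_TARGET.items()
    for typo in typos
}


def correct_email(email):
    """Return the email with a known typo domain replaced, else unchanged."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        return email  # not something we can safely split; leave it alone
    fixed = TYPO_TO_TARGET.get(domain.lower())
    return f"{local}@{fixed}" if fixed else email
```

Because the whole domain must match exactly, an address such as `jane@yahoo.co.uk` is left untouched, which mirrors the `%@yahoo.co` style patterns used for the counts. Running the correction before the exact-match dedupe is what lets the corrected records merge.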

Event Timeline

DStrine set the point value for this task to 1. May 17 2016, 8:26 PM
DStrine moved this task from Triage to Sprint +3 on the Fundraising-Backlog board.

We could also use this MIT-licensed JS lib to help reduce these errors on the way in:
https://github.com/mailcheck/mailcheck
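To give a feel for the "on the way in" idea, here is a small Python sketch that suggests a nearby well-known domain at entry time. It is not mailcheck's API or algorithm; it swaps in the standard library's `difflib.get_close_matches`, and `KNOWN_DOMAINS`, `suggest_email` and the `0.8` cutoff are assumptions chosen for illustration.

```python
import difflib

# Assumption: a short allowlist of the providers discussed in this task.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "aol.com"]


def suggest_email(email, cutoff=0.8):
    """Return a suggested corrected address, or None if nothing to suggest."""
    local, sep, domain = email.rpartition("@")
    domain = domain.lower()
    if not sep or not local or domain in KNOWN_DOMAINS:
        return None  # unparseable, or already a known domain
    # Pick the single closest known domain above the similarity cutoff.
    matches = difflib.get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None
```

A form could show the result as a "Did you mean ...?" prompt rather than silently rewriting the address, leaving the final choice with the donor.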

Thanks for doing all this work, @MBeat33! I actually checked in with legal about this and confirmed it's okay to correct obvious domain typos on our end.

These numbers are a lot smaller than I expected, but it would be cool to do anyway!