Page MenuHomePhabricator

Track email clickthroughs on donate wiki
Closed, ResolvedPublic2 Estimated Story Points

Description

We want to stop using Silverpop's click tracking and instead have donate wiki track email clickthroughs for us.

BACKGROUND: Silverpop/IBM isn't able to track clicks in emails under https, and WMF has made a commitment that the wikimedia.org domain will be https-only, so we had to change the link domain which tracks email clicks to links.wikimedia.mkt4477.com. It's not ideal, and donors are already starting to notice, so we need to find another solution. Ideally before our big US email campaign starts in November.

Silverpop tracks clicks by sending the recipient to links.wikimedia.mkt4477.com, which captures a few pieces of data and then quickly redirects the donor to https://donate.wikimedia.org as intended. @K4-713 has told me that donate wiki uses a very similar type of system for banner analytics already, so we could bypass the need to redirect altogether. She may have also intimated that donate wiki is a bit bored right now :)

We don't necessarily need the data to tie directly to a donor profile, however we need the data to track unique clicks. I see it as a good idea to tie each unique record to a Civi contact id so we can integrate this data with Civi one day.

So Silverpop links will contain three essential values:

  1. MAILING_ID
  2. CONTACT_ID (Civi field)
  3. Base URL (u) to Resolve
  4. (optional – default to root) Page URL (page) to Resolve
  5. (optional – default to https) Protocol for URL redirect (p=h/p=s), h would resolve to http, s would resolve to https (in case we are linking to a third party website without https)
  6. Any additional URL parameters should pass through to the resolving page

Things I need/need to know:

  • Can we track uniques?
  • Will this data be available in ecom related to each unique mailing ID?
  • Can we track clicks from individual links in the same email - i.e. so I can say link 1 got more clicks than link 2?
  • Can we find a way to track clicks to links that resolve outside of donate.wiki (i.e. blog links)? <<< this is a lower priority question, and does not need to be addressed immediately
  • Will this destabilize anything for December?
  • Is this achievable before December (estimate, please :)?

Conclusion

  • Can we track uniques?
    • Yes, Silverpop can add contact_id and respondent_id, which is unique per recipient x mailing.
  • Will this data be available in ecom related to each unique mailing ID?
    • You'll be able to do MySQL queries relating utm_source with letter opens.
  • Can we track clicks from individual links in the same email - i.e. so I can say link 1 got more clicks than link 2?
    • Yes, but you'll have to add an URL param to each link manually.
  • Can we find a way to track clicks to links that resolve outside of donate.wiki (i.e. blog links)? <<< this is a lower priority question, and does not need to be addressed immediately
    • Yes, this is already possible with just utm_source.
  • Will this destabilize anything for December?
    • Low risk.
  • Is this achievable before December (estimate, please :)?
    • Yes, I'm assigning the work 2 points.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@CCogdill_WMF how about contact ID, just because that's how I've written the query.

I'm distilling the web logs down into two tables - one with an entry for each hit on donate wiki that has a mailing id, and a de-duped one that only has an entry for the first user/mailing combination.

I'm using contact_id to determine unique users. I was using utm_campaign to determine unique mailings, but it looks like I should switch to utm_source.

The exported data will be from the de-duped table - counting how many users first appear each hour for each combination of utm_source, utm_campaign, and link_id (add this to distinguish different links in the same email).

Would counts from the non-unique table also be useful? How about roll-ups across all time for a utm_campaign or a utm_source value? Want to email me an example of the tracking info you get from Silverpop?

Okay, so the url parameter is &contact_id=%%CONTACTID%%. This won't be in*all* of today's emails - it took too much time to update all the links, but will be in half. I can give you the mailing IDs (utm_source) for the specific emails if you want them.

Sure, data for both gross and unique clicks would be great, although I almost never use gross clicks as a metric to determine effectiveness. What would be better is if I could run a query to get breakdowns of number of clicks per link. If we passed something in the url giving each link a name, would that be possible? That wouldn't be something I would necessarily need in a regular report, but would be nice to get when test content is applicable.

In Silverpop, I open individual mailing reports for each email. They look like this:

Screen Shot 2015-12-08 at 11.42.09 AM.png (695×867 px, 103 KB)

Then there is a separate clickthrough report which breaks down behavior on each link in the email:

Screen Shot 2015-12-08 at 11.43.45 AM.png (708×871 px, 131 KB)

Currently, I do not care about individual data, only aggregate. Do you want to see a copy of the actual data from Silverpop? The screenshots above are how I consume it.

Change 258058 had a related patch set uploaded (by Ejegg):
Oozie/Hive scripts to get email tracking data

https://gerrit.wikimedia.org/r/258058

Hi @CCogdill_WMF, here are some stats from the first links with contact_id till now. There seem to be a few hits with contact_id but not utm_campaign or utm_source. Not sure what's up with those.

The first three columns are utm_campaign, utm_source, link_id.
Then for by_hour CSVs the next 4 are year, month, day, hour.
The last column is always the total.

'Gross' are all clicks, and 'unique' counts each contact id once, the first time it appears. 'rollup' gives totals for each campaign and source over all time.



Next up: making up-to-the-hour versions of these available at a labs URL.

@Ejegg awesome, looks like the data is coming in! Is there a way for me to run this command myself so I can compare with Silverpop data real-time?

The only reports I would need access to on a regular basis would be the rollup reports. I can't see myself using the by hour stats.

@CCogdill_WMF almost got that worked out! The data is on an analytics machine with pretty restricted access. I'm making it export the aggregated counts to a CSV after crunching each hour's numbers, and will sync those CSVs to a labs instance where you can fetch them via a browser.

Oh man, looks like we've been pulling 100% of the hits from donatewiki onto the payments cluster forever. I should scrap the complicated stats/oozie job and just run something like the banner impressions loader.

That makes sense, @K4-713 suggested donate wiki was doing a lot of the work
for us already.

@awight, the plan is to add contact_id and link_id columns to landingimpression_raw and the existing process that populates it. I'll create another table donatewiki_unique with timestamp, utm_source, utm_campaign, contact_id, and link_id, with a unique key formed by utm_source and contact_id. I can then insert ignore into that table all donatewiki hits from _raw, and either export unique source/link counts to a CSV file or to another table.

Change 259178 had a related patch set uploaded (by Ejegg):
Add contact_id and link_id to raw LP stats

https://gerrit.wikimedia.org/r/259178

Change 259197 had a related patch set uploaded (by Ejegg):
WIP add donatewiki uniques table

https://gerrit.wikimedia.org/r/259197

Change 259178 merged by Ejegg:
Add contact_id and link_id to raw LP stats

https://gerrit.wikimedia.org/r/259178

Change 259340 had a related patch set uploaded (by Ejegg):
Revert "Add contact_id and link_id to raw LP stats"

https://gerrit.wikimedia.org/r/259340

Change 259404 had a related patch set uploaded (by Ejegg):
Create and populate donatewiki_unique

https://gerrit.wikimedia.org/r/259404

Change 259197 abandoned by Ejegg:
WIP add donatewiki uniques table

Reason:
Replaced by I2122cd0010f1f3a519b9bbb8632a8048356acfe2

https://gerrit.wikimedia.org/r/259197

Change 259424 had a related patch set uploaded (by Ejegg):
Keep donatewiki counts up to date

https://gerrit.wikimedia.org/r/259424

Change 259340 merged by Ejegg:
Revert "Add contact_id and link_id to raw LP stats"

https://gerrit.wikimedia.org/r/259340

Change 259404 merged by Ejegg:
Create and populate donatewiki_unique

https://gerrit.wikimedia.org/r/259404

Change 259424 merged by Ejegg:
Keep donatewiki counts up to date

https://gerrit.wikimedia.org/r/259424

Hi @CCogdill_WMF ,

We're now getting email link tracking data in the pgehres database. There are two new tables, donatewiki_unique and donatewiki_counts. donatewiki_unique gets a row for each new combination of utm_source and contact_id, i.e. the first time a donor comes in from any particular email. donatewiki_counts just tallies them by campaign, source, and link id.

To get counts broken out by utm_source, utm_campaign, and link_id (you can add this parameter to differentiate links):

select * from donatewiki_counts;

If you just want totals by utm_source:

select utm_source, sum(count) from donatewiki_counts group by utm_source;

Change 259622 had a related patch set uploaded (by Ejegg):
Fix key on donatewiki_unique

https://gerrit.wikimedia.org/r/259622

Change 259623 had a related patch set uploaded (by Ejegg):
Replace donatewiki counts table with view

https://gerrit.wikimedia.org/r/259623

Thanks, @Ejegg! Exciting to have a query to run :)

A couple of questions stemmed from the query below:

select * from donatewiki_counts where utm_source LIKE 'sp5027%' group by utm_source;

  1. We sent out 9 emails with a utm_source like 'sp5027%', but this query only returns results for utm_source = sp50272171. I realized that in some if not all of the other emails, I used "Contact_id=" instead of "contact_id=" in the URL. Does the case matter, and is that why my query only returns 1 row?
  2. I'm not getting a clickthrough count that matches Silverpop's data. Is there a lag? For utm_source sp50272171, the query above returns 1140 unique clicks (if I drop the group by I get 1287 uniques), but Silverpop shows 2086 unique clicks for this email.

Hi @CCogdill_WMF,

  1. Looks like the param is indeed case sensitive, so it's not counting Contact_id clicks.
  2. There shouldn't be much lag - half an hour or so at most. Did email sp50272171 go out after 22:00 UTC on 12/16? That table only has data on clicks from that point on. I could write something to backfill earlier data if it would help.

-Elliott

Ah, okay, that email went out at 21:00 UTC on 12/16, so it's possible those
extra clicks came in the 1st hour... We sent 5 CA emails today and I made
sure every link had the correct contact_id parameter, so I'll compare the
stats later today and see how it looks.

Hi @CCogdill_WMF,

Are the numbers looking closer to Silverpop's for more recent campaigns?

@Ejegg there was a rogue link with no contact_id in my last round of emails
so I haven't had a good comparison. Sorry about that! We have some 40 links
in the email due to a bunch of dynamic content rulesets, and one slipped
through the cracks.

I'll let you know on Monday when our next round of emails go out.

But to answer your question - yes, the numbers are looking closer! I just
want a true A/B test :)

Ahh, sorry the setup is more complicated than the Silverpop tracking. Let me know if there's anything we can do to make it simpler.

Not a problem, this is a temporary hassle for a long-term solution. Totally
worth the time.

Change 259622 merged by Awight:
Fix key on donatewiki_unique

https://gerrit.wikimedia.org/r/259622

Change 259623 merged by Awight:
Replace donatewiki counts table with view

https://gerrit.wikimedia.org/r/259623

@Ejegg we sent emails yesterday which all should have had correct links. I tried to compare data, but I keep getting an error when running the query below:

MariaDB [pgehres]> select * from donatewiki_counts where utm_source LIKE 'sp5034%' group by utm_source;
ERROR 1045 (28000): Access denied for user 'ccogdill'@'localhost' (using password: YES)

I can't run the other queries I was using last week on this DB either. Help?

@Jgreen, I replaced table pgehres.donatewiki_counts with a view yesterday, and now neither I nor @CCogdill_WMF can select from it on the replicated box. Could you please grant us permissions or let me know what I should do differently?

@CCogdill_WMF, looks like @Jgreen just fixed it for us. You should be able to select again.

Change 258058 abandoned by Ejegg:
Oozie/Hive scripts to get email tracking data

Reason:
Replaced by I2122cd0010f1f3a519b9bbb8632a8048356acfe2 and related patches.

https://gerrit.wikimedia.org/r/258058

Hi @Ejegg, thanks so much for your patience on this one. I'm still having
trouble getting the data. I was able to successfully pull data for our
emails on 12/18 with the following query:

select * from donatewiki_counts where utm_source LIKE 'sp5029%' group by
utm_source;

However, as I noted on this task, there was a rogue link in those emails
that didn't have the contact_id param. I cleaned up all the links in the
emails and waited for our next big send. We sent emails 12/29 - 12/31 with
utm_source LIKE:
sp5034
sp5035
sp5036

Yet when I swap in these sources in the query above, I get an empty set.
Here's the URL in all these emails - can you confirm this is correct? Did
click tracking get turned off?

donate.wikimedia.org/?appeal=Appeal-JimmyQuote&utm_campaign=C15_Email3&utm_medium=email&utm_source=sp%%MAILING_ID%%&hpc=%%highest_usd_amount%%&contact_ID=%%ContactID%%&uselang=%%IsoLang%%&link_id=%%LINK_NAME%%

It's still getting some new clicks, but numbers seem pretty low. Looks like 30-40 per day for the last 5 days, mostly with utm_source like 'sp502%'. Let me look at the raw logs to see if anything changed.

I'm sending another email on Friday, so we'll have an opportunity to get new data.

Ah, phooey, it's the case sensitivity again - contact_ID instead of contact_id. I'll change the code to work with either one.

Change 263679 had a related patch set uploaded (by Ejegg):
Make landing page parameters case-insensitive

https://gerrit.wikimedia.org/r/263679

Ugh @Ejegg I'm sorry. Silverpop conventions are totally different and we
keep confusing them.

Our email on Friday won't actually have a donation link in it, so we'll
have to wait for our Sweden pre-test email at the end of the month. We're
making our new control templates and will make sure all the links have
lowercase contact_id.

Change 263679 merged by Ejegg:
Make landing page parameters case-insensitive

https://gerrit.wikimedia.org/r/263679

Hi @CCogdill_WMF! I just deployed the change to make landing page parameters case-insensitive. From now on either contact_ID or contact_id will work just fine. Let me know if there's anything else we can do to make the in-house click counting nicer to use.

Thank you @Ejegg! We should have appeals going out next week so we will be
able to settle this once and for all.

hey @CCogdill_WMF how did those appeals go? can we close this one?

@Ejegg I'm not sure what I'm doing wrong, but I haven't seen any data come through for the emails we sent at the end of last week (utm source LIKE 'sp5063%' or utm_source LIKE 'sp5061%'). Here's a real URL I clicked on from a donor email in Zendesk -- you can see the contact_id in there. Can you tell what's not working?

https://donate.wikimedia.org/w/index.php?title=Special:FundraiserLandingPage&country=US&uselang=es&utm_medium=email&utm_source=sp50636778&utm_campaign=C15_Email1&contact_id=14832296&appeal=Appeal-JimmyQuote

All the params on that email look good. I'll check if we're tracking hits on '/w/index.php', or just on the bare domain. Also, the email clicks are coming in on the same import that gives us banner impressions, and I know that had a couple of recent outages. I'll see if the email data was affected and if it has been backfilled.

Ah good thinking on the outage, hopefully that was it! More emails will go
out toward the end of the week.

Another round of emails went out today with utm_source like "sp5066%". I'm
getting an empty set for that too.

Change 269820 had a related patch set uploaded (by Ejegg):
Fix query string key case change

https://gerrit.wikimedia.org/r/269820

Oops, this outage was all my fault. My patch to make the parameters case insensitive was totally broken, and the the job didn't send failmail after deployment so I didn't realize. Will deploy a fix ASAP

Change 269820 merged by Ejegg:
Fix query string key case change

https://gerrit.wikimedia.org/r/269820

@Ejegg I saw your ping on IRC. My internet is on the fritz at the moment
and I can't load it, apologies!

The url we're using still has the special:fundraiserlandingpage, we're just
using a new appeal. Does that mean it should still work tomorrow?

https://donate.wikimedia.org/w/index.php?title=Special:FundraiserLandingPage&country=US&uselang=en&utm_medium=email&utm_source=sp50619562&utm_campaign=C15_Email1&hpc=&appeal=Appeal-JimmyQuote&contact_id=&link_id=

Thank you!

Le mercredi 10 février 2016, Ejegg <no-reply@phabricator.wikimedia.org
<javascript:_e(%7B%7D,'cvml','no-reply@phabricator.wikimedia.org');>> a
écrit :

Ejegg moved this task to Done on the Fundraising Sprint Cat Herding
workboard.

TASK DETAIL

https://phabricator.wikimedia.org/T114010

WORKBOARD

https://phabricator.wikimedia.org/project/board/1752/

EMAIL PREFERENCES

https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ejegg
Cc: atgo, gerritbot, Jgreen, Ottomata, jrobell, Ejegg, K4-713, Nemo_bis,
awight, Pcoombe, CCogdill_WMF, Aklapper, DStrine, cwdent

OK, the fix is up and email clicks are coming into the db again. Special:FundraiserLandingPage is always tracked - the appeal doesn't matter. Any page title on donate starting with WMF on donate is also tracked as a landing page. Let me know if you want to add any more.

Awesome, that's good to know. I think this is good for now, but I will
check in tomorrow afternoon with stats from the day's emails.

There's an additional component to this which may deserve a new task (lmk).
We're now sending these "remind me later" emails from banner opt-ins, which
means we don't have contact ids for most of these people. Do you know if I
can alternately plug in Silverpop's recipient_id into that field, or does
the system care if it ties to a contact record?

To be clear, this is what the url param would be:
&contact_id=%%RECIPIENT_ID%%

@Ejegg success! Results from today's emails:

utm_sourceutm_campaigncountsilverpop_count
sp50673156C1516_Email211211130
sp50673163C1516_Email212081218
sp50673183C1516_Email211461162

`

I think I'm comfortable with that amount of variation! I'm going to keep comparing this data on emails for the next week, and then I think we'll make the official switch next Thursday if it continues to look this good. Thank you for making this happen!!

Reiterating my one outstanding question from yesterday:
There's an additional component to this which may deserve a new task (lmk). We're now sending these "remind me later" emails from banner opt-ins, which means we don't have contact ids for most of these people. Do you know if I can alternately plug in Silverpop's recipient_id into that field, or does the system care if it ties to a contact record?

Our counts on donate.wiki are continuing to match Silverpop's data very closely. I'm ready to pull the trigger and turn off click tracking in our appeals!

It does seem like the tracking isn't working for the Remind emails, probably because we're not passing a value for contact_id. I'm going to add the Silverpop recipient_id in its place and hope that works...

Hi @CCogdill_WMF! It's great to hear the numbers are pretty convergent. Looking forward to being able to avoid those sketchy non-https redirect hops.

Right now, we're only using contact_id to make sure each recipient is only counted once, but it would be nice to be able to pass it through the pipeline and not duplicate the contact in Civi. To make it easier to tell real contact IDs from silverpop recipient IDs, would it be possible to add a couple of letters to the front of the silverpop IDs? So for the reminders the link's parameters would be something like "?contact_id=SP5551212&utm_source=XX123456". The contact_id field in the database can hold up to 32 arbitrary characters.

That sounds like a good suggestion to me, we can definitely do that. I will double check recipient IDs don't exceed 30 characters.

@Ejegg is there any way to expand the field character max to 255? We're not
positive we can guarantee < 31.

Sure, that shouldn't be a problem to change now, since the table isn't too huge yet. Will have a patch up shortly.

Change 271041 had a related patch set uploaded (by Ejegg):
Widen contact_id column

https://gerrit.wikimedia.org/r/271041

Change 271041 merged by Cdentinger:
Widen contact_id column

https://gerrit.wikimedia.org/r/271041

Ejegg moved this task from Pending Deployment to Done on the Fundraising Sprint Dirt Farming board.

OK @CCogdill_WMF, the contact_id column is 255 characters wide now! I'll close this ticket, but feel free to re-open if anything acts unexpectedly in the future.

Woo, I really can't thank you enough, @Ejegg! This is wonderful.

I have one last follow-up question which may merit a new (lower priority)
task. We send occasional newsletters which link to the blog and other
wikimedia domains. Because these emails don't ask for donations, clicks are
the only real metric we have for assessing the quality of the content.

There isn't an easy way to use this system to track clicks on other
domains, right? Would we have to set up redirects for each link in a
newsletter email?

Oops @Ejegg, I'm rereading the edited task description and see this:

Q: Can we find a way to track clicks to links that resolve outside of donate.wiki (i.e. blog links)?
A: Yes, this is already possible with just utm_source.

So if I just add utm_source to a blog.wikimedia.org url, the data will be in the pgehres database?