
Review banner history log data and confirm that it satisfies use cases
Closed, Resolved, Public

Description

Look over the data collected by EventLogging and decide whether it covers Fundraising's needs for the banner history MVP.

We can iterate on this until the data has everything we need. The acceptance criterion for finishing this task is that the data is as complete as we can get it. At that point, we're ready to deploy the feature to real readers.

Event Timeline

awight assigned this task to ellery.
awight raised the priority of this task from to High.
awight updated the task description.

Here's a tidbit of the log from the beta cluster. Entries with the "r" property were randomly sampled, where "r" indicates the sample rate. Entries with the "i" property were generated when a user clicked on Donate; in that case, the log is always sent. The value of the "i" property is the temporary banner history log ID that you'll use to correlate with donations. (More doc in the schema itself.)
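To make the "r" versus "i" distinction concrete, here is a minimal Python sketch that sorts entries into the two kinds. It assumes each entry has already been parsed into a dict; the function name `classify_entry` and the sample values are made up for illustration, and letting "i" take precedence when both properties are present is an assumption, not something the schema is known to guarantee.

```python
def classify_entry(event):
    """Classify one parsed log entry by which property it carries.

    "i" (temporary log ID, sent on a Donate click) is checked first;
    that precedence is an assumption made for this sketch.
    """
    if "i" in event:
        return "donate_click"
    if "r" in event:
        return "sampled"
    return "unknown"


# Made-up entries for illustration only; real events carry more fields.
entries = [
    {"r": 0.01},                # randomly sampled, "r" is the sample rate
    {"i": "example-log-id"},    # sent because the user clicked Donate
]
kinds = [classify_entry(e) for e in entries]
```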

awight renamed this task from Spike: Review banner history log data and confirm that it satisfies use cases to Review banner history log data and confirm that it satisfies use cases. Sep 17 2015, 10:12 PM
awight updated the task description.
awight set Security to None.

Do you want the history log to include banner impressions from other campaigns?

Do you want a record of pageviews with no banner impression?

@awight Having page views would make the data quite a bit more interesting. Unfortunately, if we can only send back a log with 10 or so items, then page view items would "wash out" the more valuable banner log items. An efficient way to get the most interesting page view data would be to just keep a count of the number of page views between impressions. This count could even be an element in the log item (e.g. views_until_next_impression). This does not need to be in the MVP but would be a valuable addition.
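As a rough illustration of the counting idea, the Python sketch below derives one count per impression from an ordered list of page views. The input shape (a list of booleans, True when a banner was shown) is an assumption made for the example, and the function simply borrows the name views_until_next_impression from the suggestion above; nothing here reflects how CentralNotice would actually store it.

```python
def views_until_next_impression(pageviews):
    """Count banner-less page views following each impression.

    pageviews: ordered list of booleans, True when a banner was shown.
    Returns one count per impression. Page views before the first
    impression are not attributed to anything and are dropped.
    """
    counts = []
    for banner_shown in pageviews:
        if banner_shown:
            counts.append(0)      # start a new count for this impression
        elif counts:
            counts[-1] += 1       # page view with no banner
    return counts


# Hypothetical sequence: impression, two plain views, two impressions, one view.
example = [True, False, False, True, True, False]
counts = views_until_next_impression(example)
```

With this shape, each log item stays a banner item, and the page view information rides along as a single integer instead of taking up slots in a log limited to 10 or so items.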

@ellery @awight If the additional log entries would be to count pageviews or record banner displays in the same segment of users as is already targeted by a campaign with banner history enabled, then I think I have a solution: run a bit of code for users who were targeted by the campaign but weren't included in it because of allocation blocks and the random selection among them.

This would work for the following scenarios:

  • Users are targeted by a throttled low-level campaign. As currently happens, users targeted by such a campaign randomly get the campaign for n% of pageviews. This would let us run code for the remaining percent to count the intervening pageviews.
  • Users are targeted by two or more campaigns at once, one of which is a fundraising campaign with banner history enabled. In this case, we'd also run code when the user is randomly selected to get any of the non-fundraising campaigns, so we'd be able to put stuff in the banner history log at that time, too.

What this approach would not do is record pageviews or banner displays for any users not targeted by the fundraising campaign (on country, language, project, device or logged-in status criteria).
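A minimal Python sketch of the throttled-campaign scenario, to show where the extra code would run. Everything in it is hypothetical: the `state` dict, the `intervening_pageviews` field, and the `record_pageview` function are stand-ins, and the real allocation logic in CentralNotice is more involved than a single random draw.

```python
import random


def record_pageview(state, throttle, rng=random.random):
    """Handle one pageview for a user targeted by a throttled campaign.

    throttle: fraction of pageviews that get the campaign (n% as n/100).
    state: {"pending": int, "items": list}, a stand-in for the stored log.
    Returns True if the campaign was selected for this pageview.
    """
    if rng() < throttle:
        # Selected: log the impression with the pageviews seen since the last one.
        state["items"].append({"intervening_pageviews": state["pending"]})
        state["pending"] = 0
        return True
    # Not selected: this is the "remaining percent" where we only count.
    state["pending"] += 1
    return False


# Deterministic draws so the example is reproducible.
draws = iter([0.9, 0.8, 0.1, 0.5, 0.05])
state = {"pending": 0, "items": []}
for _ in range(5):
    record_pageview(state, throttle=0.2, rng=lambda: next(draws))
```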

Thoughts? :)

@ellery here're my main take-aways from our e-mail discussion from last week, as they relate to this task.

  • The main unit of analysis is pageview-in-the-campaign (erstwhile "impression"). The data is fine for analysis on that basis.
  • By sending a log ID every time a banner history log is sent via EventLogging, and never sending a log more than once per pageview, we cleanly separate pageviews that lead to donations from those that don't. (This is fixed and now deployed on production.)
  • We'll leave data about banners and pageviews outside the campaign for a future iteration.
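For the second point, the separation could look something like the Python sketch below. It assumes each log has been parsed into a dict with a `log_id` key and that the donation side yields a set of the log IDs it received; both names are hypothetical and not taken from the schema or from Civi.

```python
def split_by_donation(logs, donation_log_ids):
    """Partition logs into those whose ID is tied to a donation and the rest.

    logs: list of dicts, each with a hypothetical "log_id" key.
    donation_log_ids: set of log IDs recorded alongside donations.
    """
    donated, not_donated = [], []
    for log in logs:
        if log["log_id"] in donation_log_ids:
            donated.append(log)
        else:
            not_donated.append(log)
    return donated, not_donated


# Made-up IDs for illustration only.
logs = [{"log_id": "a1"}, {"log_id": "b2"}, {"log_id": "c3"}]
donated, not_donated = split_by_donation(logs, {"b2"})
```

Because a log is never sent more than once per pageview, each pageview lands in exactly one of the two groups.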

Given the above, it seems maybe we can mark this task resolved? What do you think? Or should we wait until you've seen data on the civi side, or from an actual campaign?

Thanks!!

Thanks for looking at this!

@AndyRussG is the data in HDFS on the analytics cluster now as well?

@ellery: I'm not sure... I don't know how to query it on HDFS. But I can see the events successfully zooming through Kafka like so:

$ kafkacat -o beginning -t eventlogging_CentralNoticeBannerHistory -b kafka1012.eqiad.wmnet:9092