Use Kafka/Druid real-time ingestion, in addition to the existing batch jobs (Oozie), to ingest FR banner impression data.
From closed task T203669:
Some dimensions could be removed. Specifically, event_campaignCategoryUsesLegacy (can be easily determined from other data already in the event), event_result (legacy field that can also be derived from other data), and event_recordImpressionSampleRate (just the sample rate for the old call to beacon/impression) could all go. (We put them in the event just in case they're needed for debugging, but we can always get them via Hive.)
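As a rough illustration of dropping those three fields, here is a minimal sketch that filters them out of a dimension list before it goes into an ingestion spec. The function name `prune_dimensions` and the `some_other_dimension_*` entries are hypothetical placeholders. Only the three removed field names come from the task text. How the real spec is built is not shown here.

```python
# Dimensions named in the task as safe to drop (still available via Hive).
REDUNDANT_DIMENSIONS = {
    "event_campaignCategoryUsesLegacy",
    "event_result",
    "event_recordImpressionSampleRate",
}


def prune_dimensions(dimensions):
    """Return the dimension list without the redundant fields, preserving order."""
    return [d for d in dimensions if d not in REDUNDANT_DIMENSIONS]


# Hypothetical dimension list; the "some_other_dimension_*" names are placeholders.
current_dimensions = [
    "some_other_dimension_a",
    "event_campaignCategoryUsesLegacy",
    "event_result",
    "some_other_dimension_b",
    "event_recordImpressionSampleRate",
]

pruned = prune_dimensions(current_dimensions)
print(pruned)
```

Keeping the removals in one named set makes it easy to restore a field later if it turns out to be needed for debugging.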
There's a small mistake in the calculation for the normalized count. It should use event_impressionEventSampleRate, which is the sample rate for these events, rather than event_recordImpressionSampleRate (usually not the same value). (Really nice that you were able to include that calculation in the pipeline, btw.)
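To make the correction concrete, here is a minimal sketch of the normalized count, assuming it is computed per event as the inverse of the sample rate and that sample rates are fractions in (0, 1]. The exact formula used in the pipeline is not given in this task, so the function `normalized_count` and the sample values are illustrative assumptions. Only the two field names come from the task text.

```python
def normalized_count(event):
    """Estimated number of real impressions represented by one sampled event.

    Assumption: the normalized count is 1 / sample rate. The rate used must be
    event_impressionEventSampleRate (the rate for these events), not
    event_recordImpressionSampleRate (the rate for the old beacon/impression call).
    """
    rate = event["event_impressionEventSampleRate"]
    if not 0 < rate <= 1:
        raise ValueError(f"sample rate must be in (0, 1], got {rate!r}")
    return 1.0 / rate


# Made-up events in which the two sample rates differ, as they usually do.
events = [
    {"event_impressionEventSampleRate": 0.5, "event_recordImpressionSampleRate": 0.25},
    {"event_impressionEventSampleRate": 0.25, "event_recordImpressionSampleRate": 0.5},
]

estimated_impressions = sum(normalized_count(e) for e in events)
print(estimated_impressions)
```

With these made-up values, each event contributes 2.0 and 4.0 respectively. Using event_recordImpressionSampleRate instead would swap the two per-event weights, which is the kind of skew the fix avoids.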