Move FR banner-impression jobs to events
Open, Normal · Public

Description

Use kafka-druid for realtime in addition to batch jobs (oozie) to ingest FR banner impression data.
From closed task T203669:

Some dimensions could be removed. Specifically, event_campaignCategoryUsesLegacy (can be easily determined from other data already in the event), event_result (legacy field that can also be derived from other data), and event_recordImpressionSampleRate (just the sample rate for the old call to beacon/impression) could all go. (We put them in the event just in case they're needed for debugging, but we can always get them via Hive.)
There's a small mistake in the calculation for the normalized count. It should use event_impressionEventSampleRate, which is the sample rate for these events, rather than event_recordImpressionSampleRate (usually not the same value). (Really nice that you were able to include that calculation in the pipeline, btw.)
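A rough sketch of the two suggestions above, for illustration only. It assumes the normalized count is the inverse of the sample rate (each sampled event stands in for 1/rate impressions), which this task does not spell out. The function names are hypothetical and not part of the actual ingestion job.

```python
from typing import Dict, Iterable, Mapping

# Dimensions the task suggests dropping (still retrievable via Hive).
DROPPED_DIMENSIONS = (
    "event_campaignCategoryUsesLegacy",
    "event_result",
    "event_recordImpressionSampleRate",
)


def normalized_count(event: Mapping[str, object]) -> float:
    """Weight of one sampled event, ASSUMING inverse-sample-rate normalization.

    Uses event_impressionEventSampleRate (the rate for these events),
    not event_recordImpressionSampleRate (the old beacon/impression rate).
    """
    rate = float(event["event_impressionEventSampleRate"])
    if not 0.0 < rate <= 1.0:
        raise ValueError(f"sample rate out of range: {rate}")
    return 1.0 / rate


def strip_legacy_dimensions(event: Mapping[str, object]) -> Dict[str, object]:
    """Return a copy of the event without the dimensions listed above."""
    return {k: v for k, v in event.items() if k not in DROPPED_DIMENSIONS}


def estimated_impressions(events: Iterable[Mapping[str, object]]) -> float:
    """Sum the per-event weights to estimate total impressions."""
    return sum(normalized_count(e) for e in events)
```

With this weighting, an event sampled at 0.01 counts as 100 impressions, so a wrong sample-rate field skews the totals whenever the two rates differ.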

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 8 2019, 5:09 PM

Pinging @DStrine
Waiting for FR tech input to productionize this.

Nuria triaged this task as Normal priority.

I'm not able to fully understand the impact and the choices needed here.

If this is a carry-over task for @Seddon's request on T203669, then we need him to respond here.

If this is related to the rest of the data pipeline for CN and has Advancement implications, then I could use @AndyRussG or @Ejegg to respond.

Just looping back in on this ticket @JAllemandou as to what is needed from us.

Nuria added a comment. · Apr 3 2019, 4:46 PM

@Jseddon : would you be so kind as to please describe the use case or functionality you are looking for?

The goal is to get back near-realtime banner impression data within Turnilo, for three purposes:

  • Verification of successful delivery of campaigns
  • A diagnostic tool to monitor and diagnose issues with live campaigns and CentralNotice
  • Easy retrieval of banner impression data

Nuria added a comment. · Apr 3 2019, 6:01 PM

Question for the FR-development team: where does this pipeline of data come from? (EventLogging, we hope, but double-checking.)

Pinging @Ejegg and @AndyRussG directly.

Nuria added a comment. · Edited · Apr 4 2019, 11:11 PM

Per @AndyRussG's reply in https://phabricator.wikimedia.org/T217109, the FR-tech team has not yet moved to the eventlogging pipeline for events that measure impressions. Once that happens, we can easily productionize a real-time ingestion job from the eventlogging topic on Kafka.

Nuria renamed this task from "Move FR banner-impression jobs to events (lambda)" to "Move FR banner-impression jobs to events". · Apr 4 2019, 11:11 PM