CentralNoticeImpression refined impressionEventSampleRate is int instead of double
Closed, Resolved (Public)

Description

When we refine EventLogging schemas for insertion into Hive tables, we infer the type of each field. In the case of impressionEventSampleRate we inferred integer, but in the schema it's set to "number". In the future we will use the schema directly, but for now we're just monitoring where inference goes wrong. All rows have a value of 0 for this property; in the code it looks like it was set to 0.01. We could alter the table and correct the data by always setting it to 0.01. Let us know if that's the right thing to do, or if there's any other nuance we're missing.
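
For context, the effect is equivalent to casting the value to the inferred type. A minimal illustration, runnable in Hive or Spark SQL:

-- Casting the client's 0.01 to the inferred bigint truncates it to 0,
-- which is presumably why every row shows 0 for this property.
SELECT CAST(0.01 AS BIGINT);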

Event Timeline

Milimetric triaged this task as Medium priority. Feb 28 2019, 5:43 PM
Milimetric moved this task from Incoming to Data Quality on the Analytics board.
Milimetric removed a project: Analytics-Kanban.

@DStrine can you let us know what you'd like to do here? It's not technically complicated, but it's a little time-sensitive in case you want to look at the raw EventLogging data (which gets dropped after 90 days without a whitelist policy).

Hi! Thanks so much!!!

Here's what Hive said about the event field in the event/centralnoticeimpression table:

event   struct<anonymous:boolean,banner:string,bannerCategory:string,bucket:bigint,campaign:string,
campaignCategory:string,campaignCategoryUsesLegacy:boolean,country:string,db:string,debug:boolean,
device:string,impressionEventSampleRate:bigint,project:string,randombanner:double,randomcampaign:double,
recordImpressionSampleRate:double,result:string,status:string,statusCode:string,uselang:string,
reason:string,bannerCanceledReason:string,bannersNotGuaranteedToDisplay:boolean,debugInfo:string,
errorMsg:string,alterFunctionMissing:boolean,region:string>

I don't see anything else problematic, other than that impressionEventSampleRate should be double. bucket could be tinyint if you wish.
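
For reference, here is a sketch of an in-place type fix, assuming the table is event.centralnoticeimpression and that Hive accepts widening the struct field (CHANGE COLUMN only rewrites table metadata, so the existing data files are untouched):

-- Redeclare the event struct with impressionEventSampleRate as double.
ALTER TABLE event.centralnoticeimpression CHANGE COLUMN `event` `event`
struct<anonymous:boolean,banner:string,bannerCategory:string,bucket:bigint,campaign:string,
campaignCategory:string,campaignCategoryUsesLegacy:boolean,country:string,db:string,debug:boolean,
device:string,impressionEventSampleRate:double,project:string,randombanner:double,randomcampaign:double,
recordImpressionSampleRate:double,result:string,status:string,statusCode:string,uselang:string,
reason:string,bannerCanceledReason:string,bannersNotGuaranteedToDisplay:boolean,debugInfo:string,
errorMsg:string,alterFunctionMissing:boolean,region:string>;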

The data is currently not in use, and there are no concerns for now about the data being sunsetted. If it's very little work to go back and change the 0s to 0.01, that might be useful, so we can compare the data from the old pipeline (which this will replace) to this new data when we get ready to switch. However, it's also fine to just have that field set correctly going forward.
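
If rewriting the old rows turns out to be more trouble than it's worth, a read-time workaround is also possible: patch the known-bad zeroes at query time, since 0.01 is the only rate the client code ever used. A sketch, with hypothetical partition columns:

-- Substitute the configured 0.01 rate wherever the mistyped 0 was stored.
SELECT event.banner,
       IF(event.impressionEventSampleRate = 0, 0.01,
          event.impressionEventSampleRate) AS impression_event_sample_rate
FROM event.centralnoticeimpression
WHERE year = 2019 AND month = 2;  -- hypothetical partition filter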

Thanks again!!!!

Let's see: this data comes from EventLogging. For it to be useful, we need to make sure FR-tech has switched to EventLogging as the main way impressions are computed. Has that happened?

No, that hasn't happened yet. The events have been left on at a 0.01% sample rate (hope that's OK) but the data is not being used yet. Work to finish the new pipeline should continue soon; then we'll compare the data from both sources and switch in the new pipeline once it's confirmed to be all good.

> The events have been left on at a 0.01% sample rate (hope that's OK)

Yes, of course. Once you are ready to switch pipelines, let us know.

The easiest thing to do is to delete the old data and change the schema going forward. Let me know if this is ok to do, @AndyRussG. If not, I can do a more painful copy/rename/rename thing to keep the old data.
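
For the record, a sketch of both options with hypothetical object names, assuming the refine job recreates the table with the corrected schema on its next run:

-- Option 1, the easy path: drop the mistyped table and its data, and let
-- the refine job recreate the table going forward.
DROP TABLE event.centralnoticeimpression;

-- Option 2, keeping the old rows: move the table aside first (one reading
-- of the copy/rename/rename approach), then let refine recreate the main table.
ALTER TABLE event.centralnoticeimpression
  RENAME TO event.centralnoticeimpression_old;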

ping @AndyRussG, can you confirm that it's ok to delete the old data?

Sorry for the delay...!! I think so... @Seddon, are you also OK with deleting the existing data in Hive obtained from the new pipeline?

Seems like the right thing so that we can move forward!

@Addshore & @Verena: Be aware there may be some disruption to the CN Hive data.

Ok, done, and now I'm seeing "impressionEventSampleRate":0.01 in the data, so all is good going forward. Thanks for getting back to us and helping move this forward.
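
A quick way to confirm the fix, sketched with hypothetical partition columns:

-- After the redo, every row should report the configured 0.01 rate.
SELECT event.impressionEventSampleRate, COUNT(*) AS row_count
FROM event.centralnoticeimpression
WHERE year = 2019 AND month = 4
GROUP BY event.impressionEventSampleRate;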

Milimetric moved this task from In Progress to Done on the Analytics-Kanban board.

For the record, we made a teeny mistake the first time we did this and the useragent field had a bad schema. So we redid it today, and both the data and the schema look fine now.