When we refine EventLogging schemas for insertion into Hive tables, we infer the type of each field. In the case of impressionEventSampleRate, we inferred integer, but in the schema it's set to "number". In the future we will use the schema directly, but for now we're just monitoring where inference goes wrong. All rows have a value of 0 for this property, while in the code it looks like it was set to 0.01. We could alter the table and correct the data by always setting it to 0.01. Let us know if that's the right thing to do or if there's any other nuance we're missing.
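To illustrate how this kind of mismatch can arise, here is a minimal Python sketch. It is not the actual refinement code: the helper names, the JSON-Schema-to-Hive mapping, and the assumption that the type is guessed from a single integer-looking sample value are all hypothetical. It only shows that a field inferred as bigint would store 0.01 as 0, while reading the "number" type from the schema would give double.

```python
import json

# Hypothetical mapping from JSON Schema types to Hive types (illustrative only).
JSONSCHEMA_TO_HIVE = {
    "integer": "bigint",
    "number": "double",
    "string": "string",
    "boolean": "boolean",
}


def infer_hive_type(value):
    """Guess a Hive type from one sample value (assumed inference strategy)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def hive_type_from_schema(schema, field):
    """Look up the Hive type using the declared schema type instead of the data."""
    return JSONSCHEMA_TO_HIVE[schema["properties"][field]["type"]]


def coerce(value, hive_type):
    """Convert a value to the Python equivalent of the given Hive type."""
    if hive_type == "bigint":
        return int(value)  # truncates toward zero, so 0.01 becomes 0
    if hive_type == "double":
        return float(value)
    return value


# A schema fragment that declares the field as "number", as described above.
schema = {"properties": {"impressionEventSampleRate": {"type": "number"}}}

# Hypothetical first event whose value happens to look like an integer.
sample = json.loads('{"impressionEventSampleRate": 1}')

inferred = infer_hive_type(sample["impressionEventSampleRate"])
declared = hive_type_from_schema(schema, "impressionEventSampleRate")

stored_with_inferred = coerce(0.01, inferred)
stored_with_declared = coerce(0.01, declared)
```

Under these assumptions, `inferred` is bigint and `declared` is double, which is why using the schema directly avoids the problem.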
|Resolved||mforns||T214384 [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema|
|Resolved||Milimetric||T216771 [Bug] Type mismatch for a few other schemas|
|Resolved||Milimetric||T217109 CentralNoticeImpression refined impressionEventSampleRate is int instead of double|
Hi! Thanks so much!!!
Here's what Hive said about the event field in the event/centralnoticeimpression table:
event struct<anonymous:boolean,banner:string,bannerCategory:string,bucket:bigint,campaign:string, campaignCategory:string,campaignCategoryUsesLegacy:boolean,country:string,db:string,debug:boolean, device:string,impressionEventSampleRate:bigint,project:string,randombanner:double,randomcampaign:double, recordImpressionSampleRate:double,result:string,status:string,statusCode:string,uselang:string, reason:string,bannerCanceledReason:string,bannersNotGuaranteedToDisplay:boolean,debugInfo:string, errorMsg:string,alterFunctionMissing:boolean,region:string>
Other than impressionEventSampleRate, which should be double, I don't see anything problematic. bucket could be tinyint if you wish.
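A check like the one above can be sketched in Python. This is only an illustration: `parse_flat_struct` and `find_mismatches` are hypothetical helpers, the parser handles flat structs only (no nested struct, map, or array types), and the type string used here is an abbreviated subset of the fields listed above.

```python
def parse_flat_struct(type_string):
    """Parse a flat Hive `struct<name:type,...>` string into a dict.

    Nested structs, maps, and arrays are not handled.
    """
    text = type_string.strip()
    prefix = "struct<"
    if not (text.startswith(prefix) and text.endswith(">")):
        raise ValueError("expected a struct<...> type string")
    body = text[len(prefix):-1]
    fields = {}
    for part in body.split(","):
        # Tolerate spaces after commas, as in the pasted output above.
        name, _, hive_type = part.strip().partition(":")
        fields[name] = hive_type
    return fields


def find_mismatches(hive_fields, expected):
    """Return {field: (actual, expected)} for fields whose types differ."""
    return {
        name: (hive_fields.get(name), want)
        for name, want in expected.items()
        if hive_fields.get(name) != want
    }


# Abbreviated subset of the struct reported by Hive.
hive_event_type = (
    "struct<bucket:bigint,campaign:string, "
    "impressionEventSampleRate:bigint,randombanner:double>"
)

# Expected type for the one field the schema is known to declare as "number".
expected = {"impressionEventSampleRate": "double"}

mismatches = find_mismatches(parse_flat_struct(hive_event_type), expected)
```

Here `mismatches` contains only impressionEventSampleRate, with the actual type bigint and the expected type double.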
The data is not currently in use, so there are no issues at the moment with the sunsetting data. If it's very little work to go back and change the 0s to 0.01, that might be useful, so that we can compare the data from the old pipeline (which this will replace) with this new data when we get ready to switch. However, it's also fine to just have that field set correctly going forward.
Let's see: this data comes from EventLogging. For it to be useful, we need to make sure FR-tech has switched to EventLogging as the main way by which impressions are computed. Has that happened?
No, that hasn't happened yet. The events have been left on at 0.01% sample rate (hope that's OK) but the data is not being used yet. Work to finish the new pipeline should continue soon; then we'll compare the data from both sources and switch in the new pipeline once it's confirmed to be all good.