|eventlogging||master||+123 -13||Prefer event capsule defined fields over those found in raw log line|
|Resolved||Ottomata||T159170 Sunset MySQL data store for eventlogging|
|Resolved||Ottomata||T162610 Implement EventLogging Hive refinement|
|Resolved||Ottomata||T178440 Refine should parse user agent field as it is done on refinery pipeline|
|Resolved||Ottomata||T179625 Resolve EventCapsule / MySQL / Hive schema discrepancies|
Since the current code knows nothing about schemas ahead of time, this might be a little difficult.
We could add a pluggable transform step (function) that would map the JSON DataFrame before it is converted to a hive schema and 'refined'.
Perhaps! We could detect the existence of a JSON string in a string field, and auto-parse it as a JSON sub object and use a struct. Since EventLogging Analytics auto converts userAgent into a JSON string, this would cause the userAgent field in Hive to be transformed into a struct.
Or, we could just leave this as is. The data in userAgent is sanitized and is queryable via Hive JSON function.
hive (otto_json_refine_test)> select json_tuple(userAgent, 'browser_family', 'browser_major') from centralnoticebannerhistory where year='2017' and month='8' and day='11' limit 4; OK c0 c1 Chrome 60 Chrome 59 Chrome 60 Safari 10
Doesn't seem that json-tuple is that friendly on a where clause right? I just cannot get it to work in something like:
select json_tuple(userAgent, 'browser_family', 'browser_major') from centralnoticebannerhistory h, lateral view json_tuple(userAgent, 'browser_family', 'browser_major) as ua bf, bm where year='2017' and month='8' and day='11' and bm like '55' limit 4
Yeah a little nasty.
It'd be better to do this in EventLogging processor or something though. We're already parsing the user agent into a JSON string there. The only reason we didn't turn it into an object was for compatibility with existing MySQL table varchar userAgent fields.
I'm already going to have to do some custom tranformation to solve T179540, perhaps we can just take the userAgent dict and serialize that to a JSON string before inserting into MySQL too.