This is a request from the Reading team.
We could pre-parse the UA data using the ua-parser Python library.
The original request was to have an is_spider column similar to the one in our Hive parsing.
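For illustration only, here is a minimal sketch of what an is_spider-style flag on an event could look like. The regex, the `is_spider` and `refine_event` helpers, and the `userAgent` field name are all assumptions made for this example. This is not the refinery's actual bot-detection logic, and a real version would more likely build on ua-parser output than on a hand-written pattern.

```python
import re

# Hypothetical pattern: a rough heuristic for illustration,
# not the bot regex used by the refinery code.
SPIDER_PATTERN = re.compile(r"bot|crawler|spider|https?://", re.IGNORECASE)


def is_spider(user_agent):
    """Return True when the user-agent string looks like an automated client."""
    if not user_agent:
        # Missing or empty user agents are not flagged in this sketch.
        return False
    return SPIDER_PATTERN.search(user_agent) is not None


def refine_event(event):
    """Return a copy of an event dict with an added is_spider field.

    Assumes the raw user agent lives under the "userAgent" key.
    """
    refined = dict(event)  # shallow copy so the input event is not mutated
    refined["is_spider"] = is_spider(event.get("userAgent"))
    return refined
```

Calling `refine_event({"userAgent": "Googlebot/2.1 (+http://www.google.com/bot.html)"})` would return the same event with `is_spider` set to `True`.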
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Add singleton capactiy to UAParser | analytics/refinery/source | master | +19 -4 |
The easiest way to add user-agent refinement to EventLogging would be to run the refinery code, through Hive or Spark, on the EventLogging data stored in Hadoop.
Adding this column to the capsule requires work on the EL MySQL database side, since a new column would have to be added to every single table. That database is having a lot of issues right now, so this is not likely to get done in the near term.
Just a thought: it would be much better to reuse the Hadoop logic for this, as we already have the code to parse bots ready to go.
Change 311127 had a related patch set uploaded (by Joal):
Add singleton capactiy to UAParser
Update: we will be replacing the data held in the user agent column with the parsed version. Resolving this ticket as a duplicate.