Parse user agents in navtiming instead of relying on eventlogging to do it
Closed, Resolved, Public

Description

We're slowly migrating legacy EventLogging streams to Event Platform. EventGate does not parse user agents, so the stream data will no longer have a parsed userAgent field. The Hive tables have this field populated by Refine, but the streaming data in Kafka will not have it.

navtiming uses the parsed userAgent. navtiming should copy in the parse_ua function from eventlogging and, if userAgent is not set, parse the user agent itself from http.request_headers['user-agent']. Once it does that, we can migrate NavigationTiming to Event Platform.

Event Timeline

Ah, I see that it does use that library 🙂

@Ottomata were you previously taking care of updates for python-ua-parser's Wikimedia deb package? If so, will you keep doing it after legacy EventLogging streams are phased out?

Also, I'm not sure where http.request_headers is supposed to come from in the context of navtiming. We pull events from Kafka: https://github.com/wikimedia/performance-navtiming/blob/master/navtiming/__init__.py#L752-L768. Where is the http object coming from?

Gilles triaged this task as Medium priority. Aug 25 2020, 3:55 PM

@Nuria I just want to know whether you'll keep taking care of those updates for the Python library, or whether that's something we'll have to do if, as a result of your migration, navtiming becomes its only user.

were you previously taking care of updates for python-ua-parser's Wikimedia deb package? If so, will you keep doing it after legacy EventLogging streams are phased out?

Hm, I don't think we will. I think we're moving away from using .deb packages to deploy Python dependencies. You'll probably have to either update the .deb yourself if you need it, or find a way to deploy navtiming with pip dependencies. We do this now with wheels and scap, but a better approach would probably be to deploy an image in k8s.

Also I'm not sure where http.request_headers is supposed to come from in the context of navtiming.

It will be part of the event schema once it is migrated to Event Platform, and eventgate-wikimedia will automatically set that field if it isn't set and it is present in the event's schema.

See e.g. https://schema.wikimedia.org/repositories//secondary/jsonschema/analytics/legacy/searchsatisfaction/1.1.0.yaml and https://gerrit.wikimedia.org/r/plugins/gitiles/eventgate-wikimedia/+/refs/heads/master/eventgate-wikimedia.js#361

@Ottomata So I should be looking for it in meta['event']['http'] where meta is the json object we currently pull from kafka?

Hm, no. If meta is the variable that contains the full event object (including the 'capsule', which we've gotten rid of), it will just be at meta['http']. Check out the JSONSchema examples at the bottom of https://schema.wikimedia.org/repositories//secondary/jsonschema/analytics/legacy/searchsatisfaction/latest
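
For illustration, the lookup discussed above could be sketched roughly as follows. This is not the code from the merged patch. The helper name get_parsed_user_agent, the stub parser, and the sample event shapes are assumptions made for the example. In navtiming the parse_ua argument would be the function copied from eventlogging.

```python
def get_parsed_user_agent(meta, parse_ua):
    """Return a parsed user agent for one event pulled from Kafka.

    meta is assumed to be the full event object as a dict, and parse_ua
    is any callable that turns a raw UA string into parsed data.
    """
    # Legacy EventLogging events already carry a parsed userAgent field.
    parsed = meta.get('userAgent')
    if parsed:
        return parsed

    # Event Platform events only carry the raw header, at the top level
    # of the event (meta['http'], not meta['event']['http']).
    headers = (meta.get('http') or {}).get('request_headers') or {}
    raw = headers.get('user-agent')
    if raw is None:
        return None
    return parse_ua(raw)


def stub_parse_ua(raw):
    # Hypothetical stand-in for the real parser, used only in this example.
    return {'raw': raw}


# Illustrative events; the field values are made up.
legacy_event = {'userAgent': {'browser_family': 'Firefox'}}
migrated_event = {'http': {'request_headers': {'user-agent': 'Mozilla/5.0'}}}

print(get_parsed_user_agent(legacy_event, stub_parse_ua))
print(get_parsed_user_agent(migrated_event, stub_parse_ua))
```

Passing the parser in as an argument keeps the sketch free of external dependencies. It also means the same lookup works for both legacy and migrated events during the transition.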

Change 622531 had a related patch set uploaded (by Gilles; owner: Gilles):
[performance/navtiming@master] Handle UA parsing for EventGate compatibility

https://gerrit.wikimedia.org/r/622531

Change 629436 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[operations/puppet@production] webperf: new python-ua-parser navtiming dependency

https://gerrit.wikimedia.org/r/629436

Change 629436 abandoned by Dave Pifke:
[operations/puppet@production] webperf: new python-ua-parser navtiming dependency

Reason:
Abandoning in favor of I49d63a919e6dca709f3afd9078e11e0c26d92c8b instead.

https://gerrit.wikimedia.org/r/629436

Change 622531 merged by jenkins-bot:
[performance/navtiming@master] Handle UA parsing for EventGate compatibility

https://gerrit.wikimedia.org/r/622531

Mentioned in SAL (#wikimedia-operations) [2020-11-24T09:03:42Z] <gilles@deploy1001> Started deploy [performance/navtiming@ba6cd0d]: T260580 Parse user agents in navtiming instead of relying on eventlogging to do it

Mentioned in SAL (#wikimedia-operations) [2020-11-24T09:03:52Z] <gilles@deploy1001> Finished deploy [performance/navtiming@ba6cd0d]: T260580 Parse user agents in navtiming instead of relying on eventlogging to do it (duration: 00m 05s)