Augment Hive event data with normalized host info from meta.domain
Closed, Resolved (Public)

Description

We should add a Refine transform step to add normalized host info if meta.domain (and/or EventCapsule's webHost?) is set.
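As a rough illustration of the idea, here is a minimal Python sketch of such a transform operating on a single event represented as a dict. It is not the refinery implementation (that lives in analytics/refinery/source); the function names `normalize_host` and `add_normalized_host`, the output keys (`project_class`, `project`, `qualifiers`, `tld`), and the splitting rules are all assumptions made for the example. Only `meta.domain`, `webhost`, and the `normalized_host` field name come from this task.

```python
from typing import Any, Dict, List, Optional


def normalize_host(host: str) -> Optional[Dict[str, Any]]:
    """Split a hostname into parts (hypothetical layout, for illustration).

    Returns None when the host has fewer than two dot-separated labels.
    """
    # Drop a port suffix such as ":443", then tidy case and a trailing dot.
    cleaned = host.strip().lower().split(":", 1)[0].rstrip(".")
    parts = [p for p in cleaned.split(".") if p]
    if len(parts) < 2:
        return None

    tld = parts[-1]
    project_class = parts[-2]
    rest = parts[:-2]

    # Assumption: a leading "www" carries no project information.
    if rest and rest[0] == "www":
        rest = rest[1:]

    project = rest[0] if rest else None
    qualifiers: List[str] = rest[1:]
    return {
        "project_class": project_class,
        "project": project,
        "qualifiers": qualifiers,
        "tld": tld,
    }


def add_normalized_host(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with normalized_host added when possible.

    Prefers meta.domain and falls back to a top-level webhost field.
    Events without a usable host are returned unchanged.
    """
    meta = event.get("meta") or {}
    host = meta.get("domain") or event.get("webhost")
    if not host:
        return event

    normalized = normalize_host(host)
    if normalized is None:
        return event

    enriched = dict(event)  # shallow copy so the input is not mutated
    enriched["normalized_host"] = normalized
    return enriched
```

Under these assumptions, an event with `meta.domain` set to `en.m.wikipedia.org` would gain a `normalized_host` with `project_class` `wikipedia`, `project` `en`, `qualifiers` `["m"]`, and `tld` `org`, while an event with neither field set passes through untouched.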

Event Timeline

Ottomata renamed this task from Augment event data with normalized host info from meta.domain to Augment Hive event data with normalized host info from meta.domain. Apr 28 2020, 6:50 PM
Milimetric moved this task from Incoming to Event Platform on the Analytics board.

There's already code for this in pageviews (the UDF that determines the project given a webhost).

Ya we just need to add a Refine transform function for this.

Once this is done, should we remove the code that does this at query time?
For example, the session length intermediate table query that uses a UDF to normalize meta.domain?

Change 705021 had a related patch set uploaded (by Mholloway; author: Michael Holloway):

[analytics/refinery/source@master] Add Refine transform function to add normalized host

https://gerrit.wikimedia.org/r/705021

I gave this a try for 10% time today.

Change 705021 merged by Ottomata:

[analytics/refinery/source@master] Add Refine transform function to add normalized host

https://gerrit.wikimedia.org/r/705021

Change 709810 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] refine - bump refinery version to pick up normalized_host transform

https://gerrit.wikimedia.org/r/709810

Change 709810 merged by Ottomata:

[operations/puppet@production] refine - bump refinery version to pick up normalized_host transform

https://gerrit.wikimedia.org/r/709810

Mentioned in SAL (#wikimedia-analytics) [2021-08-03T19:23:18Z] <ottomata> bump Refine to refinery version 0.1.16 to pick up normalized_host transform - now all event tables will have a new normalized_host field - T251320