Create a new user defined function (UDF) for use in Hive queries that can generate a network classification ("internal", "external", "labs") for a given IPv4 or IPv6 address. This UDF will be used in ad-hoc queries and Hive scripts to broadly categorize the origin of Action API requests. It may also be found to be useful for other reporting needs.
Description
Description
Details
Details
Event Timeline
Comment Actions
Change 253045 had a related patch set uploaded (by BryanDavis):
Rename ipAddressMatcherCache -> trustedProxiesCache
Comment Actions
Change 253046 had a related patch set uploaded (by BryanDavis):
Add UDF for network origin
Comment Actions
Change 254170 had a related patch set uploaded (by BryanDavis):
Rename network_origin UDF partitions
Comment Actions
The new UDF can be used from my homedir on stat1002 until such time as the shared refinery jars are updated:
ADD JAR /home/bd808/projects/analytics-refinery-source/refinery-core/target/refinery-core-0.0.23-SNAPSHOT.jar; ADD JAR /home/bd808/projects/analytics-refinery-source/refinery-hive/target/refinery-hive-0.0.23-SNAPSHOT.jar; CREATE TEMPORARY FUNCTION network_origin as 'org.wikimedia.analytics.refinery.hive.NetworkOriginUDF'; USE wmf; SELECT month, day, network_origin(client_ip) as network, COUNT(*) as hits FROM webrequest WHERE year = 2015 AND month = 11 AND day = 2 GROUP BY month, day, network_origin(client_ip) LIMIT 1000;
Comment Actions
Function is in /srv/deployment/analytics/refinery/artifacts/refinery-hive.jar on stat1002
ADD JAR /srv/deployment/analytics/refinery/artifacts/refinery-hive.jar; CREATE TEMPORARY FUNCTION network_origin as 'org.wikimedia.analytics.refinery.hive.NetworkOriginUDF' SELECT month, day, network_origin(client_ip) as network, COUNT(*) as hits FROM webrequest WHERE year = 2015 AND month = 11 AND day = 2 GROUP BY month, day, network_origin(client_ip) LIMIT 1000;