When this is done, the wmf.webrequest Hive table will have the following new fields:
Obtained using ClientIPUDF or IpUtil methods:
- client_ip
Obtained by geocoding client_ip with GeocodedDataUDF or Geocode methods:
- continent
- country_code
- country
- subdivision
- city
- postal_code
- latitude
- longitude
- timezone
It may make sense to store the geocoded data in a single map field rather than as top-level fields; I am not sure.
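To make the trade-off concrete, here is a minimal Python sketch of the two candidate row shapes, built from one geocoder result. The field names come from the list above; the function names, the `geocoded_data` column name, and the sample lookup values are hypothetical and only illustrate the idea. It assumes the map option would be a Hive `map<string,string>`, since a Hive map has a single value type.

```python
from typing import Dict, Union

# Geocoded field names, as listed in the task description.
GEO_FIELDS = ("continent", "country_code", "country", "subdivision",
              "city", "postal_code", "latitude", "longitude", "timezone")

Value = Union[str, float, None]


def as_top_level(client_ip: str, geo: Dict[str, Value]) -> Dict[str, Value]:
    """Option 1: one column per geocoded field, each keeping its own type."""
    row: Dict[str, Value] = {"client_ip": client_ip}
    for name in GEO_FIELDS:
        # A field the geocoder did not return becomes None (NULL in Hive).
        row[name] = geo.get(name)
    return row


def as_map_column(client_ip: str, geo: Dict[str, Value]) -> Dict[str, object]:
    """Option 2 (hypothetical column name): a single map<string,string> column.

    A Hive map has one value type, so numeric values such as latitude are
    stored as strings and missing fields are simply absent from the map.
    """
    geocoded = {name: str(geo[name]) for name in GEO_FIELDS
                if geo.get(name) is not None}
    return {"client_ip": client_ip, "geocoded_data": geocoded}


# Made-up lookup result for a documentation-range address.
sample = {"country_code": "DE", "latitude": 52.52}
print(as_top_level("192.0.2.1", sample))
print(as_map_column("192.0.2.1", sample))
```

The map keeps the table schema stable if geocoding fields are added later, but it loses per-field types (latitude and longitude become strings) and makes filters read like `geocoded_data['country_code']`. Top-level columns keep numeric types and are plain to query. A struct column would be a middle ground, though it still needs a schema change to add fields.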
You will need to:
- Alter the Parquet-formatted wmf.webrequest table so that existing data, which lacks these fields, still works in SELECT statements (with default values? Is this even possible?).
- In the refinery repository, modify the create_webrequest_table.hql file to reflect the schema changes.
- In the refinery repository, modify oozie/webrequest/refine/refine_webrequest.hql to use the UDFs to populate the new fields.
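The steps above hinge on how old partitions behave once the columns exist. The Python sketch below models that, assuming (to be verified against the deployed Hive version) that after an `ALTER TABLE wmf.webrequest ADD COLUMNS (...)` Hive resolves Parquet columns by name and returns NULL for any column an older file does not contain. The column lists are a trimmed, illustrative subset, and `refine` is a hypothetical stand-in for refine_webrequest.hql, with the UDF calls passed in as plain functions.

```python
from typing import Callable, Dict, List, Optional

# Illustrative subset of columns; the real table has many more.
OLD_COLUMNS: List[str] = ["ip", "uri_host"]
NEW_COLUMNS: List[str] = OLD_COLUMNS + ["client_ip", "country_code"]


def read_with_schema(stored_row: Dict[str, object],
                     columns: List[str]) -> Dict[str, Optional[object]]:
    """Project a stored row onto the table schema by column name.

    Assumed behaviour: a column missing from the stored data reads as NULL
    (None here) instead of failing the SELECT.
    """
    return {col: stored_row.get(col) for col in columns}


def refine(raw: Dict[str, object],
           client_ip_fn: Callable[[Dict[str, object]], str],
           geocode_fn: Callable[[str], Dict[str, str]]) -> Dict[str, object]:
    """Hypothetical stand-in for the refine step populating the new fields.

    client_ip_fn plays the role of ClientIPUDF and geocode_fn the role of
    GeocodedDataUDF; both are placeholders, not the refinery code.
    """
    row = dict(raw)
    row["client_ip"] = client_ip_fn(raw)
    row["country_code"] = geocode_fn(row["client_ip"]).get("country_code")
    return row


# A row written before the change has no client_ip or country_code.
old_row = {"ip": "192.0.2.1", "uri_host": "en.wikipedia.org"}
print(read_with_schema(old_row, NEW_COLUMNS))

# A row refined after the change carries the new fields.
new_row = refine(old_row, lambda raw: str(raw["ip"]),
                 lambda ip: {"country_code": "DE"})
print(read_with_schema(new_row, NEW_COLUMNS))
```

Under this assumption no default values are needed: older partitions keep returning NULL for the new fields, and only partitions refined after the job is re-submitted (or re-run) contain values.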
Once the changes have been reviewed and merged, we will re-submit the oozie job to populate the new data when it runs.
Not sure, but this may be helpful once we upgrade (hopefully today):
https://issues.apache.org/jira/browse/HIVE-6456