When this is done, the wmf.webrequest Hive table will have the following new fields:
Obtained using ClientIPUDF or IpUtil methods:
Obtained by geocoding the client_ip with GeocodedDataUDF or Geocode methods:
It may make sense to store the geocoded data in a single map field rather than as top-level fields; I am not sure which is better.
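To make the trade-off concrete, here is a hedged HiveQL sketch of the two layouts. Apart from client_ip, the column names and the map keys ('country_code', 'city') are hypothetical placeholders, not a settled schema, and the partition predicate assumes the table is partitioned by year/month/day/hour.

```sql
-- Option A (hypothetical): one top-level column per geocoded attribute.
--   country_code string,
--   city         string,
--   ...
-- Option B (hypothetical): all geocoded attributes in a single map column.
--   geocoded_data map<string, string>

-- With option B, individual values are read by key; looking up a key
-- that is absent from the map returns NULL rather than failing.
SELECT
  client_ip,
  geocoded_data['country_code'] AS country_code,
  geocoded_data['city']         AS city
FROM wmf.webrequest
WHERE year = 2015 AND month = 1 AND day = 1 AND hour = 0
LIMIT 10;
```

A map keeps the table schema stable if geocoding later returns more or different attributes, at the cost of weaker typing (every value is a string) and likely less effective Parquet column pruning, since reading one key still means reading the whole map column.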
You will need to:
- Alter the Parquet-formatted wmf.webrequest table so that SELECT statements still work on existing data that lacks these fields (should old rows get default values? is this even possible?).
- In the refinery repository, modify the create_webrequest_table.hql file to reflect the schema changes.
- In the refinery repository, modify oozie/webrequest/refine/refine_webrequest.hql to use the UDFs to populate the new fields.
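A rough HiveQL sketch of the first and third steps follows. It assumes the UDF classes live in the refinery-hive jar under org.wikimedia.analytics.refinery.hive, that the client IP UDF takes the raw ip and x_forwarded_for fields, that the geocoding UDF returns a map<string, string>, and that the map layout is chosen. The class names, argument lists, column comments and the ${source_table} parameter are assumptions to check against the refinery source. Whether older partitions then read the new columns as NULL should be verified on a test copy of the table before touching wmf.webrequest.

```sql
-- Step 1 (hypothetical): add the new columns to the table metadata.
-- CASCADE (Hive 1.1.0+) also updates the metadata of existing
-- partitions; without it, existing partitions keep their old column list.
ALTER TABLE wmf.webrequest ADD COLUMNS (
  client_ip     string              COMMENT 'Client IP derived from ip and x_forwarded_for',
  geocoded_data map<string, string> COMMENT 'Geocoded attributes of client_ip'
) CASCADE;

-- Step 3 (hypothetical): in refine_webrequest.hql, register the UDFs
-- (assumes the refinery-hive jar has already been added by the script)...
CREATE TEMPORARY FUNCTION client_ip     AS 'org.wikimedia.analytics.refinery.hive.ClientIPUDF';
CREATE TEMPORARY FUNCTION geocoded_data AS 'org.wikimedia.analytics.refinery.hive.GeocodedDataUDF';

-- ...and compute the new fields in the SELECT that fills the refined
-- partition, alongside the existing columns.
SELECT
  -- existing columns go here
  client_ip(ip, x_forwarded_for)                AS client_ip,
  geocoded_data(client_ip(ip, x_forwarded_for)) AS geocoded_data
FROM ${source_table};  -- hypothetical parameter name
```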
Once the changes have been reviewed and merged, we will re-submit the Oozie job so that it populates the new fields on subsequent runs.