Remove postal code and longitude / latitude from geocoded data object on webrequest data
Closed, Resolved (Public)

Description

Remove postal code and longitude / latitude from the geocoded data object on webrequest data. These fields take up space and, being derived from the IP address, are not very precise.

see: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/GetGeoDataUDF.java
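A minimal sketch of the kind of change being proposed, not the actual patch: it drops the three fields from a geocoded map before the map is returned. The class name `GeoDataTrim`, the method `withoutImpreciseFields`, and the key names `postal_code`, `latitude`, and `longitude` are assumptions for illustration; the real keys and structure in refinery-source may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GeoDataTrim {
    // Hypothetical key names; the actual map keys used by the UDF may differ.
    static final String[] DROPPED_KEYS = {"postal_code", "latitude", "longitude"};

    // Returns a copy of the geocoded map without the imprecise fields.
    // The input map is left unmodified.
    static Map<String, String> withoutImpreciseFields(Map<String, String> geo) {
        Map<String, String> trimmed = new LinkedHashMap<>(geo);
        for (String key : DROPPED_KEYS) {
            trimmed.remove(key);
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        Map<String, String> geo = new LinkedHashMap<>();
        geo.put("country_code", "US");
        geo.put("city", "San Francisco");
        geo.put("postal_code", "94105");
        geo.put("latitude", "37.78");
        geo.put("longitude", "-122.39");
        System.out.println(withoutImpreciseFields(geo));
    }
}
```

Copying the map rather than mutating it keeps the sketch safe to call on shared data; in the UDF itself it would likely be simpler to never populate the fields in the first place.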

Event Timeline

Ottomata moved this task from Incoming to Data Quality on the Analytics board.

The longitude / latitude in the geocoded_data object in webrequest is quite imprecise (as it comes from the IP), so at this time the field is just eating space and is not very useful. CC-ing Product-Analytics to get feedback, but we are thinking of removing it entirely.

Nuria renamed this task from Remove postal code and longitude / latitude from geocoded data object to Remove postal code and longitude / latitude from geocoded data object on webrequest data. (Nov 27 2019, 8:43 PM)
Nuria updated the task description.
Nuria added a subscriber: lexnasser.
Nuria raised the priority of this task from Low to Needs Triage. (Nov 27 2019, 8:55 PM)
Nuria moved this task from Data Quality to Mentoring on the Analytics board.
Nuria added subscribers: razzi, JAllemandou.

cc @razzi

Pinging @JAllemandou in case he can think of any reason why we should keep these fields, given their limited precision.

I quickly reviewed the refinery and refine-to-druid jobs and found that none of them uses either postal code or lat/long. I think we're safe to remove them :)

Moving to kanban for @razzi to work on this.

Change 635085 had a related patch set uploaded (by Razzi; owner: Razzi):
[analytics/refinery/source@master] Remove postal code, latitude, and longitude from geodata

https://gerrit.wikimedia.org/r/635085

@Nuria Product Analytics hasn't used this data (except for once, maybe, a few years ago), and we think it's reasonable to remove it.

Change 635085 merged by Razzi:
[analytics/refinery/source@master] Remove postal code, latitude, and longitude from geodata

https://gerrit.wikimedia.org/r/635085

Change 635352 had a related patch set uploaded (by Razzi; owner: Razzi):
[analytics/refinery@master] oozie: update webrequest/load hive jar version

https://gerrit.wikimedia.org/r/635352

Change 635352 merged by Mforns:
[analytics/refinery@master] oozie: update webrequest/load hive jar version

https://gerrit.wikimedia.org/r/635352

Moving this to done since everything seems already deployed.