
Identify possible user identity reconstruction using location and user_agent_map pageview aggregated fields to try to link to IPs in webrequest {slug}
Closed, ResolvedPublic

Description

If only one IP maps to a given country/region/city data point (or user_agent_map data point), someone with access to the webrequest logs could re-link some private data to a full reading history (as long as we don't delete aggregated pageviews). We want to know how often this risk arises, so that we can possibly mitigate it.

Event Timeline

JAllemandou raised the priority of this task from to Needs Triage.
JAllemandou updated the task description.
JAllemandou added a project: Analytics-Backlog.
Restricted Application added a subscriber: Aklapper.Aug 12 2015, 4:51 PM
kevinator triaged this task as Normal priority.Aug 20 2015, 5:28 PM
kevinator moved this task from Incoming to Prioritized on the Analytics-Backlog board.
kevinator set Security to None.
kevinator raised the priority of this task from Normal to High.Sep 16 2015, 4:37 PM
Nuria added a subscriber: Nuria.Sep 16 2015, 4:37 PM
Nuria claimed this task.Sep 16 2015, 5:29 PM
Nuria added a project: Analytics-Backlog.
Nuria added a comment.Sep 17 2015, 7:12 PM

The goal of this ticket is to establish how easy it is to link records in pageview_hourly to a particular identity (that is, an IP) IF you were to gain access to the cluster and had both the pageview_hourly and webrequest tables at your disposal.

The webrequest table stores IPs, if only for a short time (2 months). Pageview_hourly has geographical information associated with pageviews but no IPs. Pageview_hourly data is stored forever.

Can we link records in pageview_hourly (and thus a viewing session) to records in webrequest by using the geographical information to join those two tables?
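
To make the question concrete, here is a minimal sketch of the linkage being asked about. It is not a query that was run on the cluster: the rows are made up, and the field names (`ip`, `country`, `region`, `city`, `page_title`) are assumptions for illustration, not the actual table schemas. A pageview row is considered re-linkable when exactly one IP in webrequest shares its geography.

```python
from collections import defaultdict

GEO_KEYS = ("country", "region", "city")  # assumed join columns, not the real schema


def candidate_ips(pageview_rows, webrequest_rows, keys=GEO_KEYS):
    """Pair each pageview row with the set of IPs seen in webrequest for the same geography."""
    ips_by_geo = defaultdict(set)
    for row in webrequest_rows:
        ips_by_geo[tuple(row[k] for k in keys)].add(row["ip"])
    return [
        (pv, ips_by_geo.get(tuple(pv[k] for k in keys), set()))
        for pv in pageview_rows
    ]


# Made-up rows using documentation-range IPs.
webrequest = [
    {"ip": "198.51.100.7", "country": "CA", "region": "ON", "city": "Toronto"},
    {"ip": "198.51.100.8", "country": "CA", "region": "ON", "city": "Toronto"},
    {"ip": "203.0.113.5", "country": "CA", "region": "NU", "city": "Alert"},
]
pageviews = [
    {"page_title": "Example_A", "country": "CA", "region": "ON", "city": "Toronto"},
    {"page_title": "Example_B", "country": "CA", "region": "NU", "city": "Alert"},
]

# A pageview is re-linkable only when its geography maps to a single IP.
linked = [
    (pv["page_title"], next(iter(ips)))
    for pv, ips in candidate_ips(pageviews, webrequest)
    if len(ips) == 1
]
print(linked)
```

In this toy data only the pageview from the single-IP geography is linked; the Toronto pageview has two candidate IPs and stays ambiguous.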

Nuria added a comment.Sep 17 2015, 8:49 PM

More info from IRC conversation:

1:38 PM <csteipp> nuria: So my specific example when talking with Joseph was, if we have pageviews for a very unique geographic area, and there's only one person in the geography, then someone with access to our IP's and can find one that geolocates there, they have the real id. I'd like to understand how common that is.
1:38 PM <csteipp> nuria: Yes pageviews + geographic is the worst. Very unique UserAgent would be the same issue too.
1:38 PM <csteipp> (iirc)
1:39 PM <nuria> csteipp: understanding that IP when it comes to #G connections is pretty meaningless right?
1:39 PM <nuria> csteipp: cause we are going to find tons of matches for regions in which mobile users share the same IP
1:39 PM <csteipp> #G connections?
1:39 PM <nuria> csteipp: sorry 3G
1:40 PM <csteipp> nuria: that's carrier dependent. Some give IP per device (especially ones that are ipv6), others nat.
1:43 PM <nuria> csteipp: that is less common than global IPs though in my experience, what I wanted to point out is that on mobile an IP most of the time would not imply an individual, so: "will we find tons of records in the same city (even a small one) with the same IP?" Yes, we shall.
1:43 PM <csteipp> real id => reading history, not the reverse
1:44 PM <nuria> csteipp: i will spend some time doing queries but -unless I am missing something - the false positives will be "all IOS8 users on verizon on toronto"
1:44 PM <csteipp> nuria: Right. The question is do we have pageviews with very small (single) numbers of IP's that geolocate there.
1:44 PM <csteipp> Not large numbers
<csteipp> nuria: Cool. So just to be clear, running queries where we have geographies with *small* numbers of unique IP's, right?
1:46 PM <csteipp> I think Joseph and I discussed >10
1:46 PM <csteipp> <10, I mean :)
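
The check discussed above (geographies with fewer than 10 unique IPs) could be sketched roughly as follows. This is an illustration under assumptions, not the query that was actually run: the rows are synthetic, the field names are hypothetical, and only the threshold of 10 comes from the conversation.

```python
from collections import defaultdict


def small_ip_geographies(webrequest_rows, threshold=10):
    """Return {geography: distinct IP count} for geographies below the threshold."""
    ips_by_geo = defaultdict(set)
    for row in webrequest_rows:
        geo = (row["country"], row["region"], row["city"])  # assumed columns
        ips_by_geo[geo].add(row["ip"])
    return {geo: len(ips) for geo, ips in ips_by_geo.items() if len(ips) < threshold}


def make_rows(country, region, city, n_ips, prefix):
    """Build synthetic rows with n_ips distinct documentation-range IPs."""
    return [
        {"ip": f"{prefix}.{i}", "country": country, "region": region, "city": city}
        for i in range(1, n_ips + 1)
    ]


rows = (
    make_rows("CA", "ON", "Toronto", 12, "198.51.100")
    + make_rows("CA", "NU", "Alert", 1, "203.0.113")
    + make_rows("CA", "YT", "Dawson", 3, "192.0.2")
)

at_risk = small_ip_geographies(rows)
total_geographies = len({(r["country"], r["region"], r["city"]) for r in rows})
share_at_risk = len(at_risk) / total_geographies
print(at_risk, share_at_risk)
```

The share of geographies under the threshold is one possible way to express "how common that is"; weighting by pageviews instead of by geography would be another.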

Nuria moved this task from Next Up to In Progress on the Analytics-Kanban board.Sep 21 2015, 3:39 PM
Milimetric renamed this task from Identify possible user identity reconstruction using location and user_agent_map pageview aggregated fields to try to link to IPs in webrequest to Identify possible user identity reconstruction using location and user_agent_map pageview aggregated fields to try to link to IPs in webrequest {slug}.Sep 24 2015, 3:48 PM
Nuria moved this task from In Progress to Done on the Analytics-Kanban board.Sep 29 2015, 3:32 PM
sbassett moved this task from Backlog to Done on the Privacy board.Wed, Oct 16, 5:42 PM