Investigate: How should tunnels be represented in the database
Closed, Resolved · Public · 5 Estimated Story Points

Description

The tunnels attribute is an array of tunnel objects. It's currently being written into the table as a single blob of data. Please investigate (and presumably implement as part of this ticket):

  • Whether we can write to the table in a more granular, well-specified manner (e.g. each tunnel gets its own row and the tunnel ids are then assigned to the actor)
  • How we can expect to pull that data back out when it's needed
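One possible shape for the row-per-tunnel idea above, as a rough sketch only. The table names (`actor`, `tunnel`, `actor_tunnel`), the `name` column, and the use of SQLite are all assumptions made for illustration; the ticket does not specify the actual schema or database. Each tunnel is stored once, linked to actors through a join table, and read back with a join.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one row per tunnel, linked to actors by id.
    conn.executescript("""
        CREATE TABLE actor (
            id INTEGER PRIMARY KEY,
            ip TEXT NOT NULL UNIQUE
        );
        CREATE TABLE tunnel (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE actor_tunnel (
            actor_id INTEGER NOT NULL REFERENCES actor(id),
            tunnel_id INTEGER NOT NULL REFERENCES tunnel(id),
            PRIMARY KEY (actor_id, tunnel_id)
        );
    """)


def add_tunnel(conn, actor_ip, tunnel_name):
    # Create the actor and tunnel rows if missing, then link them.
    conn.execute("INSERT OR IGNORE INTO actor (ip) VALUES (?)", (actor_ip,))
    conn.execute("INSERT OR IGNORE INTO tunnel (name) VALUES (?)", (tunnel_name,))
    conn.execute(
        """INSERT OR IGNORE INTO actor_tunnel (actor_id, tunnel_id)
           SELECT a.id, t.id FROM actor a, tunnel t
           WHERE a.ip = ? AND t.name = ?""",
        (actor_ip, tunnel_name),
    )


def tunnels_for(conn, actor_ip):
    # Pull the data back out: all tunnel names linked to one actor.
    rows = conn.execute(
        """SELECT t.name FROM tunnel t
           JOIN actor_tunnel link ON link.tunnel_id = t.id
           JOIN actor a ON a.id = link.actor_id
           WHERE a.ip = ?
           ORDER BY t.name""",
        (actor_ip,),
    )
    return [row[0] for row in rows]
```

With this layout, two actors that share a tunnel point at the same `tunnel` row, so the tunnel data is not duplicated per actor.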

Event Timeline

Niharika set the point value for this task to 5. Dec 20 2022, 5:41 PM

I wrote this as an investigation but looking at it again I think it makes sense to just go ahead and implement tunnels this way. We treat other arrays of data similarly and we might as well be consistent.

The object being written for each tunnel consists of the following keys:

  • operator: string (not required)
  • type: vpn/proxy (required)
  • anonymous: true/false (required)
  • entities: an array of IPs (not required)
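The keys listed above could be modelled roughly as follows. This is a sketch, not the project's actual code: the `Tunnel` class and `parse_tunnel` helper are hypothetical names, and the lowercase string values `"vpn"` and `"proxy"` are an assumption about how the type is encoded in the data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tunnel:
    type: str                        # required: "vpn" or "proxy" (assumed spelling)
    anonymous: bool                  # required
    operator: Optional[str] = None   # not required
    entities: List[str] = field(default_factory=list)  # not required: array of IPs


def parse_tunnel(raw: dict) -> Tunnel:
    # Reject objects that are missing either required key or have a bad value.
    if raw.get("type") not in ("vpn", "proxy"):
        raise ValueError("tunnel 'type' must be 'vpn' or 'proxy'")
    if not isinstance(raw.get("anonymous"), bool):
        raise ValueError("tunnel 'anonymous' must be true or false")
    return Tunnel(
        type=raw["type"],
        anonymous=raw["anonymous"],
        operator=raw.get("operator"),
        entities=list(raw.get("entities", [])),
    )
```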

If a tunnel does not have an operator, all we know about it is that it is a vpn or proxy and whether or not it is anonymous. As far as I know, we're not sure yet whether this data will be used. In the interest of moving forward (and unblocking other, bigger patches that need to go through), I'm going to make the call to only write tunnels with operators into the table for now and also skip the entities, since it's not immediately clear what the IP should be associated with (it's not the IP of the originating entity). When we're more sure of what we're going to do with the tunnel data, I think we can reconsider whether knowing that an IP is using an additional unnamed tunnel/vpn would be useful. It's possible that information will be covered by another attribute.
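The rule described in this comment might look something like the sketch below. The function name `tunnels_to_write` is hypothetical, and keeping the remaining keys (`type`, `anonymous`) on each written row is an assumption; the comment only states that operator-less tunnels and the entities are skipped.

```python
def tunnels_to_write(tunnels):
    """Return the tunnel objects that would be written to the table."""
    rows = []
    for tunnel in tunnels:
        # Skip tunnels without an operator (missing or empty).
        if not tunnel.get("operator"):
            continue
        # Keep every key except the entities (the array of IPs).
        rows.append({k: v for k, v in tunnel.items() if k != "entities"})
    return rows
```

For example, given one tunnel with an operator and one without, only the first would be returned, and without its `entities` key.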

[...] also skip the entities, since it's not immediately clear what the IP should be associated with (it's not the IP of the originating entity).

@STran I'm seeing in the data type documentation that the tunnel object has the keys:

  • anonymous
  • entries
  • exits (I notice this is never used in our small test dataset)
  • operator
  • type

Given that there are (potentially) entries and exits, is that useful after all?

Flagging for @Niharika, who is working on defining what data we want to show.

In the last discussion I had with Prateek, we weren't even sure if we were going to show tunnels, so I was conservative about what we chose to write, assuming it would be fairly trivial to update later. I noted the entries but couldn't correlate them to the IPs that were being recorded in the db from my file. Given that, I'm not sure if we're going to be revealing IPs like that, so I opted not to include them, but Niharika might have a different opinion.

e.g. User A's IP matches a recorded IP that has an entry point with IP B. Are we going to reveal IP B? Can checkusers do anything with an IP like that?

OK, makes sense - thanks @STran. I'll move this to Done and leave open for @Niharika to look over.

I'm fine with the current choice. We can re-evaluate once this is out to understand what the community deems useful information. Thanks for working on this task.