Allow opting out from logging some of the default EventLogging fields on a schema-by-schema basis
Closed, Declined, Public

Description

Some of the default EventLogging fields are not needed by some schemas. They can be privacy-invasive, take up a lot of space in the database, and make the log message large (cf. T91347); the prime culprits are userAgent and clientIp. There should be a way to opt out of logging them.

Event Timeline

Tgr raised the priority of this task to Needs Triage.
Tgr updated the task description.
Tgr added a subscriber: Tgr.

As a stakeholder of the EventLogging service provided by Analytics, I request that they decline this task.

By definition any collection of data about anyone for any purpose is "privacy-invasive". We balance that privacy invasion against our desires to understand our users and how they use our sites for the purposes of serving their needs better. I agree that providing a general-purpose opt-out for data collection is important to respect a user's desire to not have data about them collected. But I don't agree with providing many such opt-outs for many bits of the data. The complexity of user interface and implementation that results is not worth the reward. I would rather see Analytics focus its effort on a general-purpose opt-out than a specific one.

I think @Tgr is talking about per-schema opt-out in the software, rather than a user choice. If I understood him correctly, I support that.

Fair point. If so, I agree that that seems reasonable.

Indeed, I was thinking of a way to disable IP/user-agent collection in the schema configuration (or the logEvent call, or whatever works); I just worded it poorly.
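
A minimal sketch of what a per-schema opt-out could look like, assuming the capsule is assembled in one place before the event is sent. `CAPSULE_OPT_OUTS`, `build_capsule`, the schema names, and the sample values are hypothetical and not part of EventLogging's API; only the field names `userAgent` and `clientIp` come from this task.

```python
from typing import Any, Dict, FrozenSet, Mapping

# Hypothetical per-schema configuration: capsule fields each schema opts out of.
# Schemas not listed here keep every default field.
CAPSULE_OPT_OUTS: Dict[str, FrozenSet[str]] = {
    "ExampleClickSchema": frozenset({"userAgent", "clientIp"}),
}


def build_capsule(schema: str, event: Mapping[str, Any],
                  defaults: Mapping[str, Any]) -> Dict[str, Any]:
    """Wrap an event in a capsule, omitting the fields the schema opted out of."""
    excluded = CAPSULE_OPT_OUTS.get(schema, frozenset())
    # Copy only the default fields that this schema has not excluded.
    capsule = {key: value for key, value in defaults.items() if key not in excluded}
    capsule["schema"] = schema
    capsule["event"] = dict(event)
    return capsule


if __name__ == "__main__":
    # Illustrative default field values, not real EventLogging data.
    defaults = {"clientIp": "hashed-ip", "userAgent": "Mozilla/5.0", "wiki": "enwiki"}
    print(build_capsule("ExampleClickSchema", {"action": "click"}, defaults))
    print(build_capsule("OtherSchema", {"action": "view"}, defaults))
```

Keeping the opt-out list in configuration rather than in every logging call means the default stays "log all fields" and a schema owner only has to name the fields to drop.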

For user opt-out, AFAIK we disable EventLogging when the Do Not Track header is set, and that seems good enough to me.

ClientIp is always encrypted and takes no space, so I do not think it is an issue. Also, user-agent and IP are deleted after 90 days per our privacy guidelines.

Milimetric triaged this task as Medium priority. (Sep 3 2015, 5:13 PM)
Milimetric moved this task from Incoming to Backlog on the Analytics-Backlog board.

For schemas where tracking a user across log entries isn't needed, I think this is a very good idea. The more schemas we have, the more information about the hashed IP we're storing, and in aggregate that allows us to get closer to reidentification.

So yes, I really think this should be supported.
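
To make the aggregation argument concrete, here is a sketch assuming the IP is replaced by a deterministic keyed hash shared by every schema. The thread does not describe the actual scheme, so HMAC-SHA256, the key, the schema names, and the helpers `hash_ip` and `link_across_schemas` are all hypothetical. Because the same IP always maps to the same pseudonym, joining any number of schema tables on it merges their rows into one profile.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumption for illustration: one secret key is used for every schema,
# so a given IP produces the same pseudonym everywhere.
SECRET_KEY = b"hypothetical-secret"


def hash_ip(ip: str) -> str:
    """Return a stable pseudonym for an IP address (HMAC-SHA256 hex digest)."""
    return hmac.new(SECRET_KEY, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def link_across_schemas(rows: List[Tuple[str, str, str]]) -> Dict[str, Set[Tuple[str, str]]]:
    """Group (schema, ip, fact) rows by hashed IP.

    Each group is everything the logs reveal about one pseudonymous client
    once the schema tables are joined on the hash.
    """
    profiles: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for schema, ip, fact in rows:
        profiles[hash_ip(ip)].add((schema, fact))
    return dict(profiles)


if __name__ == "__main__":
    # Documentation-range IPs and made-up facts, for illustration only.
    rows = [
        ("SearchSchema", "198.51.100.7", "searched for X"),
        ("EditSchema", "198.51.100.7", "edited page Y"),
        ("ReadingSchema", "203.0.113.9", "read page Z"),
    ]
    for pseudonym, facts in link_across_schemas(rows).items():
        print(pseudonym[:12], sorted(facts))
```

Dropping `clientIp` from a schema's capsule removes that schema's rows from such a join entirely, which is the effect the per-schema opt-out would have.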

Actually, it looks like many or almost all schemas have already involuntarily opted out of logging valid clientIPs for more than five months, and nobody noticed ;) T119144

Deskana renamed this task from Opt-out from logging some of the default EventLogging fields to Allow opting out from logging some of the default EventLogging fields on a schema-by-schema basis. (Jan 12 2016, 7:42 PM)

Reworded the title to try to capture the intent of the task based on the above discussion; feel free to modify it further.

IP has been dropped unconditionally in T126366/T128407.

We'll preprocess the user-agent as part of T121550, so that should help with this as well.

The user agent isn't raw anymore and the IP has been dropped, so this task no longer applies.

We support events now that don't have the schema capsule via EventBus.