Record browser fingerprints instead of pure UA for checkuser
Open, Medium, Public

Description

This is a feature request (an opinion), not a concrete task. It may not comply with the privacy statement or other policies.

I hope Wikimedia wikis can use JavaScript or similar means to record more browser fingerprint data (canvas fingerprints, font fingerprints, screen resolution, API support, etc.) and provide irreversible hash IDs to CheckUser. That would let checkusers match the same browser or device across different IP addresses, rather than relying only on IP addresses and user agent strings, which are more likely to collide or be forged and are therefore less reliable.
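As a rough illustration of the "irreversible hash ID" idea, a keyed hash over canonicalized attributes could look like the sketch below. The attribute names, the `fingerprint_id` helper, and the secret key are all hypothetical, and nothing here reflects how CheckUser actually works. A keyed HMAC is assumed rather than a plain hash because fingerprint attributes have a small value space, so an unkeyed digest could be reversed by enumerating likely inputs.

```python
import hashlib
import hmac
import json


def fingerprint_id(attributes: dict, secret_key: bytes) -> str:
    """Return a hex ID for a set of browser attributes (hypothetical helper).

    The attributes are serialized with sorted keys so that the same set of
    values always produces the same ID, regardless of collection order.
    """
    canonical = json.dumps(
        attributes, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    # HMAC-SHA256 with a server-side secret: the ID cannot be recomputed
    # (or brute-forced) by anyone who does not hold the key.
    return hmac.new(secret_key, canonical, hashlib.sha256).hexdigest()


# Example attribute names are assumptions made for illustration only.
key = b"server-side-secret"
a = fingerprint_id({"resolution": "1920x1080", "fonts": "abc123"}, key)
b = fingerprint_id({"fonts": "abc123", "resolution": "1920x1080"}, key)
c = fingerprint_id({"resolution": "1366x768", "fonts": "abc123"}, key)
```

Here `a` and `b` are equal because only key order differs, while `c` differs because one attribute changed. Two edits from different IP addresses with equal IDs would then be candidates for comparison without the raw attributes being shown.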

A definitive sockpuppet finding on zhwiki has caused controversy over both the technical evidence and the judgment: discussions in Chinese, discussions in Chinese and English.

Event Timeline

To be clear, fingerprints should only be sent when a user performs an action, not when viewing pages. They should be cacheable, and the design should try to prevent forgery.

@YFdyh000 Thanks for this task. We have been thinking about this as well, since IP addresses are considered personally identifiable information. We want to move away from exposing them and toward some sort of automated comparison that finds socks with the data we have.

Niharika renamed this task from "Record and based on browser fingerprints instead of pure UA for checkuser to anti-harassment" to "Record browser fingerprints instead of pure UA for checkuser". Feb 18 2021, 12:58 PM

Personally, as a user, I would be extremely against this being implemented. Wikimedia/Wikipedia should not be a platform that performs invasive JS-based fingerprinting (font-support detection, WebGL-based techniques) on its users, regardless of the use case. That is different from passive fingerprinting via UA hints and basic API support detection, which is what has been done or proposed until now.

@Soda thanks for commenting on this. Would you mind elaborating further about what level of fingerprinting you think would be acceptable, and why? For example, there are fingerprinting techniques that can be done without JavaScript (explainer) that one could also argue are as invasive as JavaScript-based techniques.

I think a good rule of thumb would be to consider any active fingerprinting method off-limits. Feature detection would be okay, since this is information the browser explicitly provides, but actively (ab)using the WebGL API, using font detection techniques (one of the techniques the no-JS demo uses), or using cache supercookies would be a definite no.

Another, more user-centric rule of thumb (which will be harder to quantify) would be to look at how unique the resulting fingerprint is, by calculating its entropy. If we are able to semi-reliably identify every single person or device in the world, we are clearly crossing some boundaries.
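One possible way to put a number on that uniqueness is the Shannon entropy of the observed fingerprint values, measured in bits. The sketch below is a minimal illustration, not a proposed measurement method: the `shannon_entropy_bits` helper and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy_bits(observations: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(observations)
    total = sum(counts.values())
    # Sum p * log2(1/p) over every distinct value that was observed.
    return sum(
        (count / total) * math.log2(total / count)
        for count in counts.values()
    )


# Hypothetical samples: each string stands for one visitor's fingerprint.
all_distinct = ["fp-a", "fp-b", "fp-c", "fp-d"]
all_same = ["fp-a", "fp-a", "fp-a", "fp-a"]

distinct_bits = shannon_entropy_bits(all_distinct)
same_bits = shannon_entropy_bits(all_same)
```

Four equally likely fingerprints give 2 bits and a single shared fingerprint gives 0 bits. For scale, telling apart every device in a population of N requires about log2(N) bits, which is roughly 33 bits for 8 billion, so a fingerprint approaching that figure would be close to a globally unique identifier.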

Thanks for your comments, @Soda. If we were to pursue fingerprinting as an alternative to reduced user agent string entropy, we'd need to think carefully about what we collect, how we store it, how long we retain it, and to what end it's used.

For scaled anti-abuse detection, prevention, and automated rollback, I think we will eventually need to think about building fingerprints of the activity that we want to mitigate (with CAPTCHAs, enforced time delays, lower rate limits, or blocks). I will try to come back to this task with some more thoughts on this.