
Timeboxed investigation into browser fingerprinting for anti-abuse report to WMF Board
Closed, Resolved · Public · 2 Estimated Story Points · Jan 19 2019

Assigned To
Authored By
TBolliger
Jan 9 2019, 10:47 PM
Tokens: "Dislike" token awarded by ToBeFree · "Orange Medal" token awarded by Aklapper · "The World Burns" token awarded by Krenair

Description

As part of our work on device blocking and features to mitigate long-term abuse we want to look into third-party libraries that we could use to compute and hash a fingerprint. This hash would then need to be stored on WMF servers for use in IP/IP range blocks, or potentially just as fingerprint blocks.

Timebox to one day to investigate and write up a summary of options (libraries, services, etc.) and a rough technical implementation here on this task. Include an analysis of the effectiveness of these libraries.

Please note that this investigation is due to a request from the WMF Board to look into anti-abuse. We are not committing to building this and are extremely aware that this project has some severe privacy implications.
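
For concreteness, a minimal client-side sketch of the compute-and-hash step, assuming a library along the lines of fingerprintjs2 (mentioned later in this thread). The API endpoint and parameter name below are purely illustrative, not an existing MediaWiki module:

```
// Sketch only. Assumes fingerprintjs2 (http://valve.github.io/fingerprintjs2/)
// is loaded as a global script; the "checkfingerprint" endpoint is hypothetical.
declare const Fingerprint2: any;

function computeAndReportFingerprint(): void {
  // fingerprintjs2 collects component signals (user agent, canvas, fonts, ...)
  Fingerprint2.get((components: Array<{ key: string; value: unknown }>) => {
    const values = components.map((c) => String(c.value));
    // x64hash128 is the murmur-based hash bundled with fingerprintjs2.
    const hash: string = Fingerprint2.x64hash128(values.join(''), 31);

    // Only the hash (not the raw components) would be sent to WMF servers,
    // where it could be checked against fingerprint or IP/range blocks.
    fetch('/w/api.php?action=checkfingerprint&format=json', {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: 'hash=' + encodeURIComponent(hash)
    });
  });
}
```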

Details

Due Date
Jan 19 2019, 7:00 AM

Event Timeline

This is sounding all kinds of dodgy. You're not ruling out introducing closed-source software? You're building a new feature behind closed doors? You're doing browser fingerprinting?

We're investigating all options so we can make an informed decision of build vs. buy vs. deprioritize.

TBolliger renamed this task from Investigate third-party libraries for browser fingerprinting to Investigate third-party libraries for browser fingerprinting for long-term-abuse project. · Jan 10 2019, 12:04 AM

I wonder if Trust-and-Safety and/or the Security-Team have also ever thought about or even investigated this - might be worth asking them, just in case

> I wonder if Trust-and-Safety and/or the Security-Team have also ever thought about or even investigated this - might be worth asking them, just in case

This is coming out of the Community Health cross-department-program which includes Legal and Trust and Safety. I'm also discussing with Analytics, Security, and Technology.

I'm aware this ticket looks cagey, but we're in the very early stages of an investigatory project — no actual software development is prioritized. Everyone knows that IP and IP range blocks are easily circumvented by sophisticated users and some long-term-abusers are patient enough to wait out a block expiration to continue their damage. We've been asked to look into possible solutions and make a recommendation to the WMF Board of Trustees.

Any advice would be appreciated on how to effectively document this work on Phabricator without revealing too much to the malicious actors.

> Any advice would be appreciated on how to effectively document this work on Phabricator without revealing too much to the malicious actors.

Maybe a Space; however, a task can only be in one Space (usually that's S1, which is the default and public Space), so it'd need a good number of members.

I'm not sure which malicious actors we're worried about here - the primary questions seem almost certainly ethical & technical (does the mechanism actually work, what is the false positive rate, what is the false negative rate, etc.).

If anything ever gets put in production it almost certainly will be reverse engineered, if not outright available for download. It's unrealistic to assume it will stay secret for long from sophisticated attackers (which isn't necessarily a bad thing. Anti-abuse mechanisms don't have to be 100% reliable against all attackers to still be useful). With that in mind, I'm unsure if the benefits of secrecy are worth it here against the costs of appearing to have secret discussions on an extremely ethically fraught topic.

> This is coming out of the Community Health cross-department-program which includes Legal and Trust and Safety. I'm also discussing with Analytics, Security, and Technology.

> I'm aware this ticket looks cagey, but we're in the very early stages of an investigatory project — no actual software development is prioritized. Everyone knows that IP and IP range blocks are easily circumvented by sophisticated users and some long-term-abusers are patient enough to wait out a block expiration to continue their damage. We've been asked to look into possible solutions and make a recommendation to the WMF Board of Trustees.

The ticket is problematic because you've proposed throwing out one of our guiding principles in the second sentence, and then preventing others who might have valuable input from collaborating by using a closed platform. I would suggest following https://www.mediawiki.org/wiki/Technical_Collaboration_Guidance/Principles to work on this. Whether or not software development is prioritized, the important point is "once a product is being more than just discussed casually, it's time to start writing things down on a wiki". We all know that https://en.wikipedia.org/wiki/Security_through_obscurity doesn't work, so let's get it all in the open.

As a first step, I would suggest reading the past discussions that were had about this. This was a very controversial issue in the past, and I think people's opinions to this have only solidified over the past year. Even the mere *proposal* of unique tokens is problematic for some people - that we'd even consider compromising our principle of privacy.

Pardon the pile-on appearance here @TBolliger. I very much appreciate it's difficult to balance transparency and efficacy in the short term, but @Bawolff for me has hit the nail on the head. Any outcome from exploring and planning will be easy to reverse engineer, but will also need to be utilized and accepted at the least by the members of Security and acl*stewards. If anything can be done here it's going to be beneficial to be as transparent as possible as early as possible, though I totally understand if there are issues/constraints that are not public to begin with. I assume you'll be looking at techniques such as http://valve.github.io/fingerprintjs2/ and marrying them to on-wiki blocking and identity correlation. To that effect...

> Any advice would be appreciated on how to effectively document this work on Phabricator without revealing too much to the malicious actors.

Security members are a fairly cross-functional and trusted group of folks; would it be possible to limit these tasks (as possible countermeasure explorations) to these folks in Phab for the early stages? I suppose it depends on who is involved in early collaboration and if they are already members of the project, but it would seem they should be to get a handle on the problem space.

Edit: we could indeed create a Space limited to the Anti-Harassment team and Security fairly easily.

The real heartburn here will be with the expectation of a closed source solution being possible without undermining Right to Fork, which I cannot think of how that would be possible.
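
As a rough illustration of the "marrying to on-wiki blocking" idea above (this is not MediaWiki code; the storage shape and names are invented), a fingerprint-hash block check might sit alongside the existing IP/range block checks roughly like this:

```
// Hypothetical sketch: a fingerprint-hash block list consulted in addition to
// the usual IP/range block checks. Storage shape and field names are invented.
interface FingerprintBlock {
  hash: string;    // salted hash of the fingerprint, as stored server-side
  reason: string;
  expiry: Date;    // like IP blocks, these would need an expiry
}

const fingerprintBlocks = new Map<string, FingerprintBlock>();

function getActiveFingerprintBlock(
  hash: string,
  now: Date = new Date()
): FingerprintBlock | null {
  const block = fingerprintBlocks.get(hash);
  if (!block || block.expiry <= now) {
    return null;
  }
  return block;
}

// An edit request would then be rejected if either check matches, e.g.:
//   isIpBlocked(ip) || getActiveFingerprintBlock(submittedHash) !== null
```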

There is a large amount of related discussion at https://office.wikimedia.org/wiki/Engagement_metrics (sorry, not general community accessible) from the 2016 project that led to the introduction of the WMF-Last-Access cookie to assist in computing unique device views.

@Aklapper @Bawolff @Legoktm @chasemp @bd808 — Thank you for the context, background info, suggestions, and concerns. Information and points-of-view like this are extremely helpful to us as we prepare our proposal for the Board. I share most of the same concerns with you all about this — effectiveness vs. risk, shelf life, privacy issues, etc. — and our proposal will include an honest and realistic risk assessment.

This Google doc (accessible to all WMF staff) has all our project notes to date (again, we just started) and I've added a summary of your comments. I will publish an update on our cross-department program's Meta page about blocking tools.

I'm not trying to ruffle feathers with this project — it's on our public wiki goals and this Phab task is a nod that we're spending time and therefore donor dollars on this. We can report our findings here on this task, but any findings that are sensitive should be handled as such.

Leaving comments here that I also sent via e-mail, because although others have made similar points on this ticket I feel they are worth reiterating.

I am appalled that we are considering bending the privacy policy to allow for fingerprinting devices, no matter our end objective. Fingerprinting will nullify any guarantee of privacy that we give our users, and without a guarantee of privacy there cannot be any consumption of free knowledge. It is a practice forbidden by companies such as Apple (apps that exercise iPhone fingerprinting are kicked out of the App Store, and Uber's dealings in this regard are well known [1]). It is worth pointing out that we have about 1 billion devices visiting Wikimedia projects every month [2]; the fact that we are even considering tracking 1 billion devices to flag a few bad actors is certainly not aligned with our privacy values.

If I got the core issue right, we have users evading blocks. Fingerprinting is technically a good way to track users but a faulty way to address bad actors (I can elaborate on that); more importantly, in our ecosystem it also seems morally questionable.

[1] https://www.nytimes.com/2017/04/23/technology/travis-kalanick-pushes-uber-and-himself-to-the-precipice.html?_r=0
[2] https://stats.wikimedia.org/v2/#/all-wikipedia-projects/reading/unique-devices/normal|line|2-Year|~total

TBolliger renamed this task from Investigate third-party libraries for browser fingerprinting for long-term-abuse project to Timeboxed investigation into browser fingerprinting for anti-abuse report to WMF Board. · Jan 14 2019, 7:14 PM
TBolliger updated the task description. (Show Details)
TBolliger set the point value for this task to 2.

If the Wikipedia mobile app were to do this, it would be de-listed from the Apple Store. Is that really something we're willing to take as collateral damage? There are serious privacy implications to this, and while I appreciate that the WMF Board has asked for this and this isn't staff's fault, the Board should be taken to task for failing to understand privacy at even a basic level.

> If the Wikipedia mobile app were to do this, it would be de-listed from the Apple Store. Is that really something we're willing to take as collateral damage? There are serious privacy implications to this, and while I appreciate that the WMF Board has asked for this and this isn't staff's fault, the Board should be taken to task for failing to understand privacy at even a basic level.

The problem we're trying to address is long term abuse. Wikimedia volunteers are targeted with harassment and threats at a scale and severity on another level. Ignoring this problem is a non-starter. Privacy implications are a constraint for this project, not a driving factor. The board hasn't asked for any specific technical solutions short of "improved/device blocking."

> If the Wikipedia mobile app were to do this, it would be de-listed from the Apple Store.

I do not believe that is accurate:
https://developer.apple.com/documentation/uikit/uidevice/1620059-identifierforvendor

@dbarratt Apple disallows fingerprinting that persists when an app is deleted. I suppose it depends on the implementation, but that fingerprinting would reset every time the Wikimedia app was deleted or the phone was reset. In other words, it can be removed by the user, just like a cookie, though perhaps not as easily.

@TBolliger I'm on the English Wikipedia's Arbitration Committee. I'm well aware of long-term abuse, harassment, and threats. I've received them myself, up to and including death threats. Still, sacrificing privacy of the millions of people who access Wikipedia to block the access of a handful of severely abusive people is a poor sacrifice that is entirely out of line with project values. It also simply won't work. If an editor knows how to evade CheckUser, then they're smart enough to find this Phab ticket and then Panopticlick. This is especially true since the GDPR likely would require you to advertise that you're fingerprinting. See: https://www.eff.org/deeplinks/2018/06/gdpr-and-browser-fingerprinting-how-it-changes-game-sneakiest-web-trackers

@dbarratt I think you need to do due diligence when replying. From the docs you sent: "the value [UUID] changes when the user deletes all of that vendor’s apps from the device and subsequently reinstalls one or more of them". A UUID identifies an install of the app; that value is equivalent to the appinstallId already present on Wikipedia's app. There is a difference between an app install ID and a fingerprint.

> @dbarratt I think you need to do due diligence when replying. From the docs you sent: "the value [UUID] changes when the user deletes all of that vendor’s apps from the device and subsequently reinstalls one or more of them". A UUID identifies an install of the app; that value is equivalent to the appinstallId already present on Wikipedia's app. There is a difference between an app install ID and a fingerprint.

As far as I know, there is no way to uniquely identify a computer (via the web) or even a browser that cannot be changed by the user.

> I think you need to do due diligence when replying.

This seems overly judgmental in the context of a technical conversation. If you have a different interpretation of the content in the link or the comment, of course, that's what the discussion is about. However, implying that the original poster simply didn't read it borders on insulting.

TBolliger set Due Date to Jan 19 2019, 7:00 AM.
Restricted Application changed the subtype of this task from "Task" to "Deadline". · View Herald Transcript · Jan 15 2019, 11:02 PM

> The problem we're trying to address is long term abuse. Wikimedia volunteers are targeted with harassment and threats at a scale and severity on another level. Ignoring this problem is a non-starter. Privacy implications are a constraint for this project, not a driving factor.

Agreed w/r/t ignoring the problem being a non-starter. But if the goal is to reduce volunteer-hours dealing with LTAs/etc., it seems like jumping to browser fingerprinting/device blocking should be one of the last steps. The last time I seriously looked at the problem (more focusing on the area of "steward tools", which I think has a very large overlap with AH) I came up with the list at https://www.mediawiki.org/wiki/User:Legoktm/Wikimania_2016_steward_tools - addressing those seems like low-hanging fruit and IMO should be looked into before even bothering investigating browser fingerprinting, which should be treated as a measure of last resort. I would expect something like Phalanx to be way more useful in stopping LTAs than browser fingerprinting anyway.

> The board hasn't asked for any specific technical solutions short of "improved/device blocking."

Just to be clear, is the suggestion of browser/device fingerprinting coming from the AH team or the WMF Board?

Thank you for the comment and link, @Legoktm. "Device blocking" comes from the WMF Board, the tactic of browser fingerprinting comes from my research to date.

Based on discussions with WMF teams and external experts, reading external research, and even this Phabricator thread I've decided to pivot the "device blocking" research project into a larger "Long Term Abuse mitigation" research project. Improvements to Steward tools, AbuseFilter, and/or implementing something like Phalanx all feel like pieces of a larger LTA mitigation approach.

Given that all types of blocks can be defeated given sufficient technical expertise and determination, tools like the AbuseFilter are indeed more likely to mitigate severe LTA cases. I would also encourage the task force addressing this to consider devoting substantial resources to legal means of mitigating LTA cases. Legal avenues seem far more likely to work than technical ones, especially where we know the identity of the LTA.

> This Google doc (accessible to all WMF staff) has all our project notes to date (again, we just started) and I've added a summary of your comments. I will publish an update on our cross-department program's Meta page about blocking tools.

I skimmed the doc. While there is quite a bit I'm unhappy about on this topic, one thing that jumps out at me from the doc is that there are some references to "Salt[ing] and Hash[ing] PII", implying that hashing can reduce the privacy risk in this area. I'd like to very explicitly state that this is not a solution to the privacy issues. Ignoring the risk of brute-forcing the preimage (which depends on what you are hashing and how much entropy it has), in this case the hash would actually be the PII, by literal definition (what good would a fingerprint be if it couldn't personally identify someone?). So hashing really does not provide anything here from a privacy perspective.

I also think there is a mismatch in the google doc between what people think these methods provide and what they actually do, albeit there wasn't any serious analysis of these techniques, and different methods provide different tracking properties.
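
To make the salting/hashing point above concrete, a toy sketch (the salt value and fingerprint strings are invented for illustration): the salted hash is itself a stable per-device identifier, and a low-entropy fingerprint space can simply be enumerated.

```
import { createHash } from 'crypto';

const SALT = 'site-wide-secret-salt'; // hypothetical

function saltedHash(fingerprint: string): string {
  return createHash('sha256').update(SALT + fingerprint).digest('hex');
}

// The same device always produces the same hash, so the hash links activity
// across requests exactly as the raw fingerprint would; it is itself the PII.
const observed = saltedHash('Firefox 64 | Linux x86_64 | 1920x1080 | UTC+0');

// And if the component space has limited entropy, anyone with the salt (or
// with query access to the hashing service) can enumerate candidates and
// recover the preimage.
const candidates = [
  'Firefox 64 | Linux x86_64 | 1920x1080 | UTC+0',
  'Chrome 71 | Windows 10 | 1366x768 | UTC-5',
];
const match = candidates.find((c) => saltedHash(c) === observed);
console.log(match ? 'preimage recovered: ' + match : 'no match');
```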

If this is prioritized by the board, whoever is tasked with executing this project will need a way to test their tactic before investing in building it. This might even be the next step of the research for the board: to create some prototypes for specified use cases.

The document is notes for an in-progress project. The deadline for completing this and publishing both on-wiki and in an abridged version for the board is Feb 28.

I've done some cursory research on this topic.

My recommendation would be to avoid browser or device fingerprinting of any kind. It’s a privacy disaster and runs counter to our core principles. It’s easily circumvented in most cases. It has potential for false positives that would be difficult to troubleshoot for users. While it does provide the ability to single out bad actors that cause long term harassment and harm within our community, that benefit is not greater than the costs mentioned above.