[[:w:en:User:Firefly/checkuseragenthelper.js]] sends CU user-agents to a third party
Closed, Resolved · Public · Security

Description

I happened to stumble upon https://en.wikipedia.org/wiki/User:Firefly/checkuseragenthelper.js today - it's a script that uses an API provided on https://whatsmyua.info/ (a third-party site) to format user-agents stored in CheckUser data.

I don't think this exposes any link between CU usernames and the UA strings, though: the Referer header seems to send only the domain name, not the full path with possible other private information. It is still sending CU UAs (which the privacy policy lists as private information) to an untrusted third party.

Details

Risk Rating: Low
Author Affiliation: Wikimedia Communities

Related Objects

Mentioned In: T293811: Clarify whether CUs should share non-public information with external services
Mentioned Here: T175587: Add a user-agent parser to CheckUser

Event Timeline

taavi created this task. Oct 14 2021, 2:35 PM

Restricted Application added a subscriber: Aklapper. Oct 14 2021, 2:35 PM

Reedy added a project: Privacy. Oct 14 2021, 3:00 PM

I think there are a few issues at play here:

Users of the User:Firefly/checkuseragenthelper.js user script (of which there are apparently only 4 on enwiki at the moment) perhaps unknowingly sending some of their own PII (IP, UA, etc.) to whatsmyua.info via an XMLHttpRequest from their browser. Without explicitly modifying header data for the XMLHttpRequest, I believe this is what would happen since it's just another GET request.
Isolated UA strings from CU data being sent to whatsmyua.info, which may violate the standard Wikimedia Privacy Policy, at least via a letter-of-the-law reading. There's been a healthy amount of debate over this and whether most UA information, especially in isolation, is a meaningful privacy leak. For many, I would personally say it's not, since most people use standard, fairly popular web browsers with common UAs. But there are likely edge cases where a UA by itself might be able to significantly narrow down a geographic region, group of ISPs, group of buildings, etc. While unlikely, I suppose that's theoretically possible.
I'm not sure how CUs, who are privileged users on-wiki but not necessarily NDA'd (to the best of my knowledge) are governed by the Wikimedia Privacy Policy. AIUI, the PP mostly speaks to what the Foundation will or will not do with user data, not various users of the projects. I don't think anyone wants privileged wiki users taking potentially sensitive data and doing anything they please with it, but I'm also not entirely sure the PP, as written, legally governs that activity. Ok, this should not matter as they are governed by the policies @Legoktm describes below, and confirmed by @sguebo_WMF that Trust-and-Safety actively approves/monitors advanced rights across all wikis.
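
To make the first two points concrete, here is a minimal sketch of how a user script might build such a GET request. The endpoint `https://ua-service.example/api` and the parameter name `ua` are placeholders made up for illustration, not the actual whatsmyua.info API, and `buildLookupUrl` is a hypothetical helper.

```javascript
// Build the URL a browser would request when asking an external service
// to parse a user-agent string. Endpoint and parameter name are hypothetical.
function buildLookupUrl(endpoint, userAgent) {
  const url = new URL(endpoint);
  // The UA string from CheckUser data travels in the query string, so it
  // can end up in the remote server's access logs.
  url.searchParams.set('ua', userAgent);
  return url.toString();
}

// When the browser sends this request, the remote server also sees the
// requester's own IP address and the requester's own User-Agent header,
// because those are part of any ordinary HTTP request.
```

Nothing in this request carries the checkuser's username; what the third party receives is the queried UA string plus the usual connection metadata of whoever runs the script.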

This is basically a workaround for T175587: Add a user-agent parser to CheckUser.

In T293379#7428997, @sbassett wrote:

I'm not sure how CUs, who are privileged users on-wiki but not necessarily NDA'd (to the best of my knowledge) are governed by the Wikimedia Privacy Policy. AIUI, the PP mostly speaks to what the Foundation will or will not do with user data, not various users of the projects. I don't think anyone wants privileged wiki users taking potentially sensitive data and doing anything they please with it, but I'm also not entirely sure the PP, as written, legally governs that activity.

They're governed by https://meta.wikimedia.org/wiki/Access_to_nonpublic_personal_data_policy and https://meta.wikimedia.org/wiki/Confidentiality_agreement_for_nonpublic_information

Because the IP is being sent as a query parameter, it's most likely going to end up in the server-side logs of whatsmyua.info. The site itself is open source (https://github.com/sandinmyjoints/whatsmyua) and could easily be self-hosted, but really it should be built into CheckUser.

In T293379#7429363, @Legoktm wrote:

Because the IP is being sent as a query parameter...

User-agent, no?

In T293379#7429745, @sbassett wrote:

In T293379#7429363, @Legoktm wrote:

Because the IP is being sent as a query parameter...

User-agent, no?

Yep, sorry, typed the wrong thing.

For client-side use we have $.client. This is used by various production features, including VisualEditor, TMH, UploadWizard, and others.

https://www.mediawiki.org/wiki/ResourceLoader/Core_modules#jquery.client
https://doc.wikimedia.org/jquery-client/master/jQuery.client.html

$.client.profile( str );

I purged the script file with a reference to this task as a preventive measure.

Also tagging T&S, who "governs" checkusers on behalf of the Foundation.

Adding GeneralNotability per request.

Same thing was happening at https://en.wikipedia.org/w/index.php?title=User:GeneralNotability/InvestorGoat.js, blanked that script as well.

I spoke briefly with Martin on IRC about this to voice my concerns, but I will repeat them here: while I recognize that I am a newly-minted checkuser and am not the one making or interpreting the rules, I cannot see this as any different from how checkusers routinely use WHOIS, geolocation tools, proxy detectors, and the like when investigating IP addresses (and I note that IP addresses are mentioned in the same section of the privacy policy as useragents). Yes, it is sending the UA to an external service, and it obviously ties the request to my IP, but as far as I know there is nothing coming out of MediaWiki that would also leak my username to the external website and so it would be a non-trivial task to match up checkuser actions with queries to this site.

I do recognize that _automatically_ sending UA information might not be a good plan, and requiring the CU to push a button or something to make the queries would not be unreasonable in my view. Also, Firefly's script does expose the counts unnecessarily (by querying once for each .mw-checkuser-agent rather than once per UA), but that could trivially be fixed by locally caching each UA's response in a Map or something.
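
A rough sketch of the caching idea, assuming the lookup itself is passed in as a function (`lookup` is a placeholder for whatever parser or service call a script uses; `makeCachedLookup` and `annotateAgents` are hypothetical names, not taken from either script):

```javascript
// Wrap a lookup function so each distinct UA string is resolved at most once.
function makeCachedLookup(lookup) {
  const cache = new Map();
  return function cachedLookup(ua) {
    if (!cache.has(ua)) {
      // Store the result (or the pending Promise) the first time a UA is seen.
      cache.set(ua, lookup(ua));
    }
    return cache.get(ua);
  };
}

// Pair every element holding a UA string with its (cached) parse result.
// `elements` is any iterable of objects with a `textContent` property, e.g.
// document.querySelectorAll('.mw-checkuser-agent') in a browser.
function annotateAgents(elements, cachedLookup) {
  const results = [];
  for (const el of elements) {
    const ua = el.textContent.trim();
    results.push({ element: el, parsed: cachedLookup(ua) });
  }
  return results;
}
```

With this shape, a UA that appears in fifty result rows triggers a single lookup, so the number of queries no longer reveals how often each UA occurs. Calling `annotateAgents` from a button's click handler instead of on page load would cover the "push a button" suggestion as well.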

Alternatively: would it be acceptable to just run our own copy of the service on toolforge, managed by me (someone who has signed all of the ANPDP stuff)? The long and short of it is that T175587 would be very nice, but it's been sitting around for two years, and Firefly and I both have bashed out useful stopgaps in a matter of days that I would really like to keep around in some form until that ticket is resolved. I see jquery.client mentioned above, but am not convinced it has the amount of detail that I'd find useful.

Urbanecm added a subscriber: TheresNoTime. Oct 16 2021, 3:55 PM

Tks4Fish subscribed. Oct 16 2021, 4:30 PM

After talking with Martin, I've redeployed my script without the external call (directly using one of the libraries the external service used). There is still a big question to be answered here, though, since feeding IPs (mentioned in the same line as UA in "personal information") into external services (for proxy checks, geolocation, even just WHOIS to get their range) is a routine part of the checkuser process and we do not have on-wiki tools that can provide those services. I'm not saying that as in "that's how I personally do it" - I've talked with checkusers before about their workflow and they have frequently mentioned those as part of their process. Again, my opinion is that _as long as those data points are not tied to a named account_ there is no privacy concern here, and it would be impractical to use the timing to connect these lookups to specific checks.
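
The thread does not say which library the redeployed script uses, so the following is only a hand-rolled illustration of parsing a UA string entirely in the browser, with no external call. The rule list and the name `parseUserAgentLocally` are made up for this sketch, and a real UA parsing library covers far more cases.

```javascript
// Ordered rules: more specific tokens first, because Chromium-based browsers
// also carry "Chrome" and "Safari" tokens in their UA strings.
const BROWSER_RULES = [
  { name: 'Edge', pattern: /Edg\/([\d.]+)/ },
  { name: 'Firefox', pattern: /Firefox\/([\d.]+)/ },
  { name: 'Chrome', pattern: /Chrome\/([\d.]+)/ },
  { name: 'Safari', pattern: /Version\/([\d.]+).*Safari\// },
];

// Return the first matching browser name and version, or a fallback.
// Everything runs locally; the UA string never leaves the page.
function parseUserAgentLocally(ua) {
  for (const rule of BROWSER_RULES) {
    const match = rule.pattern.exec(ua);
    if (match) {
      return { browser: rule.name, version: match[1] };
    }
  }
  return { browser: 'Unknown', version: null };
}
```

The trade-off is maintenance: the rules have to be kept current by whoever ships the script, but no CheckUser data is disclosed to anyone.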

Urbanecm added a subscriber: L235. Oct 16 2021, 5:32 PM

sguebo_WMF moved this task from Incoming to Backlog on the Privacy Engineering board. Oct 18 2021, 3:43 PM

sbassett closed this task as Resolved. Oct 18 2021, 3:56 PM

sbassett claimed this task.

sbassett triaged this task as Medium priority.

sbassett edited projects, added SecTeam-Processed; removed Security-Team.

I'm boldly reopening. I'd appreciate guidance on similar cases: are they always a privacy violation? Is manually copying the IPs into an external service (like whois) better for some reason, and why? See @GeneralNotability's comments for more questions that should likely be answered here.

@Urbanecm - Sorry about the confusion. We're going to resolve this task for now since the immediate issues described within the task were dealt with. Regarding potential policy and/or legal issues with performing investigations with private Wikimedia data using completely external services - @sguebo_WMF and @Htriedman have put this in their Privacy Engineering backlog to bring up with WMF-Legal for further clarification.

Restricted Application added a project: User-Urbanecm. Oct 18 2021, 4:14 PM

I won't argue the close, but I ask that security makes answering the broader question of "does this violate the privacy policy" a high priority. It is a routine and ongoing practice for checkusers to consult external services for information on IPs, which are also protected as private information by the privacy policy, and calling that unacceptable (which this closure somewhat implies) would have significant repercussions on checkusers' ability to combat harassment.

In T293379#7437306, @GeneralNotability wrote:

I won't argue the close, but I ask that security makes answering the broader question of "does this violate the privacy policy" a high priority. It is a routine and ongoing practice for checkusers to consult external services for information on IPs, which are also protected as private information by the privacy policy, and calling that unacceptable (which this closure somewhat implies) would have significant repercussions on checkusers' ability to combat harassment.

AFAIK, this is now a priority for @sguebo_WMF and @Htriedman to review with WMF-Legal. They can speak more to the timeline and general concerns they plan to address.

sguebo_WMF mentioned this in T293811: Clarify whether CUs should share non-public information with external services. Oct 19 2021, 4:32 PM

We took a look and this doesn't violate the privacy policy or access to non-public info policies (although in general we'd always recommend using trustworthy services with good security). In the access to non-public info policy, roman numeral ii gives users with this sort of access permission to disclose this sort of data to "service providers, carriers, or other third party vendors to assist in the targeting of IP blocks or the formulation of a complaint to relevant Internet Service Providers" which is what allows this here. The overall privacy policy also allows this kind of disclosure to protect people (section titled: To Protect You, Ourselves & Others) and includes users with advanced rights in that allowance.

Thank you @Jrogers-WMF, the prompt response is much appreciated!

I don't believe there is any security concern at this point, can we unprotect this ticket?

In T293379#7463286, @Jrogers-WMF wrote:

We took a look and this doesn't violate the privacy policy or access to non-public info policies (although in general we'd always recommend using trustworthy services with good security). In the access to non-public info policy, roman numeral ii gives users with this sort of access permission to disclose this sort of data to "service providers, carriers, or other third party vendors to assist in the targeting of IP blocks or the formulation of a complaint to relevant Internet Service Providers" which is what allows this here. The overall privacy policy also allows this kind of disclosure to protect people (section titled: To Protect You, Ourselves & Others) and includes users with advanced rights in that allowance.

While that is true, I'd like to draw everyone's attention to what's said below (ii): "Please note, however, if a Designated Community Member chooses to disclose in a situation covered by (ii), or (iv), or if they are required by law to disclose to law enforcement, administrative bodies, or other governmental agencies, they must secure written approval from the Wikimedia Foundation." Does that mean CUs need to ask for prior approval before using whois on the IP addresses? And if so, would it be a one-off approval (of a particular service), or would approval be needed for each individual release (which would make whoising impossible)?

In T293379#7463620, @GeneralNotability wrote:

[...]
I don't believe there is any security concern at this point, can we unprotect this ticket?

No objections by me, but I'll leave that to secteam (cc @Dsharpe).

In T293379#7464522, @Urbanecm wrote:

In T293379#7463620, @GeneralNotability wrote:

[...]
I don't believe there is any security concern at this point, can we unprotect this ticket?

No objections by me, but I'll leave that to secteam (cc @Dsharpe).

Adding Security-Team to supply a response during our weekly clinic.

Boldly reopening the task, as there have likely been a few weekly clinics since Oct 28 and no response was received :).

Hey @Urbanecm - Privacy Engineering should follow up with you on your follow-up question soon, thanks.

@Urbanecm Your question is, I think, really a WMF-Legal question. I'll reach out to them as they don't routinely monitor Phab.

In T293379#7534455, @JFishback_WMF wrote:

@Urbanecm Your question is, I think, really a WMF-Legal question. I'll reach out to them as they don't routinely monitor Phab.

Thanks. Let me know what they say please.

@Urbanecm Thanks for following up on the approval part of the policy, my apologies for not addressing it. I don't recall users running a normal whois search ever coming up, but it has also been part of regular practice for some time and is quite low risk, so I think we can give a general approval for the use of regular IP searches. For something bigger like this scripted tool, I had already taken a look at this one and it's approved for use (treat this post as written approval). I think for other tools that might allow a large number of requests like this, we probably should have some kind of legal and security review first and then approve them. I'll bring this up with the legal privacy team in our meeting next week to discuss what we should look for going forward so our process doesn't take too long.

Thanks @Jrogers-WMF, this is helpful.

In that case, my reasons for blanking the two scripts (1, 2) no longer exist. Per @GeneralNotability's advice, I've reverted my blanking of the script.

Security-Team: Could you please advise whether the task can be published now?