
Checkuser should include in its logs external authentications via OAuth
Open, Needs Triage, Public, Feature

Description

There have been some discussions in various forums recently about logging OAuth interactions. This was originally brought up with respect to WikiAuthBot (which is used by various Wikipedia-adjacent Discord servers to authenticate users against their WMF credentials using OAuth), but I think we should address the more general OAuth question. WikiAuthBot does some logging of its own, which is visible on the Discord side, but it would be useful for the OAuth code on the server side to also log both successful and unsuccessful authentications.

There are a lot of issues related to this that need to be sorted out. Some of these might be worth breaking out into their own tasks, but I'll just dump them all here to get things started.

It's not clear to me what privacy policies even apply here. If the OAuth client is running in the WMF cloud environment, then Wikitech:Cloud Services Terms of use certainly applies.

If the client is running in some other environment, I'm inclined to think the global privacy policy still has some sway, since we're dealing with WMF accounts. As external services such as Discord become more widely used in close proximity to WMF projects (sometimes called mash-ups), it becomes less and less clear to the end user (especially a technologically unsophisticated one) what entity is running what service. It would be unfortunate if a user believed they were using a WMF service and thus entitled to the protections afforded by the various WMF privacy-related policies, only to discover that more logging was going on than they thought. So regardless of what we end up doing on the CU/OAuth logging front, updating our policies and our guidance to third-party developers would be useful.

On the technical side, it's not even clear what data is available to the OAuth extension. CU gets its IP/UA/CH data from the incoming HTTP headers. For an OAuth request, those fields will probably reflect the OAuth client software rather than the end user. Possibly what we want to do is encourage client developers to pass along the original end user data in XFF headers or some similar mechanism?
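To make the "pass along the original end user data" idea concrete, here's a rough sketch of what a client could do on its side. This is only an illustration: the function name, its parameters, and the example values are made up, and nothing here implies the server currently honors such a header.

```python
def build_headers(end_user_ip, tool_name, contact, upstream_xff=None):
    """Build outgoing headers for a request a tool makes on a user's behalf.

    Hypothetical helper: appends the end user's IP to any X-Forwarded-For
    chain the tool itself received, and sets an identifying User-Agent.
    """
    # Keep whatever chain arrived at the tool, dropping empty entries.
    chain = [h.strip() for h in (upstream_xff or "").split(",") if h.strip()]
    # The address the tool saw the end user connect from goes last.
    chain.append(end_user_ip)
    return {
        "User-Agent": f"{tool_name} ({contact})",
        "X-Forwarded-For": ", ".join(chain),
    }


# Example with documentation-range addresses (not real users).
headers = build_headers("198.51.100.7", "ExampleTool/1.0", "https://example.org/contact")
```

Whether the server should believe that header is a separate question, since any client can put anything it likes there.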

Event Timeline

On the technical side, it's not even clear what data is available to the OAuth extension. CU gets its IP/UA/CH data from the incoming HTTP headers. For an OAuth request, those fields will probably reflect the OAuth client software rather than the end user. Possibly what we want to do is encourage client developers to pass along the original end user data in XFF headers or some similar mechanism?

I'm not sure I fully understand this concern, so let me try to rephrase (please correct me if I'm wrong) and ask a few questions.

  • The Wikimedia Foundation User-Agent Policy encourages tools to set a helpful and identifying UA string. That helps identify the tool and the user operating it. I’m not sure whether a log like "User X connected from IP/UA Y to tool W" exists. Could LWCU show it?
  • For most uses of OAuth-logged tools, CU will see the WMF-allocated IP of the OAuth client (e.g., Toolforge), but not the real IP address of the user, unless the tool is run locally by the user, which seems uncommon, if I'm not wrong. Could the real IP be passed in an XFF header? In general, XFF headers are untrustworthy since they can be spoofed, but I'd say WMF allocations are 'trusted proxies' and tools are meant to be 'trusted'.
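A minimal sketch of the 'trusted proxies' idea, assuming the server only believes XFF when the direct peer is inside a trusted range. The range below is a placeholder, not the real WMF Cloud allocation, and the function names are hypothetical.

```python
import ipaddress

# Placeholder range for illustration only; NOT the actual WMF allocation.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(ip):
    """True if ip parses as an address inside a trusted proxy range."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in TRUSTED_PROXIES)


def effective_client_ip(peer_ip, xff_header):
    """Walk XFF right to left, stopping at the first untrusted hop."""
    if not is_trusted(peer_ip):
        # An untrusted peer can spoof XFF freely, so ignore the header.
        return peer_ip
    candidate = peer_ip
    hops = [h.strip() for h in (xff_header or "").split(",") if h.strip()]
    for hop in reversed(hops):
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: keep what we have so far
        candidate = hop
        if not is_trusted(hop):
            break  # first address not vouched for by a trusted proxy
    return candidate
```

Anything to the left of the first untrusted hop is ignored, because that part of the header could have been written by anyone.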

"External authentication" is not a well-defined concept for OAuth. OAuth has an authorization step (when the user visits Special:OAuth/authorize in a browser and answers "Yes" to a request for permissions for the app, and the app receives an access token which grants it the ability to act on the user's behalf) and an identity endpoint where the app can get information about the user who authorized the access token.

The authorization dialog is accessed directly by the user, using a non-OAuth authentication method, and could easily be logged to CU. Not sure about the privacy implications but it seems comparable to other things that get logged. The app doesn't learn about the identity of the user during authorization though, so it's not really authentication.

Requests to the identity endpoint are made by the app, so recording checkuser information isn't really helpful there.

Possibly what we want to do is encourage client developers to pass along the original end user data in XFF headers or some similar mechanism?

That's T159889: Per-consumer XFF trust settings. On top of that, in the case of Toolforge / Cloud VPS, the tool also needs access to the IP (according to T135046: Allowlist Cloud VPS instances that need XFF header passed through the web proxy, Cloud VPS has an allowlist; not sure if there's something similar for Toolforge), or there needs to be some mechanism for injecting it.
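For illustration, a per-consumer trust setting could look roughly like the sketch below. This is not how T159889 is or would be implemented; the consumer keys, the settings table, and the function name are all invented, and it assumes the trusted consumer appends the end user's IP as the last XFF entry with no other proxy adding to the header afterwards.

```python
import ipaddress

# Hypothetical settings table: which OAuth consumers may supply XFF.
CONSUMER_TRUSTS_XFF = {
    "example-trusted-tool": True,
    "example-other-tool": False,
}


def ip_for_logging(consumer_key, peer_ip, xff_header):
    """Pick the IP to record for a request made by an OAuth consumer."""
    # Unknown or untrusted consumers: log the address we actually saw.
    if not CONSUMER_TRUSTS_XFF.get(consumer_key, False):
        return peer_ip
    hops = [h.strip() for h in (xff_header or "").split(",") if h.strip()]
    if not hops:
        return peer_ip
    # Assumption: the last entry is the one the trusted consumer appended.
    candidate = hops[-1]
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return peer_ip  # malformed value: fall back to the peer address
    return candidate
```

The point of keying this on the consumer rather than on the source IP is that two tools on the same shared host could then be treated differently.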