Change tags disclose non-public information about editors
Open, Needs Triage, Public

Description

Change tags are leaking non-public information about editors, such as information about their user agent which is not otherwise made public.

Reported on enwiki in: https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Tags&oldid=920405352#Editor_privacy

Event Timeline

Xaosflux created this task. Oct 9 2019, 3:52 PM
Restricted Application added a subscriber: Aklapper. Oct 9 2019, 3:52 PM

From the privacy policy (https://foundation.wikimedia.org/wiki/Privacy_policy): "we consider at least the following to be “personal information” if it is otherwise nonpublic and can be used to identify you: ... user-agent information".

The tags do not divulge full literal user-agent strings, but they do appear to include partial information that a casual user may assume is their user agent.

Pine added a subscriber: Pine. Oct 9 2019, 7:00 PM

I don't like having public tags that can be used to strongly infer anything about a user's hardware. I support the removal of those tags from public view.

Bawolff added a subscriber: Bawolff. Edited Oct 9 2019, 7:11 PM

*edit* I was reading the wrong section of the talk page. Please ignore this.

To clarify: are you saying it leaks which OAuth client you are using, which is basically your user agent? Or are you saying the user-agent header is being leaked, or something else?

If the former: just my personal opinion [the only type of opinion I have now that I'm just a volunteer], but there are strong anti-abuse reasons to allow which OAuth client you are using to be public. I'm not sure where the privacy policy currently stands on this, but I think the privacy policy should be changed to clearly allow this use case.

Restricted Application added a project: Wikipedia-iOS-App-Backlog. Oct 9 2019, 7:17 PM
Xeno added a comment. Oct 9 2019, 7:32 PM

So to be clear: this is about the iOS app edit and mobile edit tags?

That is the concern, though those were examples and may not be a complete set of potentially problematic tags.

Pine added a comment. Oct 9 2019, 7:38 PM

Including tags on ENWP and Commons, there are tags for "ios app edit", "mobile app edit", "android app edit", "mobile web edit", and "advanced mobile web edit". I would remove all of those from public visibility.

MusikAnimal added a subscriber: MusikAnimal. Edited Oct 9 2019, 8:01 PM

I am guessing the user agent is not actually being read by the software (at least when it determines what tags to apply). It's just recording the method used to edit. For example, I can edit on desktop using the mobile UI, and vice versa, and the tags apply to the interface I used, not the device. In this sense it is no different from the tag added when using HotCat or Huggle (both of which suggest you're using desktop), or even AutoWikiBrowser, which narrows it down to just Windows. There are also user scripts that add "(using Foobar script)" to the edit summary, which indicates desktop. Then consider the Pageviews API: if there is one pageview and one edit to [[Foo bar]] and it is mobile web, I know the single editor is using mobile web.

The mobile app tags are more defining, but I think this form of tracking is still mostly harmless and has applications beyond debugging (e.g. you know your vandal always uses the iOS app). User agents can be very specific, and what is being revealed via tags is high-level and wouldn't by itself help identify the human behind the computer.

Pine added a comment. Oct 9 2019, 8:17 PM

A user's hardware can be strongly inferred from the mobile tags. Mobile tags, or the lack of them, can potentially be revealing about a user's location and the range of hardware which the user can access. I don't think that there's a good reason for any of the mobile tags to be public. I'm OK with allowing checkusers to have access to this type of information, which could be useful in identifying vandals or sockpuppets, but not the general public.

Krenair added a subscriber: Krenair. Oct 9 2019, 8:21 PM

I don't quite see how mobile/desktop could reveal a user's location, but regardless, the applications for tags in general are ample, the prime example being counter-vandalism. https://en.wikipedia.org/wiki/Special:AbuseFilter/819 wouldn't have worked without the user_mobile tag (or at least there would have been a lot of false positives). XTools Auto Edits helps reveal content contributions, which wouldn't be possible without these sorts of identifiers. The VisualEditor (desktop) tag helped the community identify a bug. Admins can also identify unauthorized automated editing, etc. As I said, they are not definitive anyway: I can edit on desktop through the mobile UI, and I can edit using the Android app through a desktop simulator, etc. I of course cannot speak from a legal standpoint, but it appears the privacy policy is about user agents, and I don't think that's being exposed.

If you really wanted to keep this private, you'd need to do away with tags altogether since any developer can make use of them for their device-dependent tool. You'd also need to carefully vet every user script and gadget (which can be default-on for all users).

I say all of this only as a frequent consumer of this metadata. Perhaps we could simply hide tags from revision histories (as with .mw-tag-markers { display: none }), but retain the data behind-the-scenes? That'd make it semi-private, while permitting the many benefits they have to counter-vandalism, data aggregation tools, and harmless usage tracking. Interested users and patrollers could force it to show via their user CSS, as with .mw-tag-markers { display: inline !important }.

@MusikAnimal I don't think the potential privacy concern follows to the use cases where an editor uses a custom editing tool to make their edits - in those cases the editor is purposefully asserting the tag with their edits.

Isn't the iOS app essentially just a custom edit tool, albeit one that is "official"?

Perhaps we could simply hide tags from revision histories (as with .mw-tag-markers { display: none }), but retain the data behind-the-scenes? That'd make it semi-private

I'm not a fan of this idea. If there was malicious usage of this data (an idea I find a stretch, tbh), hiding it would probably mean malicious people would still find it, but normal users would be less aware and unable to make conscious, informed choices about their privacy. I think the most important thing is ensuring any users bothered by this are aware of it, so they can make their own informed choices about whether it's worth making edits with the apps that use these tags.

phuedx moved this task from Needs triage to Triaged on the Mobile board. Oct 10 2019, 12:43 PM

Information hidden with CSS is still as accessible as before – you just cannot trivially see it.
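To illustrate the point about CSS hiding, here is a minimal Python sketch, assuming the standard MediaWiki Action API (action=query with prop=revisions and rvprop=tags), of how anyone could still read the tags on a page's recent revisions. The helper names and the sample response are made up for illustration; the sample is hand-written in the shape of a formatversion=2 response and is not real edit data.

```python
import json
from urllib.parse import urlencode

# Public Action API endpoint for English Wikipedia.
API = "https://en.wikipedia.org/w/api.php"


def revision_tags_url(title, limit=5):
    """Build a query URL asking for recent revisions of a page, including their tags."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|user|tags",  # tags are requested like any other revision property
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,
    }
    return API + "?" + urlencode(params)


def tags_by_revision(response_text):
    """Map revision ID to its list of tags, given a formatversion=2 JSON response."""
    data = json.loads(response_text)
    result = {}
    for page in data.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            result[rev["revid"]] = rev.get("tags", [])
    return result


# Hypothetical response body (invented IDs and user names, for illustration only).
SAMPLE_RESPONSE = json.dumps({
    "query": {
        "pages": [{
            "pageid": 1,
            "title": "Foo bar",
            "revisions": [
                {"revid": 1001, "user": "ExampleUser",
                 "tags": ["mobile edit", "mobile web edit"]},
                {"revid": 1002, "user": "AnotherUser", "tags": []},
            ],
        }]
    }
})
```

Fetching the URL from revision_tags_url() with any HTTP client and passing the body to tags_by_revision() would give the same kind of mapping for live data, regardless of what the site CSS hides in revision histories.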

The reason user agents are PII is that they act like a fingerprint for cross-tabulation with other sources of the same data. For ultra-small populations (like Solid users – hi, TimBL – or BeOS users, or whatever), knowing the user agent / platform might move slightly towards individual privacy concerns, but this is pretty trivial.

My full UA string right now is Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36, shared with millions of other people around the world – in the last 24 hours, there were ~4.8m Wikimedia pageviews with that string. Obviously less common browsers exist (on my device in regular use, no less), but this is not a serious privacy concern.

Pine added a comment. Oct 10 2019, 10:48 PM

@MusikAnimal I don't think the potential privacy concern follows to the use cases where an editor uses a custom editing tool to make their edits - in those cases the editor is purposefully asserting the tag with their edits.

Agreed. My objection is to making public whether or not the user is editing with a mobile interface. I know that users can intentionally change from mobile to desktop and vice versa, but this level of effort shouldn't be required to hide what type of device someone is using, and my guess is that many thousands of users don't know that this is possible or how to do it.

I respect the concerns of those raising this issue, but mobile devices (and the two dominant OSs that run them) are so prevalent that I don't think it says too much about a person to reveal that they are using the mobile site or the app.

Though let's not go down the path of "you're a dirty stinking VE user" or "just what I'd expect from someone who uses Huggle"! 😛

Perhaps for symmetry we should also tag desktop and wikitext edits, so that we're not calling out the others as unusual. Compare the cultural expectation revealed by phrases such as "male nurse" or "female doctor".

Perhaps for symmetry we should also tag desktop and wikitext edits, so that we're not calling out the others as unusual.

We are (or at least, were) not proceeding with that due to scaling concerns. See T188433.

Pine added a comment. Oct 15 2019, 7:14 PM

I can't access the edit filter that MusikAnimal mentioned, and I have little knowledge about edit filters, but I would guess that users who have been granted access to private data could set up edit filters that take into account whether a user is editing with a mobile interface.

I'm not aware of any strong reason that the mobile tags or any data that indicates whether a user is using a mobile interface should be public, and in the absence of a strong reason for this information to be public I think that the information should be private.

I'm not aware of any strong reason that the mobile tags or any data that indicates whether a user is using a mobile interface should be public,

They're extensively used for debugging issues people encounter in production.

and in the absence of a strong reason for this information to be public I think that the information should be private.

No. That's the opposite of how things work at Wikimedia.

JFishback_WMF moved this task from Intake to Backlog on the Privacy board.