Security review for ApiFeatureUsage extension
Closed, ResolvedPublic

Description

Security review for eventual deployment to the cluster. May also need to review Gerrit change 173336.

Event Timeline

Anomie raised the priority of this task from to Needs Triage.
Anomie updated the task description.
Anomie changed Security from none to None.
Anomie subscribed.
greg triaged this task as Medium priority.

Implementation looks safe.

The feature worries me a little in how it could allow tracking a user's activities. When I search for a prefix of my user agent that specifies my version of Chrome on beta, I basically get a list of my API activity (apparently I'm the only person using Chrome 40.0.2214.93). Combined with the ability to get minute resolution on searches, that means someone with some time to brute force could trace, to the minute, whenever my user agent has used the API, which they could correlate with recent changes to get my wiki identity.

If you're even more patient, and you want to go wiki identity -> user agent, this also gives you an oracle to brute force against: if you know User:Victim made an edit at 12:01, you can start searching for longer and longer afuagent strings limited to that minute in time, to brute force their user agent. Naive brute force would only take on average 31 guesses per character in the user agent string.
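To make the cost estimate concrete, here is a rough sketch in Python of the prefix-oracle argument above. It is only a local simulation: `make_oracle` stands in for a prefix search limited to one minute, and the 62-character alphanumeric alphabet is an assumption chosen for illustration (real user agents also contain spaces and punctuation). Under that assumption the average comes out to 31.5 guesses per character, roughly in line with the figure of 31 quoted above. All function names here are hypothetical.

```python
import string

# Assumed alphabet: 62 alphanumeric characters. This is an illustration only;
# real user agent strings use a larger character set.
ALPHABET = string.ascii_letters + string.digits


def make_oracle(secret):
    """Return a local stand-in for a prefix search limited to one minute.

    The oracle answers True when `prefix` is a prefix of the secret string.
    """
    def oracle(prefix):
        return secret.startswith(prefix)
    return oracle


def recover(oracle, length):
    """Recover a string one character at a time, counting oracle queries."""
    known = ""
    queries = 0
    for _ in range(length):
        for ch in ALPHABET:
            queries += 1
            if oracle(known + ch):
                known += ch
                break
        else:
            raise ValueError("character outside the assumed alphabet")
    return known, queries


def expected_guesses_per_char(alphabet_size):
    """Average guesses when the correct character is uniformly distributed."""
    return (alphabet_size + 1) / 2
```

The point of the sketch is that the cost grows linearly with the length of the string (at most 62 queries per character here), not exponentially, which is what makes a prefix search with fine time resolution usable as an oracle.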

There's obviously a huge issue with browsers sending specific user agents in general, but I'm wondering if we can do anything on our end to mitigate it for the users who don't worry about changing their user agent.

Could we limit the search resolution to a relatively large time window, like a day (i.e., force 00:00:00 for the hours/minutes/seconds)? Or could we limit the user-agent string to 20 or 30 characters? And not return any result when the number of uses is below a threshold (5 or so)?
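As an illustration of the threshold idea only, a minimal Python sketch might look like the following. The function name, the shape of the input (a mapping from result key to use count), and the default of 5 are assumptions made for the example, not anything taken from the extension.

```python
# Hypothetical threshold, following the "5 or so" suggested above.
MIN_USES = 5


def suppress_rare(counts, min_uses=MIN_USES):
    """Drop result rows whose use count falls below the threshold.

    `counts` maps a result key (for example a feature name) to the number
    of logged uses in the queried window.
    """
    return {key: n for key, n in counts.items() if n >= min_uses}
```

For example, `suppress_rare({"feature-a": 12, "feature-b": 1})` would keep only `feature-a`, so a single user's lone hit would never appear in the output.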

Note that not all accesses are even logged; more than knowing "User:Victim made an edit", you'd have to know "User:Victim made an edit via the API that involved hitting some code path that called ApiBase::logFeatureUsage()".

Forcing resolution of a day wouldn't be a problem, I probably should have done that in the first place.
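A minimal sketch of what forcing day resolution could look like, written in Python purely for illustration (the extension itself is PHP, and the actual patch may do this differently). The function name `truncate_to_day` and the choice of UTC are assumptions.

```python
from datetime import datetime, timezone


def truncate_to_day(ts):
    """Snap a timezone-aware timestamp to 00:00:00 UTC of the same day."""
    if ts.tzinfo is None:
        # Naive datetimes are rejected so local time is never assumed.
        raise ValueError("expected a timezone-aware datetime")
    ts = ts.astimezone(timezone.utc)
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)
```

With this applied to both ends of a query range, a search for 12:01 on a given day can no longer be distinguished from a search for any other minute of that day.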

Limiting the length of the string or not returning any results would probably interfere with the intended uses of the feature.

gerritbot subscribed.

Change 188839 had a related patch set uploaded (by Anomie):
Limit query granularity

https://gerrit.wikimedia.org/r/188839

Patch-For-Review

Yeah, the rare ones are probably not done via the API in general. Uploading worries me a little: there might be small wikis that use Upload Wizard (which, iirc, uses the API) and have only one person uploading per day. But that's probably not a huge risk.

Are we rolling this out to all wikis? Or just big ones?

The extension doesn't actually allow you to ask for hits to a particular wiki, the logged events from all wikis are lumped together.

It wouldn't be too hard to add that ability, but the intended uses don't need it.

Oh, cool. Let's not, it would make identifying users easier.

As soon as gerrit 188839 is merged, we can close this then.

Change 188839 merged by jenkins-bot:
Limit query granularity

https://gerrit.wikimedia.org/r/188839