
Allow filter rules to consider private data such as source IP, reverse DNS and user agent.
Closed, Declined; Public

Description

Author: FT2.wiki

Description:
Is there a way that functions such as "reversedns(user_ip) LIKE X" can ever be included in AbuseFilter?

The reason I'm thinking of this is that there are a number of major ISPs where it might finally provide a means to stop vandals who at present can't easily be stopped. For example, a number of ISPs have large IP ranges and dynamic IPs, so a block is futile: the user just resets the router for a new IP. But whatever IP is given will resolve to (say) "adsl-*.region17.isp.net", and hence a pattern match on the reverse DNS would allow edits by users in that specific area to be picked up, whereas at present no block or other automated system can spot or deal with these kinds of vandals.

Two caveats: 1/ is the cost of reverse DNS lookup prohibitive (and if so can it be cached locally to reduce that); 2/ does this introduce a class of AbuseFilter functions such as user_ip, that would mean the function is not displayed or able to be edited except by checkusers?
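A minimal sketch of the idea and of caveat 1, in Python rather than AbuseFilter's own rule language. It is not an existing AbuseFilter feature: `make_reverse_dns` and `rdns_like` are hypothetical helpers, the cache size is arbitrary, and the in-memory cache has no expiry, so a real deployment would need a TTL. Only `socket.gethostbyaddr`, `functools.lru_cache` and `fnmatch.fnmatchcase` are real standard-library calls.

```python
import fnmatch
import socket
from functools import lru_cache


def make_reverse_dns(resolver=socket.gethostbyaddr, maxsize=4096):
    """Build a cached reverse-DNS lookup (hypothetical helper).

    `resolver` is injectable so the lookup can be tested without network access.
    """

    @lru_cache(maxsize=maxsize)
    def reverse_dns(ip):
        # Return the lower-cased PTR hostname, or "" when the lookup fails.
        # Failures are cached too, so a dead resolver is not hit repeatedly.
        try:
            return resolver(ip)[0].lower()
        except OSError:  # socket.herror and socket.gaierror are subclasses
            return ""

    return reverse_dns


def rdns_like(hostname, pattern):
    """Glob-style stand-in for the proposed 'reversedns(user_ip) LIKE X'."""
    return fnmatch.fnmatchcase(hostname, pattern.lower())
```

A rule would then amount to something like `rdns_like(reverse_dns(user_ip), "adsl-*.region17.isp.net")`, with each distinct IP costing at most one lookup while it stays in the cache.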

I think this is useful enough to explore further.


Version: unspecified
Severity: enhancement

Details

Reference
bz18429

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:34 PM
bzimport added a project: AbuseFilter.
bzimport set Reference to bz18429.

FT2.wiki wrote:

s/function/filter

matthew.britton wrote:

(In reply to comment #0)
2/ does this introduce a class of AbuseFilter functions such as user_ip, that would mean the function is not displayed or able to be edited except by checkusers?

Yeah... nice idea, but if this was implemented, it could be abused to pin down users' IP addresses/ranges. And I can think of several en.wikipedia administrators who *would* do that.

FT2.wiki wrote:

It would be trivial to ensure that a filter function that used the "user_ip" variable could not be created, nor its logs/history read, by any except checkusers. That probably takes care of that one. There may be other ways to handle that point as well, but that seems the easiest. I don't see that as a major problem, just one needing careful concept thinking.

Discussed this on IRC with FT2. My general comments on the outcome of that discussion (from my perspective, FT2 may have different opinions):

1/ Adding additional hierarchy to AbuseFilter is a pain, both programmatically and socially.

2/ The fact that the abuse filter log is viewable by all users is a core principle guiding the Abuse Filter. It is critical that all filters may be assessed on their performance, if not on their construction. Smaller groups/cabals of checkusers, oversighters and what-not may have good intentions, but they lack the accountability of having the impact of their filters assessed by the wider community. Smaller "cabals" encourage groupthink, and create an environment which may ease carelessness or outright negligence in filter construction.

3/ It would be technically trivial to hide variables containing private data from the abuse filter log, in order to allow them to be sent to filters.

4/ There are concerns (as expressed by Gurch) that the abuse filter log for filters using private data could allow users not identified to the Foundation to guess private information, or at least part of it (for instance, that a particular user edits from a particular IP range). The privacy policy permits disclosure of private data for the purposes of preventing and monitoring abuse of editing privileges, and covers only personally identifiable information. Residing on a particular range is not by itself personally identifiable information, although it may be private information; and while the user-agent header sent by a user is not public data, I would not really classify it as "private", per se, and certainly not personally identifiable. Accordingly, I believe the benefits of hiding log entries for rules considering private data are outweighed by the detrimental effect on filter use transparency (see point 2).
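Point 3 can be sketched roughly as follows, in Python and purely as an illustration rather than AbuseFilter's actual code. The rule sees every variable, but the public log entry is built from a redacted copy. `evaluate_and_log` is a hypothetical helper, and of the variable names only `user_ip` comes from this discussion; `user_rdns` and `user_agent` are assumed names.

```python
# Variables treated as private. Only "user_ip" is named in this thread;
# the other two names are assumptions made for the example.
PRIVATE_VARIABLES = frozenset({"user_ip", "user_rdns", "user_agent"})


def evaluate_and_log(rule, variables, private=PRIVATE_VARIABLES):
    """Run `rule` against all variables; return (matched, public_log_entry).

    `rule` is any callable taking the full variable dict. The log entry is
    None when the rule does not match, otherwise a copy of the variables
    with the private ones removed.
    """
    matched = bool(rule(variables))
    if not matched:
        return False, None
    public_entry = {k: v for k, v in variables.items() if k not in private}
    return True, public_entry
```

This keeps the hit itself public, as point 2 asks. It does not address the concern in point 4: the mere fact that a user tripped a given filter can still hint at what the hidden variables contained.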

matthew.britton wrote:

(In reply to comment #4)

Residing on a particular range is not by itself personally identifiable information, although it may be private information; and while the user-agent header sent by a user is not public data, I would not really classify it as "private", per-se, and certainly not personally identifiable.

You're right, it wouldn't (at least in most cases) count as such.

Though it could be used to determine, say, where a user is from. While I personally don't care who knows that, I know there are a lot of people out there who do -- imagine a "Contributors from XYZ" filter with IP ranges that geolocate to that place, in a private filter looking for (or claiming to look for) a particular abusive user from that area. Now any legitimate user editing from XYZ gets an entry in the abuse log linking their username to place XYZ, and that log entry is visible to everyone, not just admins. It's not exactly Checkuser but it's more disclosure than there currently is (I lack the patience and legal expertise to figure out exactly what the privacy policy's take is on this :)

Not sure what user-agent header has to do with anything, that (usually) only identifies the user's browser and OS. Though I am aware checkusers also have access to that information, I don't know what they do with it nor why anyone would want to use it for an abuse filter.

FT2.wiki wrote:

"Now any legitimate user editing from XYZ gets an entry in the abuse log linking their username to place XYZ, and that log entry is visible to everyone, not just admins."

Incorrect, or else, overlooked the comment on this.

See original suggestion: "It would be trivial to ensure that a filter function that used the 'user_ip' variable could not be created, nor its logs/history read, by any except checkusers."

(In reply to comment #6)

See original suggestion: "It would be trivial to ensure that a filter function that used the 'user_ip' variable could not be created, nor its logs/history read, by any except checkusers."

I strongly object to that suggestion. It's okay for only checkusers to be able to create filters which act on private data, but the hit logs MUST be kept public. See my previous comment for further details of my position on this.

matthew.britton wrote:

(In reply to comment #6)

Incorrect, or else, overlooked the comment on this.

Yeah, my comment was in reply to Andrew mostly. You say these filters should exist but be completely private, Andrew says if they do exist at all they have to be publicly logged in some way (unlike Checkuser), I say they shouldn't exist at all. Other than that, we're in perfect agreement. :)

happy.melon.wiki wrote:

I don't think those positions are necessarily mutually exclusive, although they are somewhat juxtaposed. We currently have "private" filters: the hit log is publicly viewable with abusefilter-view; this just says "User X tripped filter Y (ShortDescriptionOfFilterSetByFilterEditors), doing something somewhere". Users with abusefilter-view-details can then see the exact parameters of the edit, *unless* the filter has been set to "private", in which case they need an additional permission (abusefilter-modify?). It would be possible to create another class of filters, either implicitly or explicitly, for which you need the abusefilter-private permission to see anything more than the basic "X tripped Y" log.
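The tiers described above could be modelled along these lines. This is a Python sketch under assumptions, not the extension's real permission check. `log_detail_level` and the filter class names are hypothetical; whether abusefilter-modify is the right for private filters is uncertain (the comment itself marks it with a "?"), and abusefilter-private is the proposed right, not an existing one.

```python
# Rights needed to see the full parameters of a hit, per filter class.
# "restricted" is a hypothetical name for the proposed new class.
REQUIRED_FOR_DETAILS = {
    "public": {"abusefilter-view-details"},
    # Assumption: the comment above is unsure this is the right one.
    "private": {"abusefilter-view-details", "abusefilter-modify"},
    # Proposed right; it does not exist at the time of this discussion.
    "restricted": {"abusefilter-private"},
}


def log_detail_level(rights, filter_class):
    """Return "none", "basic" ("X tripped Y" only) or "full" (edit parameters)."""
    if "abusefilter-view" not in rights:
        return "none"
    if REQUIRED_FOR_DETAILS[filter_class] <= set(rights):
        return "full"
    return "basic"
```

Under this model every user with abusefilter-view still sees that a hit occurred, which is the public-log requirement argued for earlier in the thread, while the edit parameters of the new class stay restricted.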

Can we form an on-wiki consensus one way or the other for this, please?

Resolving as LATER in the absence of any community consultation.