Page MenuHomePhabricator

[Epic] Ensure filters that use PII-sensitive variables are protected
Open, Needs TriagePublic

Description

T357772: Investigate: How will `ip_in_range` and `ip_in_ranges` function when temporary accounts are enabled introduces the user_ip variable, which is considered PII and therefore has legal restrictions around retention and viewing. As of writing, AbuseFilter allows filters to be made private but variables and conditions have no concept of being "protected." We should introduce the concept of a protected variable which will then have effect on filter visibility.

This draws from some of the notes and solutions in T234155 but is split off into this ticket as the concept of a CheckUser level filter is a much larger scope than necessary to implement the user_ip variable needed to not break filters when temp accounts roll out.

Proposed Solution:

Variables should be declarable as protected, via a right and the right should be declarable on an individual basis, but isn't required (eg. we might want abusefilter-view-ip and only apply that to user_ip, or we can generalize it later to something like abusefilter-view-pii and have it apply to all protected variables).

The private filter setting shouldn't be changed since it 1. can be toggled on/off, 2. relies on a stricter condition than what user_ip is expected to need, and 3. relies on the user to correctly apply it. Filters should still be able to be made private via the current mechanism and this mechanism should be implemented alongside what already exists.

This mechanism:

  • should use a new column, eg. af_protected that takes a running list of user rights which are required to view the filter/logs
    • any time a sensitive variable is used, the right required to see that variable should be added to that list
    • removing a variable shouldn't remove the right from the list
  • should only allow users to add protected variables they have the rights to use (otherwise they'd lock themselves out of their own filter)
  • should automatically be enforced on save
  • should apply similar permissions to checking edits against filters and other theoretical/testing tools

Related Objects

StatusSubtypeAssignedTask
Resolvedkostajh
StalledNone
In ProgressNiharika
OpenNone
OpenTchanders
Resolvedlbowmaker
Resolvedtstarling
OpenNone
ResolvedSTran
OpenSTran
ResolvedSTran
ResolvedSTran
Resolvedkostajh
StalledSTran
ResolvedTchanders
ResolvedSTran
OpenSTran
StalledNone
OpenNone

Event Timeline

Change #1026560 had a related patch set uploaded (by STran; author: STran):

[mediawiki/extensions/AbuseFilter@master] [WIP] Allow variables to be restricted by user right

https://gerrit.wikimedia.org/r/1026560

  • should use a new column, eg. af_protected that takes a running list of user rights which are required to view the filter/logs

Just a note that if corresponding columns are not added to abuse_filter_history and abuse_filter_log, audience checks for these two should query the af_protected column as well. Whether to add it or not also depends on how we want to show the "protected" status in the interface (Special:AbuseFilter/history and AbuseLog).

  • any time a sensitive variable is used, the right required to see that variable should be added to that list

Having a non-atomic value (list of strings) in this fields sounds a bit wrong... Even though this pattern is used elsewhere in AF, it's not ideal. I'm wondering if we could do something like having the column be a bitfield, and then define access levels in the application, mapping bit values to the corresponding rights.

  • should only allow users to add protected variables they have the rights to use (otherwise they'd lock themselves out of their own filter)

Agreed, and noting that this will require proper messaging in the UI.

  • any time a sensitive variable is used, the right required to see that variable should be added to that list

Having a non-atomic value (list of strings) in this fields sounds a bit wrong... Even though this pattern is used elsewhere in AF, it's not ideal. I'm wondering if we could do something like having the column be a bitfield, and then define access levels in the application, mapping bit values to the corresponding rights.

We may be able to simplify this. I think there are 4 levels of privacy that we need:

  • Public
  • Private (ie abusefilter-view-private)
  • Reveal IP for temp users (which could probably be abusefilter-view-private plus a preference) - the only reason this needs to be separate from Private is that legally the user needs to check a preference to accept an agreement if they haven't already signed any agreement with us
  • CheckUser-level (ie abusefilter-privatedetails) - not to be implemented here, but would be needed for T234155

This could be captured by changing af_hidden from a bool to a bitfield, and should be fairly backwards compatible if public=0 and private=1.

  • should only allow users to add protected variables they have the rights to use (otherwise they'd lock themselves out of their own filter)

Agreed, and noting that this will require proper messaging in the UI.

There's a danger here that someone who edits a filter to make it more private locks out other people who could previously see the filter (perhaps even the original author).

I guess this will be a fairly common case too, as user_ip gets added to filters.

To prevent this, we could disallow ever changing the privacy level of a filter, and obliging users to copy the filter to a new one if they want to add a private variable. But that might be awkward and would mean the filter's history gets lost, so maybe locking out former filter viewers/editors is preferable?

This could be captured by changing af_hidden from a bool to a bitfield, and should be fairly backwards compatible if public=0 and private=1.

Indeed... I'm sure I proposed this somewhere in another task, and I thought that was T234155, but I just read it again and it isn't there (T234155#5706023); maybe it was for something else, or maybe it just never left my brain. One thing to keep in mind with changing af_hidden (which would otherwise be much simpler than adding a new column) is that the proposed new flags cannot be removed, unlike the existing "private" flag. This shouldn't be too much of an issue, but still something to consider I guess.

Also a potential 5th level would be oversight-level filters (T290324), although the number of levels doesn't really affect the implementation.

  • should only allow users to add protected variables they have the rights to use (otherwise they'd lock themselves out of their own filter)

Agreed, and noting that this will require proper messaging in the UI.

There's a danger here that someone who edits a filter to make it more private locks out other people who could previously see the filter (perhaps even the original author).

Yes, and I agree that this wouldn't just be a theoretical edge case.

To prevent this, we could disallow ever changing the privacy level of a filter, and obliging users to copy the filter to a new one if they want to add a private variable. But that might be awkward and would mean the filter's history gets lost, so maybe locking out former filter viewers/editors is preferable?

Maybe... I think both approaches are somewhat awkward -- being locked out of a filter or having to create a new one. Forcing people to create a new filter might actually be the cleanest approach. Suppose we lock the privacy level and you have a scenario where someone would want to create a separate filter... Then the same person would likely still create a new filter if we locked them out instead. It might be worth reaching out to AF maintainers to get feedback on this.

This could be captured by changing af_hidden from a bool to a bitfield, and should be fairly backwards compatible if public=0 and private=1.

Indeed, it isn't the first time we have thought about this: T309693#7973529.

Alternatively we can introduce optional IP-masking protected variable and the actual IP can only be compared with such restricted variables, and the filter otherwise can be public. (For variables shared across filters see T120740, but a filter-local variable may still be possible)

If we want abuse log be restricted, we can automatically compute how a log entry is restricted and do not show those restricted abuse log by default. There are two ways to calculate it, the rough way is treat filter (versions) that use restricted variables and every hit thereof as restricted; the more fine-grained way is when actually running filter track whether restricted variables is actually read (it may be not since evaluation can be short cut).

This will introduce a restriction level in each abuse log entries (by comparsion the restriction level of filter can be automatically calculated, or not calculated at all in the fine-grained way, since abuse filter is mutable).

Change #1029122 had a related patch set uploaded (by STran; author: STran):

[mediawiki/extensions/AbuseFilter@master] Hide logs with protected variables behind viewing right

https://gerrit.wikimedia.org/r/1029122

Change #1029162 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/extensions/AbuseFilter@master] Ensure that changes to af_hidden are correctly set in afh_changed_fields

https://gerrit.wikimedia.org/r/1029162

Something else that I don't think has been said: Filter log entries would need to be deleted (as in "deleted for real with no backup") after 90 days, as per the data retention policy. Not just the "details", but the whole thing. If my IP happens to fall in the same range as some LTA, I don't want want that fact recorded forever.

Which may also apply to IP masking-protected filters.

Change #1032420 had a related patch set uploaded (by STran; author: STran):

[mediawiki/extensions/AbuseFilter@master] Convert `af_hidden` into a bitmask

https://gerrit.wikimedia.org/r/1032420

Change #1029122 abandoned by STran:

[mediawiki/extensions/AbuseFilter@master] Hide logs with protected variables behind viewing right

Reason:

This was merged into 1026560

https://gerrit.wikimedia.org/r/1029122

Change #1032420 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Convert `af_hidden` into a bitmask

https://gerrit.wikimedia.org/r/1032420

Change #1026560 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Allow variables to be restricted by user right

https://gerrit.wikimedia.org/r/1026560

@STran If I go to Special:AbuseFilter/revert/<id>, I can see actions which matched that filter, even if I don't have permission to see the filter. Is that OK?

On Special:AbuseLog, if I don't have permission to see a filter the log entry looks like:

... triggered an abuse filter, performing the action <action>. Actions taken: <actions taken>; Filter description: <filter name>

If we no longer show the filter's ID (instead just showing "an abuse filter"), should we also not show the <filter name>?

On Special:AbuseFilter, I can do a Search within rules and put something like user_unnamed_ip. This will then show the filter's rules in the table, even to users who don't have permission to see the filter.

For example: Special:AbuseFilter/?querypattern=user_unnamed_ip&searchoption=LIKE

abuse_filter_search_pattern.png (648×1 px, 82 KB)

@STran If I have two filters, one with user_unnamed_ip and one without, if they both match an action and I look at the log (details/examine) of the unprotected filter, I can see the value of the variable user_unnamed_ip (including the IP). Even if I don't have permission to see the protected filter.

Is this part of this task or should I raise a separate one?

Also, for the API action abusefiltercheckmatch, I can check a pattern like user_unnamed_ip="1.2.3.4" against an AbuseFilter log entry that I don't have permission to see and, if I guess the right IP, it will return true.

Change #1042228 had a related patch set uploaded (by STran; author: STran):

[mediawiki/extensions/AbuseFilter@master] Add protected variable view permission rights

https://gerrit.wikimedia.org/r/1042228

Looking into these cases, it looks like anywhere that uses canViewPrivateFilters(Logs) should be audited and an equivalent canViewProtectedVariables should be added if necessary.

This will resolve:

On Special:AbuseFilter, I can do a Search within rules and put something like user_unnamed_ip. This will then show the filter's rules in the table, even to users who don't have permission to see the filter.

Also, for the API action abusefiltercheckmatch, I can check a pattern like user_unnamed_ip="1.2.3.4" against an AbuseFilter log entry that I don't have permission to see and, if I guess the right IP, it will return true.

$ grep canViewPrivateFilters -l

tests/phpunit/unit/AbuseFilterPermissionManagerTest.php
includes/AbuseFilterPermissionManager.php

// Already has check
includes/View/AbuseFilterViewHistory.php
includes/View/AbuseFilterViewDiff.php
includes/Api/QueryAbuseFilters.php
includes/Pager/AbuseFilterHistoryPager.php

includes/Special/SpecialAbuseLog.php // Users who can't see private filters can't see the hits on that filter. Add similar check.
includes/View/AbuseFilterViewList.php // Users who can't see private filters can't use the rule search functionality. Add similar check.
includes/View/AbuseFilterViewEdit.php // Has check, improve check to also cover if a filter is loaded
includes/Api/QueryAbuseLog.php // Check didn't check against protected view right; users with view private right but not view protected right could see protected filter info
includes/Pager/AbuseFilterPager.php // Add similar check

If I go to Special:AbuseFilter/revert/<id>, I can see actions which matched that filter, even if I don't have permission to see the filter. Is that OK?

I think this is okay, as 1. you need the abusefilter-revert to see that page and 2. similar behavior exists for private filters (I can see the matches for a private filter even if I don't have the right to see a private filter).


If we no longer show the filter's ID (instead just showing "an abuse filter"), should we also not show the <filter name>?

afaik, this is also okay since private filters are similar (don't show id but allow name to be shown).

Change #1042228 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Add protected variable view permission checks

https://gerrit.wikimedia.org/r/1042228

The other issues should be resolved, save:

On Special:AbuseFilter, I can do a Search within rules and put something like user_unnamed_ip. This will then show the filter's rules in the table, even to users who don't have permission to see the filter.

which will be resolved with T363906: [Epic] Ensure filters that use PII-sensitive variables are protected

The other issues should be resolved, save:

On Special:AbuseFilter, I can do a Search within rules and put something like user_unnamed_ip. This will then show the filter's rules in the table, even to users who don't have permission to see the filter.

which will be resolved with T363906: [Epic] Ensure filters that use PII-sensitive variables are protected

Is this meant to say T367390: Add granular query restrictions for AbuseFilter filter search?

@STran is there anything left to do for this task?

I've been treating this as the epic for this work so there are still a few subtasks remaining around this task (rights and logging) but there isn't any technical work specific to this task anymore.

kostajh renamed this task from Ensure filters that use PII-sensitive variables are protected to [Epic] Ensure filters that use PII-sensitive variables are protected.Tue, Jul 23, 2:14 PM
kostajh added a project: Epic.