Page MenuHomePhabricator

Investigate: How should the AbuseFilter variable `user_unnamed_ip` be restricted
Closed, ResolvedPublic

Description

From T381052: Investigate: Should we remove the preference check from AbuseFilter's protected variables implementation?, if we need to gate a variable behind a preference check, we should implement the functionality/feature to do so instead of hardcoding a generic preference (see T380920: Remove the AbuseFilter protected variables preference gate). user_unnamed_ip is already protected (filters using it are restricted to users w/the right to view/use protected variables) but requires the following additional restrictions:

  • users should have agreed to the temp users preference/ToS
  • users should have the checkuser-temporary-account right

These are variable/temp account specific and therefore probably shouldn't be implemented in the core AbuseFilter extension. We'll need a method/hook/config/etc to declare and apply these restrictions/checks.

Given how tied into temporary accounts user_unnamed_ip is, it may also be necessary to reconsider if the variable should remain a core-provided variable or if it should instead be a custom variable created by CheckUser.

Acceptance criteria:

Related Objects

Event Timeline

Given how tied into temporary accounts user_unnamed_ip is, it may also be necessary to reconsider if the variable should remain a core-provided variable or if it should instead be a custom variable created by CheckUser.

This seems like a good idea, given that we will have done the following:

  1. Made the preference associated with the user_unnamed_ip variable be in CheckUser
  2. Log access to this variable in CheckUser logs (to keep temp account IP access in the same place)
  3. Restrict access to those who have either checkuser-temporary-account or checkuser-temporary-account-no-preference, making it depend on CheckUser again

On the other hand:

  1. The data for this comes from the AbuseFilter extension (namely afl_ip). This means that the data source would either need to still rely on AbuseFilter or change to the one stored by CheckUser.
  2. This makes it hard to update filters on third-party wikis without CheckUser installed. This also includes the beta cluster wikis

Some of this may be solvable by updating policy as part of T381722: Add abusefilter-access-protected-vars to frwiki EFM and remove it for sysops. However, the preference check would not be solved by that task.

In summary, it probably makes sense to make user_unnamed_ip be a variable provided by CheckUser so that we can keep the logging, preference checks, and user right checks in the same place as the definition of the variable. However, this still won't be an easy solution.

The alternative would be to allow CheckUser to redefine the requirements for access to user_unnamed_ip when installed, which seems to be adding an additional cross-dependency.

The alternative would be to allow CheckUser to redefine the requirements for access to user_unnamed_ip when installed, which seems to be adding an additional cross-dependency.

I think the cross-dependency is going to exist regardless of what solution we pick and we're just deciding where we want to put it. But I also think that this a valid technological translation from what our intent is here:

  • The IP exists in AbuseFilter and I believe we need to keep that, as at the time we record the IP (eg. blocking an action) it isn't necessarily known to CheckUser.
  • The restriction and logging constraints, while in line with how administrative actions are logged now, are additionally sometimes WMF-specific and relate to our Legal obligations.
  • In theory, temporary accounts are a core feature. AFAIK, it's entirely possible to run a wiki with temporary accounts and not block it on CheckUser although by doing so you then declare on that wiki that IPs are now sensitive data. By nature of CU being the only extension to expose that data to a UI, its opinion on access rights becomes the de facto default/baseline.
  • Also in theory but presumably never in practice, a wiki could declare that user_unnamed_ip isn't a protected variable by removing it from the protected vars declaration configuration and they would be free to use it for whatever they wanted, although it wouldn't be very useful without CU to expose the IPs to compare against.
  • Allowing CU to hook in here would give it a similar role to its effect on temporary accounts in core: the extension has an opinion on what it means to access what it considers sensitive data and is imposing it on AbuseFilter.
    • AbuseFilter already has awareness of CheckUser so this isn't an attempt to unwind that. Rather, as CheckUser is the centralized arbiter of access to temporary account data, it makes sense to try to keep all the logic regarding access there instead of in AF as well.

In this case, I think we would:

  • Keep user_unnamed_ip in AbuseFilter, as it's defined using AF data
  • Make CU declare that user_unnamed_ip is protected using the AbuseFilterCustomProtectedVariables hook
  • Add new hooks/configurations to support CU's other restrictions:
    • ProtectedVarsAccessLogger does a check against CU and writes to those logs if temporary accounts is enabled (Ic95f211f4db7ce6dc2d769d2f3af206f4a3935e4). I think we can unwind the dependency here a bit by instead adding a hook that CU can hook into and do whatever it needs to for logging. Off the top of my head, I don't recall if hooks can overwrite baseline functionality but if we could do something like that, I think that's a much clearer separation of concerns than our current solution
    • Protected variables aren't actually gated behind checkuser-temporary-account, they're gated behind abusefilter-access-protected-vars with the conflation being the current issue with T381722. We've stopgapped this with the preference check but my understanding is that we want to combine these under CU's one preference check. For this, we would need to add the ability to declare a protected variable as extra-protected and a way to respond to that configuration. Given that we already have a modifiable config, AbuseFilterProtectedVariables that can have variables added to it, I think it wouldn't be unreasonable to also modify it to be able to declare additional restrictions like rights and preferences. eg. [ "user_unnamed_ip" => [ 'rights' => [ 'checkuser-temporary-account' ], 'preferences' => [] ] ] and then respect them anywhere else we use permission checks like canViewProtectedVariableValues.

I do like this, as it keeps the variable w/the source information (AF) but all the access restrictions with the extension imposing them (CU) and I think this is actually easier than moving the variable, as most of the logging/gating work is already done in AF. Being optimistic, it does feel like a reasonable amount of work that could feasibly be done before major pilots deploy as we have about 2.5 sprints. This would allow us to remove the preference check which I believe we've received feedback on it saying it's confusing.

In addition, this should allow us the granularity to grant users access to protected variables w/o incidentally granting them access to IPs and I think unblock us on adding IP reputation variables as well as the questions around T381722: Add abusefilter-access-protected-vars to frwiki EFM and remove it for sysops.


As a worst case scenario, the preference is our CU check in AF right now and if we find that we're not able to finish the work before deploy deadline, we should at least find some temporary copy to put there while we work through it.

I would agree with your suggested course of action @STran, however, there is one point I would disagree with.

  • Make CU declare that user_unnamed_ip is protected using the AbuseFilterCustomProtectedVariables hook

I think we should still make this be declared as protected by AbuseFilter. I have two reasons for this:

  • A third-party wiki with temporary accounts enabled may not know about user_unnamed_ip, but assume that the temporary account IPs are not available for viewing (because CheckUser is not installed). If we were to leave this as not protected by default, it could lead to unintentional disclosure of this data when the third-party wiki is not aware of this.
  • I would argue that third-party wikis are likely to be more restrictive with access to temporary account IP addresses (unless we later enable temporary accounts by default).

Given this, I think we should be placing a baseline in AbuseFilter of at least no public access to the variable.

  • Protected variables aren't actually gated behind checkuser-temporary-account, they're gated behind abusefilter-access-protected-vars with the conflation being the current issue with T381722. We've stopgapped this with the preference check but my understanding is that we want to combine these under CU's one preference check. For this, we would need to add the ability to declare a protected variable as extra-protected and a way to respond to that configuration. Given that we already have a modifiable config, AbuseFilterProtectedVariables that can have variables added to it, I think it wouldn't be unreasonable to also modify it to be able to declare additional restrictions like rights and preferences. eg. [ "user_unnamed_ip" => [ 'rights' => [ 'checkuser-temporary-account' ], 'preferences' => [] ] ] and then respect them anywhere else we use permission checks like canViewProtectedVariableValues.

One thought is that instead of defining additional restrictions through rights and preferences keys, we could define some kind of way to provide a validator class or callback. This would then inform AbuseFilter if the user has access to see the variable.

I suggest this because, as it stands we would need to also prevent access to the variable if the user is blocked (to be consistent with IP reveal access requirements). This could be done via another array key (such as notblocked), but we want to make IP reveal accessible to partially blocked users (which may not be the case for other protected variable implementations).

I'm going to start work on T380918 again with these ideas, but go with the idea of some kind of callback to validate access. If we are concerned about performance, we could cache it.

I think we should still make this be declared as protected by AbuseFilter.

I think this is fine. If anything, we could allow both to declare it for pedantic reasons (AF as a reasonable default, CU because CU 100% wants to enforce it) and de-dupe the array if necessary.

we could define some kind of way to provide a validator class or callback. This would then inform AbuseFilter if the user has access to see the variable.

This is also fine. I think we should still try to declare it via the array, as it would centralize access definitions/modifications in the configuration. Using the array also wouldn't preclude us from adding on helper utilities like rights or preference gates down the line, as I think it would be nice to have those scaffolded and supported in AF should we ever find ourselves in this position again with a simpler set of constraints. In this case, I think the validation function makes sense as it's a little more complicated than just preferences/rights. We can always revisit if it ends up always being this complicated and we want to go back to splitting it into pre-defined supported limitations.

One thing that occurred to me - should we log that access was affected? A protected variable being viewed w/o these CU constraints means something different than if CU isn't enabled and that wouldn't be clear just by looking at AF logs. Additionally, if we're going to redirect as we're doing now, it's also not clear from AF logs that some of the logs also exist in CU's logs instead. This is an existing problem but I guess now's the time to re-surface it. I don't think this investigation is blocked on this question, as it can be answered as part of the logging refactoring task.


If there are no final objections, I'll go ahead and make the tickets and close this investigation out.

we could define some kind of way to provide a validator class or callback. This would then inform AbuseFilter if the user has access to see the variable.

This is also fine. I think we should still try to declare it via the array, as it would centralize access definitions/modifications in the configuration. Using the array also wouldn't preclude us from adding on helper utilities like rights or preference gates down the line, as I think it would be nice to have those scaffolded and supported in AF should we ever find ourselves in this position again with a simpler set of constraints. In this case, I think the validation function makes sense as it's a little more complicated than just preferences/rights. We can always revisit if it ends up always being this complicated and we want to go back to splitting it into pre-defined supported limitations.

The reason I suggest this is because the validator callback could be CheckUserPermissionManager::canAccessTemporaryAccountIPAddresses which would then deduplicate the logic for the CheckUser additional restrictions. Furthermore, in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/1100552 I was defining two callbacks because AbuseFilterPermissionManager has a method for seeing the variable and then another one for seeing the values of the variables. These are defined slightly differently, and I think we could want to do similar for CheckUser

If there are no final objections, I'll go ahead and make the tickets and close this investigation out.

I think we've been able to lay out the general plan, and can discuss specifics in those to be filed tasks. Just to note that T380918 exists already and I've started work on it. That would cover CheckUser defining the user_unnamed_ip variable as additionally protected. A different ticket could define how we specify additional restrictions if we wanted, or just re-use that ticket.

Per discussion:

Refactor logging to reduce/consolidate business logic:
T387324: Add hook to support custom logging in `ProtectedVarsAccessLogger`
T387328: Refactor CheckUser ProtectedVarsAccessLogger logging out of AbuseFilter
(non-blocking for temp accounts, blocking for IPReputation variables) T387329: Investigate: Confirm our logging expectations for AbuseFilter/CheckUser Temporary Accounts logs when using protected variables

Allow external sources to add their own protected vars access validation:
T387331: Provide a mechanism for other extensions to modify protected variables access requirements
T387333: CheckUser should impose IP viewing restrictions on AbuseFilter's `unnamed_user_ip` protected variable

CU should also declare user_unnamed_ip as protected (? I'm not strongly for or against this, as AF will do this by default but I think it's nice to declare regardless)
T380918: Provide capability to define new protected variables from other extensions

Just to note that T380918 exists already and I've started work on it. That would cover CheckUser defining the user_unnamed_ip variable as additionally protected. A different ticket could define how we specify additional restrictions if we wanted, or just re-use that ticket.

I think that defining a new protected variable (it looks like you're also doing the add a feature/implement a feature combo) can be split off from work that allows a protected variable to be extra protected, mostly because the latter is possibly a temporary accounts blocker whereas the former is blocking a different feature (IP reputation variables).