Page MenuHomePhabricator

Investigate: Update AbuseFilter for IP Masking
Closed, ResolvedPublic2 Estimated Story Points

Description

Background

We need to investigate how AbuseFilter may need updating for IP Masking.

From a product perspective, read the documentation and try out the product. Consider whether any changes may need making for temporary account users.

From a technical perspective, check where the search terms from T326759 appear, and consider whether these places appear to need updating.

There is a related task at T307060: [Epic] Temporary account AbuseFilter support. Please take a look at it during your investigation.

These instructions are left intentionally vague to avoid biasing the investigation.

Outcome of this investigation
  • Write up a summary of findings
  • Raise any questions
  • File tasks if there is anything obvious to file

Related Objects

Event Timeline

From this investigation, I note the following:

UI and user scenarios

The AbuseFilter extension uses IPs and exposes them on:

Special:AbuseFilter/examine

Screenshot 2023-04-19 at 9.30.58 AM.png (1×1 px, 173 KB)

Special:AbuseLog

Screenshot 2023-04-19 at 9.31.31 AM.png (528×2 px, 196 KB)

Special:AbuseFilter/new

At least with the use of the filter "ip_in_range() and "ip_in_ranges()"

Screenshot 2023-04-19 at 9.32.46 AM.png (270×1 px, 101 KB)

Ultimately, the block against an IP (range) created via an AbuseFilter will end on

Special:BlockList

Screenshot 2023-04-19 at 9.36.36 AM.png (948×2 px, 189 KB)

Special:Log

Screenshot 2023-04-19 at 9.37.45 AM.png (264×2 px, 175 KB)

Special:CheckUser

Screenshot 2023-04-19 at 10.32.11 AM.png (1×2 px, 332 KB)

Concerns: How to present a block against an IP or IP range that was hit before creating an edit and therefore not having a Temp Account Name? As raised in T334623
At least for Special:Log, considering it's visible to AnonUser.

Existing Issues

As seen in T328403 and T334623
An anon user, editing under an IP that is currently blocked by the AbuseFilter, will not have the chance to submit their work and therefore create a temporary account user and will be either:


  • Log as an IP to AbuseLog (Visible to the Anon users)

  • Create a bug: "CannotCreateActorException: Cannot create an actor for a usable name that is not an existing user: user_name="*Unregistered 25""


Screenshot 2023-04-17 at 11.22.44 AM.png (1×2 px, 699 KB)

Next Steps

Current Bugs
Special:AbuseLog is missing the mw-tempuserlink class from temporary account user links T328311
AbuseFilter records IP address on a user's first edit T328403
How do we log unsuccessful first edits for temporary users? T334623
Investigate AbuseFilter Hook with CheckUser T335064

Updating the files
Updating the files with the right user verification T335062

Thanks for this @AGueyte! It looks as though the follow-up work is all captured by tasks, so I'll move to Done and leave open for @Niharika to review.

I want to reiterate what I wrote in T335062:

Noting here that being able to use the IP of logged-out users is a very important feature that AF currently provides. The expectation is that user_name will be the IP of logged-out users, and many filters make this assumption. Finding all of them is going to be hard, but for instance, all filters using ip_in_range or ip_in_ranges are going to be affected; see partial list of wikis in the following NDA paste: P47270.

Also, many filter use "user" in user_groups (or variations thereof) to check if a user is logged-in or not. Temp users are in the user group, so this is also going to fail.

These are just the first two examples I could think of, but there could be more. Updating the AF codebase might be relatively easy; updating all the existing filters, not so much I believe. Maybe it'd be a little bit easier if AF also provided a user_is_temp variable or something, but that still wouldn't address some issues like accessing the user's IP.

The codebase itself might be OK wrt temp users now, but IP masking is going to have a pretty huge impact on existing filters that would require more than just updating isRegistered/isAnon checks in the code.

I don't really have much advice to offer here, but I do want to make sure that these issues are known.

Thanks @Daimona.

@Niharika is having a look into the things you've raised in your comment, and we'll keep this task open until we resolve those. We may need some strategy for working with communities to update filters, and to discuss these changes and mitigations in general, and we may need to plan feature improvements like your suggestion of the user_is_temp variable.

Also, many filter use "user" in user_groups (or variations thereof) to check if a user is logged-in or not. Temp users are in the user group, so this is also going to fail.

This should be solved by T340457: In UserGroupManager::getUserImplicitGroups, add temporary users to a 'temp' group instead of the 'user' group.

Following on from @Daimona, I would also like to point out the presence of ip_in_range (and ip_in_ranges) function. The only meaningful use case pattern is ip_in_range(user_name, '1.2.3.0/24'), but this will stop working when user_name isn't an IP address anymore.

Aklapper added a subscriber: AGueyte.

Removing inactive task assignee (please do so as part of offboarding steps).

@STran I notice AbuseFilter saves IP addresses in abuse_filter_log.afl_ip: T352914#9507676

We should dig into this if you haven't already...

Saving IP addresses can be OK (e.g. T356430#9507644) or not OK (e.g. T345578#9290889).

Okay, here are some summaries of my research and technical proposals. I don't think my conclusions here encompass everything that needs to be done but hopefully it's enough to get us going in the right direction.

Part of the problem is that temp accounts wants to make an account on an action, whereas AbuseFilter usually is intended to stop that action from ever happening. In order to maintain parity with what AbuseFilters can do right now, we would need to deal with the following situations. Whether or not we can/should is a different question. The following is a summary of the different ways in which AbuseFilter uses IPs. Additional details are heavily dependent on sensitive filters and won't be shared here:

T357615: Create filters to distinguish anonymous/temporary/registered users (if we want to do it) should solve the first situation but afaik, there isn't any way to implement a temp account version of referring to a specific IP or specific sets of IP so AbuseFilter will probably need to continue to know about IPs. AbuseFilter can already know about IPs, as it'll log them into afl_ip. I skimmed some of the code that does this and it looks like gets the IP from the Request. Presumably, Request->getIP() isn't going away.

Additionally, we probably wouldn't break the ip_in_range(s) filters (conceptually at least) if we kept supporting IPs in AbuseFilter. If they use the username then we'd probably have to update them to use the afl_ip.

If we keep IPs, we'd need to 1. have a way to automatically protect them from public view and 2. purge old IP data.

  1. T234155: Create CheckUser-level abuse filters could possibly be used to investigate and solve this.
  2. maintenance/PurgeOldLogIPData.php exists already and blanks out afl_ip. I don't know if this runs regularly but it's there and should solve this concern.

Additionally, there is an open bug where if an anonymous user attempts to perform an action caught by AbuseFilter, it will try to log a temp account that doesn't exist (T334623: How do we log unsuccessful first edits for temporary users?). We need to decide if we want to create a new temporary account on these actions. An alternative solution would be to use a system user like CheckUser is considering (see T353953: Don't use actor IDs for private CheckUser events when these actions are performed by IP addresses).

Temp accounts will break some existing filters so we'd need to go through and either deprecate/disable them if they're not in use and update the ones that are still relevant. As these are still in use, that should probably be specified in a separate protected task.


Finally, there are a few open questions:

  • How many accounts would we make if we made a temp account on disallowed action?
  • If someone triggers an abuse filter, will it be logged in CheckUser?
  • AbuseFilter currently logs IP addresses for anyone who triggers a filter. What do we do about this, if anything?

Thanks for the summary @STran. Based on that summary, how does this sound as a plan for your next steps:

Filtering based on type of user

Filtering based on actual IP or range for non-IP actors

Ensuring IP information is only shown to users with correct permissions

  • Find out how regularly maintenance/PurgeOldLogIPData.php runs and how long we keep data, and add the answers to T357682.
  • Find out which permissions you need to see the IP addresses stored in afl_ip, and which groups have those permissions, and add the answers to T357682.

Logging unsuccessful first edit attempts

Dealing with existing filters that will break

  • File a task for this. Arrange a conversation with @Daimona and @sgrabarczuk to find out either how we can do this, or who we can ask for help.
Urbanecm_WMF subscribed.

Ensuring IP information is only shown to users with correct permissions

  • Find out how regularly maintenance/PurgeOldLogIPData.php runs and how long we keep data, and add the answers to T357682.
  • Find out which permissions you need to see the IP addresses stored in afl_ip, and which groups have those permissions, and add the answers to T357682.

Since I know the answers from the top of my head, I just did that :).

I filed the additional tasks T357772: Investigate: How will `ip_in_range` and `ip_in_ranges` function when temporary accounts are enabled and T357774: Investigate: What to do with existing filters that temporary accounts will break to capture the rest of the work noted by my investigation and I've put every task as a child of the parent T307060: [Epic] Temporary account AbuseFilter support so it can be tracked under that umbrella. I'm going to call this task done for now but please re-open if you disagree. 🙇