Investigate what to do about the AbuseFilter log revealing someone's IP address via historical logs
Open, Needs Triage, Public

Description

Background

T363906 introduces the concept of variables that have PII, specifically a user_unnamed_ip variable, for use when temporary accounts are enabled, since user_name will no longer be the IP address. (This will not be available for fully registered users, just temporary users.)

The IP address in the filter and the filter details will only be readable by users who have access to reveal IP addresses, as will the logs for that filter being triggered. In accordance with our policy of deleting IP addresses after a fixed time, the value will be stored in afl_ip (separately from the rest of the data, which is in afl_var_dump), so that it can be purged after the fixed time.

However, as it stands, logs will be visible forever, so whoever can read a filter containing the IP address can see who triggered the filter from that address or range.

Is this a problem?

This was mentioned in T363906#9782548.

There's a comparable case in CheckUser, where it can be accurately guessed from the CheckUser logs which users are associated with which IP addresses, even after the IPs have been removed. This case is arguably worse, if the barrier to triggering a filter is lower than the barrier to triggering a CheckUser investigation.

What can be done?

Some suggestions:

Do nothing
Purge any logs for filters containing IP addresses after a time
Remove the filter ID from any logs for filters containing IP addresses after a fixed time

Related Objects

Status	Assigned	Task
		Restricted Task
Resolved	kostajh	T294511 2021 Security Team wikireplicas audit
Declined	None	T284948 Raw IPs of logged-out users disclosed in wiki-replicas
In Progress	Niharika	T324492 Temporary accounts - MVP
Open	None	T326816 [Epic] Update features for temporary accounts
Open	Tchanders	T326869 Update TSP-owned products that may be affected by IP Masking
Resolved	lbowmaker	T262321 IP Masking
Resolved	tstarling	T300263 [IP Masking] Create temporary account on first edit
Open	None	T307060 [Epic] Temporary account AbuseFilter support
Resolved	STran	T357772 Investigate: How will `ip_in_range` and `ip_in_ranges` function when temporary accounts are enabled
Resolved	STran	T363906 [Epic] Ensure filters that use PII-sensitive variables are protected
Open	None	T365049 Investigate what to do about the AbuseFilter log revealing someone's IP address via historical logs

Event Timeline

Thanks @Dreamy_Jazz for discussing this with me.

Tchanders updated the task description. May 21 2024, 11:36 AM

kostajh updated the task description. Jun 11 2024, 11:33 AM

After discussing with @STran, our thoughts are:

Go with the "Do nothing" approach mentioned in the task description
Rely on the logging we implement in T365743 (Log when AbuseFilter user sees IP address associated with temp account via user_unnamed_ip variable trigger) to detect misuse or abuse of this information

The "do nothing" approach could be changed at some later point in time, given more time/resources to work on the problem.

cc @Tchanders @Dreamy_Jazz

Thanks @kostajh , @STran . That makes sense to me.

kostajh closed this task as Resolved. Jun 24 2024, 9:44 AM

kostajh claimed this task.

Reopening, following a discussion with Legal.

Bugreporter updated the task description. Wed, Dec 4, 2:42 PM

Notes:

We need to remove the connection between a user name and their IP address after 90 days.
The connection can be found by looking at the user name and filter ID in the log line, and inspecting the contents of the filter with that ID, at the timestamp of the log. This could reveal the user's IP.
The logs are stored in the abuse_filter_log table.
The triggering variables are stored in the afl_var_dump field. This is stored in ExternalStorage which cannot be changed. However, since this patch, the IP address is not stored here. Instead we store true if the IP address triggered the filter.
We could remove the connection between the user name and the IP after 90 days by modifying the entry in the abuse_filter_log table to remove either the filter ID or the performer name and ID.
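As a rough illustration of the last note, here is a minimal sketch of the "remove the performer name and ID" variant. It assumes a simplified abuse_filter_log with only a handful of columns, uses an in-memory SQLite database as a stand-in for the real one, and takes the list of affected filter IDs as given (how to find those filters is not covered). PLACEHOLDER_NAME, redact_performer and build_demo_db are hypothetical names, not AbuseFilter code.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90              # retention window from the notes above
PLACEHOLDER_NAME = "(redacted)"  # hypothetical placeholder, not an AbuseFilter value


def cutoff_timestamp(now: datetime, days: int = RETENTION_DAYS) -> str:
    """Return the cutoff in the 14-digit YYYYMMDDHHMMSS form used by afl_timestamp."""
    return (now - timedelta(days=days)).strftime("%Y%m%d%H%M%S")


def redact_performer(conn: sqlite3.Connection, filter_ids: list[int], now: datetime) -> int:
    """Blank the performer on expired log rows of the given filters.

    Returns the number of rows changed. The filter ID is kept, so the
    filter's own history stays searchable.
    """
    if not filter_ids:
        return 0
    placeholders = ",".join("?" for _ in filter_ids)
    cur = conn.execute(
        "UPDATE abuse_filter_log SET afl_user = 0, afl_user_text = ? "
        f"WHERE afl_filter_id IN ({placeholders}) AND afl_timestamp < ?",
        [PLACEHOLDER_NAME, *filter_ids, cutoff_timestamp(now)],
    )
    conn.commit()
    return cur.rowcount


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a subset of the abuse_filter_log columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE abuse_filter_log ("
        "afl_id INTEGER PRIMARY KEY, afl_filter_id INTEGER, afl_user INTEGER, "
        "afl_user_text TEXT, afl_timestamp TEXT)"
    )
    conn.executemany(
        "INSERT INTO abuse_filter_log VALUES (?, ?, ?, ?, ?)",
        [
            (1, 3, 8, "~2024-7", "20240801120000"),  # older than 90 days
            (2, 3, 8, "~2024-7", "20241205163742"),  # still inside the window
        ],
    )
    conn.commit()
    return conn
```

Removing the filter ID instead would be the same UPDATE with afl_filter_id in the SET clause; which of the two to keep is the open question here.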

In T365049#10379975, @Tchanders wrote:

We could remove the connection between the user name and the IP after 90 days by modifying the entry in the abuse_filter_log table to remove either the filter ID or the performer name and ID.

I think it would be nice if we could swap the performer name and ID out with a generic A temporary account string, e.g. A temporary account triggered filter {number}.

I wonder if it's possible to try and maintain both bits of the information even if they're no longer associable. eg. if a log reads: ~2024-7 triggered filter 3, performing the action "edit" on Main Page2. Actions taken: Disallow; Filter description: 3, it would be nice to be able to:

search for ~2024-7 and see something like ~2024-7 triggered a filter
search for 3 (the filter id) and see something like a user triggered filter 3, performing the action "edit" on Main Page2. Actions taken: Disallow; Filter description: 3

I think both provide valuable information - the first is the account's abuse history and the latter is the filter's history. Unfortunately, this is all stored on one row:

+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+
| afl_id | afl_global | afl_filter_id | afl_user | afl_user_text | afl_ip    | afl_action | afl_actions | afl_var_dump | afl_timestamp  | afl_namespace | afl_title  | afl_wiki | afl_deleted | afl_patrolled_by | afl_rev_id |
+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+
|      1 |          0 |             3 |        8 | ~2024-7       | 127.0.0.1 | edit       | disallow    | tt:8         | 20241205163742 |             0 | Main_Page2 | NULL     |           0 |                0 |       NULL |
+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+

so to do so, we'd probably have to double up to make this happen, e.g.

+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+
| afl_id | afl_global | afl_filter_id | afl_user | afl_user_text | afl_ip    | afl_action | afl_actions | afl_var_dump | afl_timestamp  | afl_namespace | afl_title  | afl_wiki | afl_deleted | afl_patrolled_by | afl_rev_id |
+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+
|      1 |          0 |             3 |        8 | user          | 127.0.0.1 | edit       | disallow    | tt:8         | 20241205163742 |             0 | Main_Page2 | NULL     |           0 |                0 |       NULL |
+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+
|      1 |          0 |             3 |       -1 | ~2024-7       | 127.0.0.1 | edit       | disallow    | tt:8         | 20241205163742 |             0 | Main_Page2 | NULL     |           0 |                0 |       NULL |
+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+

and would probably cause a bunch of downstream problems but it still might be worth considering. The alternative is as stated, to pick whichever one we think is more important and remove the other value after 90 days.
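To make the "double up" idea concrete, here is a minimal sketch that works on plain dicts rather than the database. It assumes the filter-history copy keeps the filter ID but gets a generic performer, and the account-history copy keeps the performer but loses the filter ID; that split is one possible reading of the tables above, and split_log_row and shared_fields are hypothetical helpers.

```python
GENERIC_PERFORMER = "A temporary account"  # wording suggested above


def split_log_row(row: dict) -> tuple[dict, dict]:
    """Split one log row into a filter-history row and an account-history row."""
    # Filter history: keep the filter ID and action details, hide who did it.
    filter_row = dict(row, afl_user=0, afl_user_text=GENERIC_PERFORMER)
    # Account history: keep who did it, hide which filter was hit.
    # afl_id is cleared on the assumption that the database assigns a fresh one.
    account_row = dict(row, afl_filter_id=None, afl_id=None)
    return filter_row, account_row


def shared_fields(a: dict, b: dict) -> list[str]:
    """List the columns whose non-null values are identical in both rows."""
    return sorted(k for k in a if k in b and a[k] is not None and a[k] == b[k])


if __name__ == "__main__":
    original = {
        "afl_id": 1, "afl_filter_id": 3, "afl_user": 8, "afl_user_text": "~2024-7",
        "afl_action": "edit", "afl_actions": "disallow", "afl_var_dump": "tt:8",
        "afl_timestamp": "20241205163742", "afl_title": "Main_Page2",
    }
    filter_row, account_row = split_log_row(original)
    print(shared_fields(filter_row, account_row))
```

shared_fields is only there to show which columns the two copies would still have in common; any of those could in principle be used to pair the rows up again.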

In T365049#10384117, @STran wrote:

search for ~2024-7 and see something like ~2024-7 triggered a filter
search for 3 (the filter id) and see something like a user triggered filter 3, performing the action "edit" on Main Page2. Actions taken: Disallow; Filter description: 3

I think both provide valuable information - the first is the account's abuse history and the latter is the filter's history. Unfortunately, this is all stored on one row:

+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+
| afl_id | afl_global | afl_filter_id | afl_user | afl_user_text | afl_ip    | afl_action | afl_actions | afl_var_dump | afl_timestamp  | afl_namespace | afl_title  | afl_wiki | afl_deleted | afl_patrolled_by | afl_rev_id |
+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+
|      1 |          0 |             3 |        8 | ~2024-7       | 127.0.0.1 | edit       | disallow    | tt:8         | 20241205163742 |             0 | Main_Page2 | NULL     |           0 |                0 |       NULL |
+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+

so to do so, we'd probably have to double up to make this happen, e.g.

+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+
| afl_id | afl_global | afl_filter_id | afl_user | afl_user_text | afl_ip    | afl_action | afl_actions | afl_var_dump | afl_timestamp  | afl_namespace | afl_title  | afl_wiki | afl_deleted | afl_patrolled_by | afl_rev_id |
+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+
|      1 |          0 |             3 |        8 | user          | 127.0.0.1 | edit       | disallow    | tt:8         | 20241205163742 |             0 | Main_Page2 | NULL     |           0 |                0 |       NULL |
+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+
|      1 |          0 |             3 |       -1 | ~2024-7       | 127.0.0.1 | edit       | disallow    | tt:8         | 20241205163742 |             0 | Main_Page2 | NULL     |           0 |                0 |       NULL |
+--------+------------+---------------+----------+---------------+-----------+------------+-------------+--------------+----------------+---------------+------------+----------+-------------+------------------+------------+

The issue is that they may be easily connected, since afl_timestamp and afl_var_dump are the same.

Tchanders added a project: Temporary accounts (Major pilot wiki deployment). Thu, Dec 5, 4:57 PM

Tchanders moved this task from Backlog to Planned on the Temporary accounts (Major pilot wiki deployment) board.

Note also the cloud view of abuse_filter_log has been removed in T375751: Public wiki replicas contain abuse filter logs for filters that are private or protected.

The issue is that they may be easily connected, since afl_timestamp and afl_var_dump are the same.

I think this is also resolvable. If we're in here making these sorts of edits we can delete afl_var_dump for the account info row as it'll be captured in the filter info row and I suppose similarly remove some fidelity from afl_timestamp (to the day or something).
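A minimal sketch of that generalization step, assuming afl_timestamp is the 14-digit YYYYMMDDHHMMSS string shown in the tables and that "to the day" means zeroing the time part. coarsen_timestamp and generalize_account_row are hypothetical helpers operating on a dict, not on the real table.

```python
def coarsen_timestamp(ts: str) -> str:
    """Reduce a YYYYMMDDHHMMSS timestamp to day precision by zeroing the time."""
    if len(ts) != 14 or not ts.isdigit():
        raise ValueError(f"expected a 14-digit timestamp, got {ts!r}")
    return ts[:8] + "000000"


def generalize_account_row(row: dict) -> dict:
    """Return a copy of the account-info row with the linking details reduced."""
    out = dict(row)
    out["afl_var_dump"] = None  # the dump stays only on the filter-info row
    out["afl_timestamp"] = coarsen_timestamp(row["afl_timestamp"])
    return out


if __name__ == "__main__":
    row = {"afl_user_text": "~2024-7", "afl_var_dump": "tt:8",
           "afl_timestamp": "20241205163742"}
    print(generalize_account_row(row))
```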

However, is it valuable to keep that generalized information? Or are the specifics very important to historical abuse logs and if we can't have that it's not worth keeping anything?

We're also going to need a way to find and purge these. The obvious solution is to purge based on the protected flag but we're about to add more variables to that state (IP reputation variables) and those variables aren't considered sensitive the way user_unnamed_ip is. I originally argued we should keep all of these variables under the protected workflow in order to avoid adding complexity but if we're going to have to purge on the inferred attack surface, it might be better to create a sensitive flag to do so?

I suppose similarly remove some fidelity from afl_timestamp (to the day or something).

This is still not enough. Take a popular non-protected enwiki article, Tom Hardy, as an example: it has 19 abuse filter hits in 2024 (https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchTitle=Tom+Hardy), so if a temporary account edits it, the relationship is still easy to find. In addition, users can simply browse the abuse log page by page (it is ordered by afl_id) to find the connection.
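The concern can be sketched as a small re-linking check. It assumes the account-info and filter-info rows both keep afl_title and a day-precision afl_timestamp, and it treats an account row as re-linkable when exactly one filter row shares its page and day. relinkable is a hypothetical helper and the rows in the demo are made up.

```python
from collections import defaultdict


def relinkable(account_rows: list[dict], filter_rows: list[dict]) -> dict[str, int]:
    """Map account names to a filter ID when only one filter row fits.

    Rows are matched on (afl_title, day of afl_timestamp). A unique match
    means the split did not hide which filter the account triggered.
    """
    by_key: dict[tuple[str, str], list[int]] = defaultdict(list)
    for f in filter_rows:
        by_key[(f["afl_title"], f["afl_timestamp"][:8])].append(f["afl_filter_id"])

    links: dict[str, int] = {}
    for a in account_rows:
        candidates = by_key.get((a["afl_title"], a["afl_timestamp"][:8]), [])
        if len(candidates) == 1:
            links[a["afl_user_text"]] = candidates[0]
    return links


if __name__ == "__main__":
    filter_rows = [
        {"afl_filter_id": 3, "afl_title": "Tom_Hardy", "afl_timestamp": "20241205000000"},
        {"afl_filter_id": 3, "afl_title": "Main_Page2", "afl_timestamp": "20241205000000"},
        {"afl_filter_id": 5, "afl_title": "Main_Page2", "afl_timestamp": "20241205000000"},
    ]
    account_rows = [
        {"afl_user_text": "~2024-7", "afl_title": "Tom_Hardy", "afl_timestamp": "20241205000000"},
        {"afl_user_text": "~2024-9", "afl_title": "Main_Page2", "afl_timestamp": "20241205000000"},
    ]
    print(relinkable(account_rows, filter_rows))
```

This only covers the title-and-day match; the page-by-page browsing by afl_id mentioned above would be a separate route to the same link.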

kostajh removed kostajh as the assignee of this task. Tue, Dec 10, 2:59 PM

It is worth mentioning revision (change) tags also stay forever. So purging logs wouldn't help if the filter also applies tags to matching edits.

It would help Legal if we could present some specific approaches for their consideration.

@STran Would it be possible to make a summary of how we could do this, taking into account the comments added so far, to present to Legal?

	Tchanders
	May 15 2024, 5:45 PM
