Summary
We should expire MediaWiki-extensions-IPReputation AbuseFilter variable data when the IP address expires. To do this we need to add support for protected variable values to be purged when the IP address is purged.
Background
- It is not a WMF Legal requirement to purge MediaWiki-extensions-IPReputation variable data, but keeping this data after the IP address has been purged was considered a risk.
- This is especially the case when the action is an account creation, as it would tie data about an IP address to a registered user forever if an AbuseFilter matched on their account creation
- This also could be an issue for temporary accounts, where their IP address expires after 90 days
- We need to consider when the MediaWiki-extensions-IPReputation protected variables need to be expired, and whether we should expire in all cases for consistency
- To do this we need a mechanism to store some protected variable values in the database
- This mechanism should ideally not be too specific to MediaWiki-extensions-IPReputation AbuseFilter variables, to avoid the need to undo existing work if other variables are added
- However, we would likely need to expire the data at the same time as the IP address, so we may still need to tie this expiration to a standard retention period
- This solution may also be useful to consider for the existing user_unnamed_ip variable, as we could then remove some of its specific handling code in favour of using this method
Technical notes
We will need a schema change to the abuse_filter_log table to support this. The schema change would likely involve adding a column that acts in a similar way to the log_params column (a serialised PHP array of parameters, in this case variable names to values). When reading or writing the variable dump, the protected variables that need to have their values expire would be written to this new column instead of the blob used for all other variables.
- We can re-use the afl_var_dump column by making it a JSON array if the log contains variables that need purging
- When the values are purged, the column could be reverted back to no longer use JSON as a space saving measure. Alternatively, we could make all afl_var_dump column values use a JSON array to be consistent.
- We will need code to remove protected variables from afl_var_dump when they should be expired
- To support this, we need an index which partially indexes afl_var_dump (just one character should be fine) and afl_timestamp. The query would then look for the character { at the start of afl_var_dump. If it is present, the field is in JSON and has protected variables to purge.
- The other methods of doing this, like adding a new tinyint column or making afl_ip nullable, were less favoured by DBAs.
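The afl_var_dump approach in the notes above can be sketched roughly as follows. This is only an illustration in Python with SQLite standing in for the real database, not the AbuseFilter implementation. The JSON keys (`blob`, `protected`), the function names, the variable name `example_protected_var` and the `blob-address-N` values are all hypothetical. Timestamps are assumed to be sortable 14-digit strings, and the prefix index itself is not created because SQLite does not support one.

```python
import json
import sqlite3


def encode_var_dump(blob_address, protected_vars):
    """Return the afl_var_dump value for a log row.

    Hypothetical format: the plain existing value when there is nothing to
    purge later, otherwise a JSON object that starts with "{".
    """
    if not protected_vars:
        return blob_address
    return json.dumps({"blob": blob_address, "protected": protected_vars})


def purge_var_dump(value):
    """Drop the protected values and revert to the plain (non-JSON) form."""
    if not value.startswith("{"):
        return value  # nothing to purge
    return json.loads(value)["blob"]


def purge_expired(conn, cutoff):
    """Purge protected values on rows older than cutoff; return rows changed.

    The WHERE clause is the part a (one-character prefix, afl_timestamp)
    index would be intended to serve.
    """
    rows = conn.execute(
        "SELECT afl_id, afl_var_dump FROM abuse_filter_log "
        "WHERE afl_var_dump LIKE '{%' AND afl_timestamp < ?",
        (cutoff,),
    ).fetchall()
    for afl_id, value in rows:
        conn.execute(
            "UPDATE abuse_filter_log SET afl_var_dump = ? WHERE afl_id = ?",
            (purge_var_dump(value), afl_id),
        )
    return len(rows)


def demo():
    """Build a three-row table, purge old rows, return (changed, rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE abuse_filter_log ("
        "afl_id INTEGER PRIMARY KEY, afl_timestamp TEXT, afl_var_dump TEXT)"
    )
    protected = {"example_protected_var": "some value"}  # hypothetical name
    conn.executemany(
        "INSERT INTO abuse_filter_log VALUES (?, ?, ?)",
        [
            (1, "20240101000000", encode_var_dump("blob-address-1", protected)),
            (2, "20240101000000", encode_var_dump("blob-address-2", {})),
            (3, "20240601000000", encode_var_dump("blob-address-3", protected)),
        ],
    )
    changed = purge_expired(conn, "20240301000000")
    rows = conn.execute(
        "SELECT afl_id, afl_var_dump FROM abuse_filter_log ORDER BY afl_id"
    ).fetchall()
    return changed, rows
```

In this sketch only row 1 is rewritten: row 2 never held protected values, and row 3 is newer than the cutoff, so it keeps its JSON form until a later purge run.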
Acceptance criteria
- AbuseFilter supports configuring protected variables so that their values are purged when the IP address associated with the log is purged
- MediaWiki-extensions-IPReputation variables are configured in this way to purge their values, which should at least be used when an account creation causes a log entry