Page MenuHomePhabricator

Write a unique `var_dump` to each filter hit when necessary
Closed, ResolvedPublic

Description

Summary

When multiple filters trigger a hit, they all share the same var_dump. This is causing problems, as public filters can have var_dumps with protected variables and privileged information the user may not have access to. We need to make the var_dumps contain only the protected variables used in the filter

Background and Technical notes

When multiple filters trigger a hit, they all share the same var_dump. This is causing problems, as public filters can have var_dumps with protected variables and privileged information the user may not have access to. We've been solving this problem by filtering out variable values based on view rights but as we add more granular controls around what rights can view what types of information, it makes more sense to control it at the time of write instead of time of display.

For instance:

  1. Given the following filters:
    • Filter A is a public filter (action = 'edit')
    • Filter B is a protected filter with no additional restrictions (as of writing, this is hypothetical and refers to IP Reputation variables)
    • Filter C is a protected filter using user_unnamed_ip, which is additionally protected (user_unnamed_ip = '1.2.3.4')
  2. A user edits from IP 1.2.3.4, hitting all 3 filters (ids 1, 2, 3, corresponding to A, B, C)
  3. Filter C is updated to no longer use user_unnamed_ip (action = 'edit' - This is a duplicate of Filter A but as Filter C has been protected before, it remains protected now)
  4. A user edits, also hitting all 3 filters (ids 4, 5, 6)

We expect:

Filter IDPublic ViewProtected Var ViewProtected Var + IP Access
1
2
3
4
5
6

Right now, there is no good way to gate access to 3 as expected/wanted, as filter logs are associated to a filter id, not a filter revision. Ensuring that access to log 3 remains restricted as expected can only be done by either:

  • Checking the blob store for the used variable and restricting view on a per-log basis
  • Associating a filter revision w/the log and checking against variables used in that revision

Given that either will solve it, unique blobs will solve the product problem faster.

Acceptance criteria

  • AbuseFilter logs use a var dump which contains only the protected variables used in the associated filter

Event Timeline

Change #1131341 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/AbuseFilter@master] [WIP] Exclude protected variables from var dumps unless used by filter

https://gerrit.wikimedia.org/r/1131341

SpecialAbuseLog ideally does not have to filter out protected values anymore, as the blob should be accurate to the used variables; implementor to check if this needs to remain for historical reasons

Based on my investigation, we will need to keep this check because we cannot remove the user_unnamed_ip variable from the var dump for public logs. Therefore, we will need to keep this check for logs created before the patch in this task is merged and applied to WMF wikis.

We may be able to remove this check after 3 months of this patch being merged, because the value for user_unnamed_ip would no longer be stored.

While the check is extra code, I think it is still good to keep as a way to prevent issues if we have missed any edge cases with how var dumps are stored. As such, I'd suggest leaving this as it is but happy to create a follow-up ticket to remove if during review it is determined to be necessary.

Change #1131341 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Exclude protected variables from var dumps unless used by filter

https://gerrit.wikimedia.org/r/1131341

For QA I would suggest the following steps:

  1. Create the following testing filters
    • Filter A is a public filter (action = 'edit')
    • Filter B is a protected filter using user_unnamed_ip (user_unnamed_ip = '1.2.3.4')
  2. Edit using a temporary account from the IP in Filter B
  3. Log into an account with access to AbuseFilter logs and protected variables (an admin account is usually good enough for this)
  4. Open Special:AbuseLog and verify that two AbuseFilter logs are present for the edit made in step 2, one for Filter A and the other for Filter B
  5. Open the log for Filter B and assert that the user_unnamed_ip variable is present in the variable dump
  6. Open the log for Filter A and assert that user_unnamed_ip is not present in the variable dump

Alternatively we can still reuse the same variable dump, but protected variable will be stored in another table like abuse_filter_protected_variable.

Note current variable dump is stored in external storage, which can not be purged or redacted.

Alternatively we can still reuse the same variable dump, but protected variable will be stored in another table like abuse_filter_protected_variable.

I don't think we can do this. We want to expire the value but not remove that the variable was previously stored for the log entry. Having the variable name set with no value set allows this to be the view shown when expired. It also allows us to work out if we need to attempt to populate the value (because we would only read the data from the new table if we needed to populate the data).

Having the variable name set with no value set allows this to be the view shown when expired.

However, external storage is immutable, see T234155#9789762

Having the variable name set with no value set allows this to be the view shown when expired.

However, external storage is immutable, see T234155#9789762

Yes. We won't need to modify the value in the var dump because it's set to nothing. This what is already done for the existing user_unnamed_ip variable.

Djackson-ctr subscribed.

QA is completed, I have verified the new code has been implemented and is functioning Per the Acceptance Criteria (The AbuseFilter logs use a var dump which contains only the protected variables used in the associated filter). Thank you for the QA Steps @Dreamy_Jazz.