Page MenuHomePhabricator

Identify how abuse log details were purged from the CU logs
Closed, ResolvedPublic

Description

In T254934 we have a case where some parts of the abuse log appear to be purged in the CU log. That task contains private information and the logs are in Persian (fawiki) so to facilitate the investigation, I am creating this parallel task whereby I describe the situation in a de-identified way, so as to figure out how the data ended up being purged in that way.

CU get edits result

Running a "get edits" query on the user in question resulted in a log entry that appeared like below. I redacted some parts to preserve privacy (the parts I redacted are in ALLCAPS); I also changed the date and time for the same reason.

1 June 2020

    (Logs) . . 01:01 . . USERNAME talk contribs block (username removed) (log details removed)
            IP: 111.22.33.44   SOMEUSERAGENT

Given the username and the date and time, I quickly identified that the log is indeed an AbuseFilter log. Of note, the abuse log is *not* deleted or suppressed, and has never been.

Database record
A query on the cu_changes table was run on the production server, and the result looked like this (again, data has been modified to preserve identity):

select * from cu_changes where cuc_timestamp like '2020060101%' and cuc_user_text = 'USERNAME' limit 10\G
*************************** 3. row ***************************
        cuc_id: SOMEID
 cuc_namespace: 4
     cuc_title: PAGETITLE
      cuc_user: USERID
 cuc_user_text: USERNAME
cuc_actiontext: (username removed) (log details removed)
   cuc_comment: 
     cuc_minor: 0
   cuc_page_id: SOMEPAGEID
cuc_this_oldid: 0
cuc_last_oldid: 0
      cuc_type: 3
 cuc_timestamp: 20200601010101
        cuc_ip: 111.22.33.44
    cuc_ip_hex: SOMEIPHEX
       cuc_xff: 
   cuc_xff_hex: NULL
     cuc_agent: SOMEUSERAGENT

Analysis

I am having a hard time finding a hook in CheckUser code that would allow updating an existing cu_changes entry when its associated log is revdeled or supressed. This log was not revdeled or supressed, but if we had such a hook, maybe some piece of code incorrectly executed it? (UPDATE: this was not the case)

Alternatively, is it at all possible that the data was originally entered in this way (as opposed to being updated later)? (UPDATE: this was the case)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@Huji Can you clarify if the CU edit result's (username removed) (log details removed) is your redaction or the purged info being missing? Same for your db query

@DannyS712 great question. And the answer is, no; those are not my redactions. That is exactly how the data appears in the DB.

Well, technically, instead of "(username removed)" you see "(نام کاربری حذف شد)" (which the same message in Persian). But that's beside the point.

Are there many other log entries in the database on fawiki (or enwiki) with the exact same cuc_actiontext? Would be helpful to know whether this is a regression, and if so, when it started appearing.

@Huji you note "I am having a hard time finding a hook in CheckUser code that would allow updating an existing cu_changes entry when its associated log is revdeled or supressed." - have you found any code that changes existing cu_changes entries? If not, I would agree with the theory that it was originally entered like this

A missing LogFormatter::setAudience call somewhere around here, I'd guess?

On second thought, what would be the correct behavior here? If a log entry is private, and some checkusers do not have the rights to see it, should they be able to see it in the CU log?

Are there many other log entries in the database on fawiki (or enwiki) with the exact same cuc_actiontext? Would be helpful to know whether this is a regression, and if so, when it started appearing.

@Ladsgroup is this something you could query? The result would be a single date after which such logs appeared, if such date exists.

A missing LogFormatter::setAudience call somewhere around here, I'd guess?

That is likely!

@Huji you note "I am having a hard time finding a hook in CheckUser code that would allow updating an existing cu_changes entry when its associated log is revdeled or supressed." - have you found any code that changes existing cu_changes entries? If not, I would agree with the theory that it was originally entered like this

That was a euphemism for "I cannot find such code". But see Tgr's comment above; I think he might have found it!

On second thought, what would be the correct behavior here? If a log entry is private, and some checkusers do not have the rights to see it, should they be able to see it in the CU log?

That is a separate issue. In this particular case, I have all the needed permissions (I hold sysop and oversight rights). More importantly: that decision should happen upon display of the entry, not upon storage of it in the DB.

I'd assume it started with rEABFa13799290068: Restrict viewing Special:Log/AbuseFilter, and remove from recent changes.

That is a separate issue. In this particular case, I have all the needed permissions (I hold sysop and oversight rights). More importantly: that decision should happen upon display of the entry, not upon storage of it in the DB.

In an ideal world, but the CU table has no such capability, so it has to be decided upon insertion whether every checkuser or no one can see the log message.

First off: I cannot replicate it on my local wiki. No matter how I configure things, abuse log entries are fully stored in the CU log and are never redacted.

However, I just created an account named User:T255506 and designed a filter that would be specifically triggered by this account, and then I used this account to save an edit and hence generate an abuse log. Next, I ran CU on that account and I found the details to be redacted.

This proves that the details are redacted upon entry of the log into CU data.

Great find!

That was deployed in late 2019; how come we never noticed this issue until June 2020?

Also, how is that I cannot replicate it on my local wiki?

That is a separate issue. In this particular case, I have all the needed permissions (I hold sysop and oversight rights). More importantly: that decision should happen upon display of the entry, not upon storage of it in the DB.

In an ideal world, but the CU table has no such capability, so it has to be decided upon insertion whether every checkuser or no one can see the log message.

Still, a separate issue. And likely one that would be addressed after T145265 is resolved. Let's not continue this discussion here.

@Tgr, the more I think about it, the more I think it must be due to a much more recently deployed change. I could swear that up until May or so, I had seen non-redacted abuse logs in CU results. Also, I found a case from May 2020 where the data was mentioned on checkuser wiki in non-redacted form (I left you a message on that wiki so you can take a look at it).

We are looking for something deployed late May.

Earlier, I checked several recent hits on private filters on enwiki and couldn't find any examples of missing CU data. This includes hits on a "no action" private filter (1007), and hits on a "disallow" private filter (1050), and includes edits as recently as within the last 24 hours. So this isn't universally happening to all filter hits. There must be some commonality between the users, filters, or perhaps wiki configurations, that is causing this inconsistent behavior.

  • Is this happening to *all* AbuseLog entries on fawiki?
  • Is this happening to *any* AbuseLog entries on enwiki?
  • Can we determine the last good / first bad instances of AbuseLog entries in cu_changes?

Here's a difference. On enwiki, all users can access https://en.wikipedia.org/wiki/Special:AbuseFilter , even if they are logged out. On fawiki, "The action you have requested is limited to users in one of the groups: Administrators, Patrollers, Autopatrollers, Eliminators, Abuse filter editors."

Perhaps removing one of the three following user rights from group (all) on your test wiki and repeating the test will allow you to replicate this issue:

View detailed abuse log entries (abusefilter-log-detail)
View edit filters (abusefilter-view)
View the abuse log (abusefilter-log)

If one of those affects self::canViewLogType() in this function, that would explain the issue. That's called by getActionText, which is called by getPlainActionText, which is called by CheckUser while inserting the row for an AbuseFilter log entry.

@ST47 I will test that shortly. I think it is a combination of the abuse logs not being public and the change I mentioned above (in which RCDatabaseLogEntry was replaced with RecentChange) that has caused this problem.

Perhaps removing one of the three following user rights from group (all) on your test wiki and repeating the test will allow you to replicate this issue:

View detailed abuse log entries (abusefilter-log-detail)
View edit filters (abusefilter-view)
View the abuse log (abusefilter-log)

Setting $wgGroupPermissions['*']['abusefilter-view'] = false; will cause the log to be redacted. I am now walking back in the history of rEABF to see which revision was the starting point for this.

Ok, now this is perplexing. No matter how far back I went on rEABF or rECHU, so long as $wgGroupPermissions['*']['abusefilter-view'] was set to false the logs would be redacted on my test wiki. This cannot be true though, because in the past (and if my memory serves, up until a few weeks/months ago) the logs were non-redacted on fawiki, and the $wgGroupPermissions['*']['abusefilter-view'] value for fawiki has not changed in years. So this has to be due to some change in some code recently.

I even went as far as backtracking AbuseFilter to a version in March, CheckUser to a version in March, and MW Core to a version in March (all with git reset --hard someid) and even in this setup the logs were redacted.

I lost my train half way through this task; however, in case it's useful, this is the code that sends AF entries to CU. If there's no code in CU for updating a CU entry when the associated event is deleted, then it must've been inserted as redacted. This should narrow down the problem to AbuseFilter creating the LogEntry in AbuseFilterRunner::addLogEntries (link above) or CheckUser consuming it in ::updateCheckUserData().

When generating the logs the permissions are checked against an anon/ip user due to call of RequestContext::newExtraneousContext (which does not have the rights mention above)..
It has to use a raw comment at some points to generate the comment (raw or trying to override with LogFormatter::setAudience).
This should be the same for log type in core with permissions like revision deletion and suppression delete or hide usernames.

I have no idea how long the checkuser data are stored on fawiki, but when there are only 90 days there is no way to look back to november 2019 where the breaking change was deployed.

I lost my train half way through this task; however, in case it's useful, this is the code that sends AF entries to CU. If there's no code in CU for updating a CU entry when the associated event is deleted, then it must've been inserted as redacted. This should narrow down the problem to AbuseFilter creating the LogEntry in AbuseFilterRunner::addLogEntries (link above) or CheckUser consuming it in ::updateCheckUserData().

You are correct. We established that further in the discussion on this task; when you get a chance to read through it you will see that I can now successfully replicate it on my test wiki, and the logs are indeed inserted as redacted.

When generating the logs the permissions are checked against an anon/ip user due to call of RequestContext::newExtraneousContext (which does not have the rights mention above)..
It has to use a raw comment at some points to generate the comment (raw or trying to override with LogFormatter::setAudience).
This should be the same for log type in core with permissions like revision deletion and suppression delete or hide usernames.

I have no idea how long the checkuser data are stored on fawiki, but when there are only 90 days there is no way to look back to november 2019 where the breaking change was deployed.

I am going to revert my test wiki to a revision prior to that Nov 2019 change to see if that makes this issue go away.

However, the key question is: now that we know this, how are we going to fix it? One option is to make abuse logs public on fawiki, but that is a workaround (and would require community consensus). But what is the solution that would allow us to have a wiki with non-public abuse logs but still able to have those logs stored in CU without redaction?

I can confirm that a13799290068 is the revision in which this stopped working for any wiki that sets $wgGroupPermissions['*']['abusefilter-view'] to false. By my count, this impacts 12 wikis: cawiki, ckbwiki, enwikibooks, eswikibooks, fawiki, frwikibooks, frwiktionary, itwiki, nlwiki, plwiki, thwiki, ukwiki. It has been impacting them since Nov 2019 and that has gone unnoticed somehow.

@DannyS712 one option we have is revert a13799290068 but I think you were well-intended in implementing that change so I'd rather exhaust the possibility of fixing this in another way before we decide to revert this change.

My understanding is because this all depends on RequestContext::newExtraneousContext() and that has no way of knowing all the possible permission configs, there is no other way (than reverting your change) to fix this regression. But I admit that I don't know Request processes well enough.

One option is to arr a "raw" audience option to LogFormatter (most other visibility-related code already has that) and set it in the RecentChange_save CheckUser hook. Hovever, that would mean that checkusers can see things like oversight logs, which I imagine is not wanted.

A more elaborate approach would be to define via a config variable a set of permissions the checkuser logs should "have", and fake them in the request context. Or maybe have the request context use a specific system user, and then bureaucrats can decide what rights to give to that user - that might be easier to self-manage.

@Tgr recall that prior to this change, CUs on those 12 wikis were able to see oversighted abuse filter logs. And that even now, in most wikis (all but the 12) they continue to be able to do so. And for all other things (edits, logs from other than abuse filter), they also can see everything, no matter if it is revdeled or oversighted.

While I appreciate that this, in and of itself, is an issue we need to resolve, I am also cognizant that solving the "what should CUs see" question will take a long time, and that would be at the expense of these 12 wikis (because during that time, their CUs will not be able to see abuse filter logs properly in CU data). I think on WMF wikis, CUs are always admins, and admins are always allowed to see abuse logs, so essentially, the current state is both broken and not helping with the permission issue either.

That is why I think we should first revert (or effectively undo this side effect of) a13799290068 and then solve the permissions issue at large, and finally bring a13799290068 back to life.

The "raw" audience option might be one way to effectively undo the undesirable side effect of that change.

After fix of T41013 it is possible to move the permission check to the UI of CheckUser, because than the logtype/subtype is known and can be checked. Than it should be possible to hide the necessary infos.
Only the log_deleted flag is missing, maybe needed to select for each entry (or stored, recentchanges table stored it for performance reason)

@Tgr recall that prior to this change, CUs on those 12 wikis were able to see oversighted abuse filter logs.

Logs are never oversighted when they are created (and passed to the RecentChanges_save hook) and later oversighting them does not affect the CU record. I'm talking about log entries which record oversight actions.

So far, my efforts to find a solution for this have not been successful. The directions @Tgr provided in T255506#6229375 make complete sense, but I am unable to turn them into working code.

A very cheap way to do it, which I think is kinda like the last option @Tgr suggested, is to chuck in

$user = User::newFromName("ObscureUsernameCheckUserLog");
if ( $user->getId() !== 0 ) {
	$context->setUser($user);
}

And this gets around the issue, assuming $user has the proper rights assigned to them. Probably the id != 0 check isn't required. But there's something that feels wrong about copying the rights off a real user, and at the very least it seems unnecessary. Not sure if there's an easy way to add a right to a dummy '127.0.0.1' user? That'd be a slightly better cheap workaround.

I think the raw audience option is probably the better way to do it, and make the permission check elsewhere, to prevent that CU viewing OS problem.

Code would be something like

$user = User::newSystemUser( 'CheckUserLogger', [ 'steal' => true ] );
$permissionManager->addTemporaryUserRights( $user, $wgCheckUserLogPermissions );
$context->setUser( $user );

@Tgr Doesn't seem like addTemporaryUserRights works, perhaps due to the different scope?

userrights.png (40×285 px, 6 KB)

Looks like the permission has already been consumed by the time it hits the formatter, in LogFormatter::canViewLogType. I'm not too familiar with this, but if that's indeed the problem, is it possible to stop the right from being consumed automatically?

Yeah, it should have been

$scope = $permissionManager->addTemporaryUserRights( $user, $wgCheckUserLogPermissions );

Change 608221 had a related patch set uploaded (by ProcrastinatingReader; owner: ProcrastinatingReader):
[mediawiki/extensions/CheckUser@master] Allow adding of additional rights to log user

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/ /608221

Proc triaged this task as Medium priority.

Change 608222 had a related patch set uploaded (by ProcrastinatingReader; owner: ProcrastinatingReader):
[operations/mediawiki-config@master] Add 'abusefilter-view' as a default right for the CU log user

https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/ /608222

Yeah, it should have been

$scope = $permissionManager->addTemporaryUserRights( $user, $wgCheckUserLogPermissions );

Brilliant! This solves the problem I had in turning your idea into code. Gladly, @Proc seems to have done it already.

Huji added a subscriber: Reedy.

@Proc your patch does the job. Once merged, the next steps would be:

  • Implement it for WMF by setting the global config variable (you already have a patch for it)
  • Add the definition of the new global to the MediaWiki wiki; there, we should briefly mention when it is needed (i.e. "if using AbuseFilter with such and such configuration")
  • Updating mw:Extension:CheckUser#Configuration to describe this new global and its usage.

Change 608221 merged by jenkins-bot:
[mediawiki/extensions/CheckUser@master] Allow adding of additional rights to log user

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/ /608221

I need someone to educate me on Deployment a bit more. Perhaps @Reedy could do that?

Given that https://gerrit.wikimedia.org/r/c/608221 was merged moments ago, how would I know when it would reach WMF wikis? It does not make sense to schedule https://gerrit.wikimedia.org/r/608222 for deployment until 221 is deployed. Looking at wikitech:Deployment what I deduce is that 221 will be deployed as part of the 1.35.0-wmf.39 train sometime Tuesday, but what I cannot figure out is when the wmf.39 branch will be created for rECHU. It seems like the wmf branches of rECHU are created either on Mondays or Tuesdays, but wikitech:Train deploys implies it should always occur on Tuesdays. Can you clarify that a bit? Is it a time zone issue?

Also can you also verify that if I schedule 222 for Tuesday's "Morning backport window" it would be properly timed? Or should schedule it for after the second wmf.39 train window (e.g. try Tuesday's Evening backport)?

Thanks @Reedy that was very helpful (especially the third one).

@Huji do you still plan to schedule the deploy for the 6th?

Yes, though I need to free up time on my end and am still working on that.

Yes, though I need to free up time on my end and am still working on that.

I can deploy it (probably in the US morning window) if you prefer.

@Proc Alright, it is scheduled for 18:00–19:00 UTC (during which you also have a config deployment scheduled, so see you soon!)
@Tgr we are all set. We wanted to quickly verify it worked by running a CU on a dummy account. Thanks for offering to help tho!

Change 608222 merged by jenkins-bot:
[operations/mediawiki-config@master] Add 'abusefilter-view' as a default right for the CU log user

https://gerrit.wikimedia.org/r/608222

Mentioned in SAL (#wikimedia-operations) [2020-07-06T18:24:18Z] <urbanecm@deploy1001> Synchronized wmf-config/abusefilter.php: 8878c60: Add abusefilter-view as a default right for the CU log user (T255506) (duration: 00m 55s)

Huji removed a project: Patch-For-Review.
Huji added a subscriber: Urbanecm.

Deployed, and successfully tested on fawiki. Thanks @Urbanecm

I believe this added param still needs to be documented at https://www.mediawiki.org/wiki/Extension:CheckUser. I wonder if a note should be added to https://www.mediawiki.org/wiki/Extension:AbuseFilter as well, since it appears to be the sole (?) culprit requiring this.

Since I can't think of any case where this functionality would be desirable, I'm wondering if this permission grant 'abusefilter-view' should be hardcoded into the extensions itself, rather than added as an option added in the config (https://gerrit.wikimedia.org/r/608222). Seems like an extra (confusing) hoop for non-WMF users of MediaWiki. This is a bug after all.

This is not the only config var that is not on that page. They should/will all be added as part of T256824

While I agree with you that the only *known* use case of this is AbuseFilter, I think keeping the current design is better than hard-coding it, because other use cases might exist in future.

Shouldn't those hypotheticals also be hardcoded in? My reasoning is that the point of configuration is for users to change how the software behaves. This is making adding a configuration variable a requirement for functionality to work as intended. There's no developer intention, or valid purpose, for this data to be purged from CU logs. So we're kinda just adding an extra hoop, and most non-Wikipedia users who run into this probably shouldn't have to jump into some obscure part of the doc to realise that they have to add a compulsory param. We can keep the config param too, I suppose, for future additions, I just think this particular param should also be hardcoded in.

Config vars being required is a familiar concept; you are also required to provide $wgDBserver and $wgDBtype for your wiki to work.

But I get your point.

A different approach we could have taken was to modify AbuseFilter's code such that it would check for the existence of CheckUser class, then check to see if $wgGroupPermissions['*']['abusefilter-view'] is false, and if both conditions were met, it would run $wgCheckUserLogAdditionalRights[] = 'abusefilter-view';

This way, no configuration would be needed on the user's part; the extensions will be "smart" and would apply the configuration when necessary. If you like this approach better, please reopen this task and either assign it to me or take it over and submit the necessary patches for me to review.

Sure, but there's a bit of a difference in having to write your DB credentials (something the software could never possibly auto determine), and having to add in a config variable for the software to work as we intended it to.

That approach you describe is what I was getting at. Either in CU adding permission based on the existence of AF, or in AF adding permission based on the existence of CU. Either one would be fine imo. Will reopen the task.

@Proc will you be submitting a patch too? I think it should be done in AF.

@Proc, another gentle ping about whether you could make the patches.

@Huji Yeah, I can submit the patches (one to add it into AF before the call, then another to revert the wmf-config patch). It'll probably be a week till I can get around to it, though. If you can get around to them sooner, it's fine with me for you to submit the patches :)

Change 619989 had a related patch set uploaded (by ProcrastinatingReader; owner: ProcrastinatingReader):
[mediawiki/extensions/AbuseFilter@master] Hardcode 'abusefilter-view' right when adding CU log entry

https://gerrit.wikimedia.org/r/619989

Change 619991 had a related patch set uploaded (by ProcrastinatingReader; owner: ProcrastinatingReader):
[operations/mediawiki-config@master] Remove abusefilter-view right grant from wmf-config

https://gerrit.wikimedia.org/r/619991

Change 619989 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Hardcode 'abusefilter-view' right when adding CU log entry

https://gerrit.wikimedia.org/r/619989

Change 619991 merged by jenkins-bot:
[operations/mediawiki-config@master] Remove abusefilter-view right grant from wmf-config

https://gerrit.wikimedia.org/r/619991

Mentioned in SAL (#wikimedia-operations) [2020-09-15T18:28:08Z] <urbanecm@deploy1001> Synchronized wmf-config/abusefilter.php: 084729b7fd0716f11265f1b37570afc120b27109: Remove abusefilter-view right grant from wmf-config (T255506) (duration: 00m 56s)

Boldly closing, as we a) know why this happened b) fixed the issue. Thanks everyone who helped, especially @Proc and @Huji!

Change 627608 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Revert "Remove abusefilter-view right grant from wmf-config"

https://gerrit.wikimedia.org/r/627608

Change 627608 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Remove abusefilter-view right grant from wmf-config"

https://gerrit.wikimedia.org/r/627608

Mentioned in SAL (#wikimedia-operations) [2020-09-15T23:00:07Z] <urbanecm@deploy1001> Synchronized wmf-config/abusefilter.php: 62b21d55a8f0a94b8cd268d5024df0cf64013dd5: Revert "Remove abusefilter-view right grant from wmf-config" (T255506) (duration: 00m 59s)

I reverted the config patch, as I realized it was too soon - and I tested at testwiki, with abusefilter-view granted to the world, so... THis doesn't hurt anything through, so keeping resolved, and sometime we can remove it.

I've created a follow-up at T266272, you're all welcome to chime in or subscribe if you want.