Page MenuHomePhabricator

Special:AbuseLog issues when filtering vs not filtering results
Open, Needs TriagePublic

Assigned To
None
Authored By
Oshwah
Jan 16 2019, 11:37 PM
Referenced Files
F27915865: 2.png
Jan 16 2019, 11:37 PM
F27915850: 1.png
Jan 16 2019, 11:37 PM
Tokens
"Burninate" token, awarded by ToBeFree.

Description

While evaluating the block of an LTA sock puppet user, I went to Special:AbuseLog and filtered the results by the account's username ("MusikAnimal Devotes life to stop Glapz"). No results are found. See screenshot below:

1.png (974×1 px, 56 KB)

However when removing all parameters and going through the unfiltered log as it is, I am able to find two logs that should have been returned when I filtered the log by the user name above. See screenshot below:

2.png (972×1 px, 216 KB)

In this particular example, filtering the AbuseLog by the username returns no results, but viewing the log unfiltered (in fact, I was able to filter by Filter IDs without issue) will show results that should have been present in the previous query.

We might be having issues with inconsistent queries / returns involving this log or possibly the filter when the username is added as a condition.

Event Timeline

It's probably because the username was revision-deleted and so doesn't show up in username-specific logs. Of course you can see the name, but for you to also see the relevant search results for the first image would require entire log entries to be visible only to admins. I don't think you can do that. The fix for this task, if one was implemented, would consist of making that possible. Either that or change how revdel works with usernames so everyone can see the logs, but that might defeat the purpose of it.

Interesting... so even with the visibility set to "all", it will not return results due to the creation log being RD'd? Seems like a big inconsistency here...

@Ekips39: The account creation log isn't revision-deleted, and neither are the AbuseLogs (they would state this next to each log entry in the second screenshot). Am I looking in the wrong place or at the wrong log?

I thought the crossed-out usernames in the second picture were due to revdel on the filter log entries, but turns out I was wrong and they're actually blocked. Must be one of your scripts. Never mind my original comment.

I have another possibly inaccurate guess. The account wasn't registered at the time those log entries were created. They're for filters tripping on account creation attempts, so the "account" that trips them can't exist yet. This one exists now, but these entries aren't associated with it. You can tell because the contribs link is missing, as with IP addresses and usernames that aren't associated with accounts -- (talk | block) instead of (talk | contribs | block). Things that are associated with a username but not the corresponding account often don't show up where they should, and this could easily be another case of that.

PS -- didn't actually mean to subscribe in the first place. It happened somehow when I logged in.

This appears to be the case for any filter that is activated on account creation. For whatever reason, the log is not actually attached to the account (meaning that you can't search for the account name to find the hit).

My sincere apologies for the confusion - yes, crossed out edits do not refer to the visibility of the log or entry at all. It simply means that the user is currently blocked from editing.

I searched for a similar situation with another username (where the filters log a 'createaccount' attempt that takes no blocking action and where the username creation is successful): See also User:Hacker689 .... Filtering by nothing returns logs by this account. Adding this username to the filter returns no results.

On the other aspect, If I enter a username that doesn't exist (AbuseFilters are logged and actually blocked the creation), the results return without issue.... In fact, look at this URL below:

https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchUser=Mehdi.etesamipour

An AbuseLog shows an entry where no action was taken, but the username does not exist... are more logs possibly missing? In what situations (other than blocked AbuseFilter actions) would this happen?

The issue appears to be caused by afl_user being set to 0 on user creation (likely because the user account does not exist). But, when searching though the abuse logs, a condition is added to search by afl_user, which, as long as the account now exists, cannot be 0. Is this necessary at all, given that afl_user_text should be unique? Maybe, given that a disallowed user creation could add entries from different people.

Will investigate later. Below is a possible explanation.
On "createaccount" the filter is triggered by anonymous users (i.e. you would find it by searching the IP which triggered it). However, in order not to publicly show the IP, AF replaces it with the new account name before saving the log entry, which is pretty hacky, has lead to other problems, and doesn't actually completely prevents the IP from being leaked in several specific cases. A proper solution would probably be to change this behaviour and make it somehow saner. As for this specific problem, I think (but I have to check) that we inconsistently use anonymous users (i.e. IPs) and fake users with the new username in different parts of the code, and thus searching by username doesn't work because entries reference the IP.

I checked on my wiki and found the cause.
First, filters are executed and we eventually save an AbuseLog entry. There we use $user->getId() and $user->getName() to get such data. Now, if the action is an account creation $user is the creator, passed by the AuthenticationProvider here, and is an anonymous user. Thus, calling getName and getId on it returns respectively the IP and 0. This way, we would just end up saving the IP as log performer, however it's not over. As I was saying in my previous comment, AbuseFilter performs the horrible hack of changing afl_user_text to the account name, in order to avoid showing the IP. Thus, what we get in the end is a DB row where afl_user is 0 and afl_user_text is the account name.
Back to Special:AbuseLog, if a user is specified we try to build a User object from the given name. This creates two cases:

  1. If the account doesn't exist (assume creation was disallowed by AbuseFilter), then we'll query the abuse_filter_log to find rows with afl_user === 0 and afl_user_text = the_one_specified_in_the_search_bar. This will succeed, proof.
  2. If the account exists (we can assume that this means that the creation wasn't disallowed by AbuseFilter or anything else), the User object will have a non-zero ID and a name, so we'll try to match an abuse_filter_log with such ID and name, but this obviously fails because we stored a 0 as user ID, proof.

And now, the solution. My first thought is that a proper solution should actually involve removing the hack above and, in general, an overhaul of filtering for the createaccount. However, to be honest, AbuseFilter don't have many maintainers, and I'm the author of most of the later patches. There are many patches under review, but we need someone to review them. Specifically, it won't be easy to find a reviewer for a change as big (potentially) as the aforementioned overhaul, given that it's hard to find reviewers even for one-line or doc-only patches.
That being said, we can of course put some other hack in place, possibly not a horrible one. The main point is that, when saving the log ID, the user doesn't still exist, and thus we don't have an ID available. For some similar cases, the solution adopted (within AbuseFilter codebase) is to store the data in a static variable, and then reuse it when needed. For instance, the afl_rev_id field (which is a foreign key to revision.rev_id and allows linking diffs in AbuseLog) is populated in the onPageContentSaveComplete hook using IDs stored in a static variable. We could do the same here: save the account name in a static var and update afl_user in another hook handler.
However, both solutions (long-term and hack) won't do anything for old DB entries, which will still hold a 0 ID. For those, we could use a maintenance script, since we already have the user name (and the ID is trivial to get then).

Adding to radar for the moment. However, I feel that (unless someone offers to review the patch) we should just avoid adding yet another hack and go ahead with the stable solution.