Page MenuHomePhabricator

AbuseFilter prevented edits due to temporary accounts
Closed, ResolvedPublicBUG REPORT

Description

I’m not entirely sure what the exact problem is, and when I tried to reproduce it now on testwiki I couldn’t reproduce it, but I’ll describe what happened to us on hewiki.
There is an AbuseFilter that checks whether an account is in a certain ip_in_range, and also whether (count(removed_links) > 0). The filter does not prevent editing, but it displays a warning to the user. (Special:AbuseFilter/76)

I don’t know whether there were issues with the filter in the past as well, but yesterday I discovered that in recent months editing from those IP ranges (for logged-out users) was almost impossible.
The reason is that the AbuseFilter detected every edit as “removed_links”, even an empty edit that didn’t change anything in the article’s source code. (And according to the logs there were also filter activations during autocreateaccount, I didn't delve into it to see what happened with that.)

Maybe there is a problem in the way we used removed_links. Anyway, there is another problem which prevented editing:
After the warning of the AbuseFilter was presented, a message was presented that a temporary account was created in the last 10 minutes from the same IP address, and therefore the edit cannot be performed. However, for an unclear reason if I switch to another tab on Wikipedia I do not see that I am logged into a temporary account so it's unclear whether this account was at all created. When I wait for 10 minutes, I save the edit again, receive the AbuseFilter warning and the situation repeats itself...

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Regarding the point on the temporary account, the logs for filter 76 show that a temporary accounts are created for the user.

If then the user isn't logged in as that temporary account, we probably need to fix that.

@Dreamy_Jazz, I haven’t tested this with other filters, but if the issue where you cannot save the edit after a warning always occurs, this is a problem that requires an immediate fix.

@Dreamy_Jazz, I haven’t tested this with other filters, but if the issue where you cannot save the edit after a warning always occurs, this is a problem that requires an immediate fix.

cc @Tchanders @Niharika

Unlike count in PHP, count in AbuseFilter is a string function that counts substrings in a haystack. From the doc:

Returns the number of times the needle (first string) appears in the haystack (second string). If only one argument is given, splits it by commas and returns the number of segments.

I haven't debugged this deeply but count probably stringifies the removed_links array first and just does its usual thing. When I tested this using one of those autocreateaccount logs:

!!count(removed_links) /* true */
!!removed_links[0] /* false */
!!length(removed_links) /* false */
count('[]') === 1 /* true */

So, count(removed_links) is likely equivalent to count('[]').

And the filter you mentioned has two issues:

  1. To count the number of elements in an array, you should use length, not count.
  2. removed_links is an expensive variable. The filter really should check action === 'edit' before using it.

Unlike count in PHP, count in AbuseFilter is a string function that counts substrings in a haystack.

Actually,

private function funcCount( $args ) {
	if ( $args[0]->type === AFPData::DARRAY && count( $args ) === 1 ) {
		return new AFPData( AFPData::DINT, count( $args[0]->data ) );
	}
	if ( count( $args ) === 1 ) {
		$count = count( explode( ',', $args[0]->toString() ) );
	}

count does work like that of PHP, provided the argument is an array.
However, the variables you are dealing with are null during account creation. In which case, they are stringified to '' (empty string), split by commas using PHP's explode ([ '' ]), and the number of elements is returned (count( [ '' ] )).

So count(null) in AbuseFilter is equivalent to count( '' ) and count( [ '' ] ).

To count the number of elements in an array, you should use length, not count.

Indeed, length(null) is equivalent to length( '' ) and thus returns zero, while it also works like PHP's count for arrays (so length(null) behaves the same as length([])).

removed_links is an expensive variable. The filter really should check action === 'edit' before using it.

It should, but then it's expensive regardless. action === 'edit' just avoids these obscure casts.

neriah added a project: Temporary accounts.

@neriah Hello. Is this still a problem? Have you managed to reproduce it?

@Niharika Yes, the problem still exists.
I have now re-enabled the filter, and apart from the problem with the filter settings (which I will check shortly), after the warning from the filter, when I try to save again I get the message We are temporarily limiting logged-out editing from your location. Please create an account to edit, or try again later.

I mark the task as open - even if there is no problem with AbuseFilter, there is a serious problem with the temporary accounts.

@neriah Hello. Is this still a problem? Have you managed to reproduce it?

@Niharika Yes, the problem still exists.
I have now re-enabled the filter, and apart from the problem with the filter settings (which I will check shortly), after the warning from the filter, when I try to save again I get the message We are temporarily limiting logged-out editing from your location. Please create an account to edit, or try again later.

I mark the task as open - even if there is no problem with AbuseFilter, there is a serious problem with the temporary accounts.

Can you please update the filter as specified by T410724#11568866 and T410724#11569094

AFAICS your filter is matching all actions by temporary accounts because of the AbuseFilter rule being written in a way which matches every action. Therefore, the issues you are seeing will likely be fixed by those changes

... when I try to save again I get the message We are temporarily limiting logged-out editing from your location. Please create an account to edit, or try again later.

This is the rate limit for temporary account creation by IP address and isn't related AbuseFilter. If the rate limit for temporary account creation is too small on hewiki, I would suggest filing a separate task for that change

Hi @Dreamy_Jazz, I replaced the filter with length, and it works well.
I think it would be a good idea to add documentation somewhere stating that for removed_lines one must use length, so this situation doesn't happen again.

Regarding temporary accounts - at the moment it seems fine: after a warning I can save an edit without getting another warning. I would close the task, but maybe it's worth adding handling for such a case? So we don't end up in a situation like what happened on hewiki, where many people were blocked from making edits outside of an account?

Thank you very much for the help!

Dreamy_Jazz claimed this task.

Hi @Dreamy_Jazz, I replaced the filter with length, and it works well.

Good to hear the issue is resolved

I think it would be a good idea to add documentation somewhere stating that for removed_lines one must use length, so this situation doesn't happen again.

I've added some to the Extension:AbuseFilter/Rules format page on mediawiki.org

Regarding temporary accounts - at the moment it seems fine: after a warning I can save an edit without getting another warning. I would close the task, but maybe it's worth adding handling for such a case? So we don't end up in a situation like what happened on hewiki, where many people were blocked from making edits outside of an account?

Sure. Could this be filed as a separate task? I think it would be easier if it's tracked there, given that this was focusing on the specific filter being broken.

I'm going to be WP:BOLD and close this task again as Resolved (given that we updated the docs, so some action was taken)

@Dreamy_Jazz Thanks a lot for moderating this.

I think it would be a good idea to add documentation somewhere stating that for removed_lines one must use length, so this situation doesn't happen again.

@neriah I'm not sure I agree with this. While I understand your intent, that would tempt us to document the same thing for every array-type variable. A more scalable approach might be to expand the documentation of count when the input is a native array, AFAICT.

@neriah I'm not sure I agree with this. While I understand your intent, that would tempt us to document the same thing for every array-type variable. A more scalable approach might be to expand the documentation of count when the input is a native array, AFAICT.

Feel free to modify what I added to the rules format page. I'd support adding this to the general documentation for count

Feel free to modify what I added to the rules format page. I'd support adding this to the general documentation for count

Thank you, I've updated the documentation as in this diff. Please feel free to modify it further if you spot anything inaccurate.