Page MenuHomePhabricator

Variables constantly don't appear in abuse filter log
Closed, ResolvedPublic

Description

In abuse filter log, important variables (like added_lines, removed_lines, edit_delta) constantly don't appears, which difficult when you try fix some false positives on the filters (using examine tool).

Examples:
https://pt.wikipedia.org/wiki/Especial:Registro_de_abusos/3451739
https://pt.wikipedia.org/wiki/Especial:Registro_de_abusos/3453041
https://pt.wikipedia.org/wiki/Especial:Registro_de_abusos/3453002
https://pt.wikipedia.org/wiki/Especial:Registro_de_abusos/3452950
https://pt.wikipedia.org/wiki/Especial:Registro_de_abusos/3452921

This problems are extended to all triggers given at same time. Example: https://pt.wikipedia.org/w/index.php?title=Especial:Registro_de_abusos&wpSearchUser=191.250.169.51

All triggers of this IP (on "19 de setembro de 2017") coming of different filters shows the same problem.

Event Timeline

Silent created this task.Sep 20 2017, 1:07 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 20 2017, 1:07 AM
Silent updated the task description. (Show Details)Sep 20 2017, 1:07 AM
Silent updated the task description. (Show Details)
Silent updated the task description. (Show Details)Sep 20 2017, 1:10 AM
Silent updated the task description. (Show Details)

What does "don't appear" mean exactly? What does appear? How does it look? What is an example of what is shown, and what is an example what you would expect to be shown? (Without private info such as IP addresses.)
I don't have access to the example links you provided...

Silent added a comment.EditedSep 20 2017, 1:13 PM

Mean which don't appears in "Action parameters" table. That table in abuse filter logs page that shows the variables and the respective values.

Look this image: https://drive.google.com/open?id=0B-FYCQ6PtBzISHNzQTB2V0hnTzg
That's is the table relative to this trigger: https://pt.wikipedia.org/wiki/Especial:Registro_de_abusos/3451739

Noticed that variables "added_lines", "removed_lines" and "edit_delta" don't appears? This is the problem. Thereby isn't possible test appropriately a filter that uses theses variables in search of false postives, for example.

Nirmos added a subscriber: Nirmos.Sep 20 2017, 5:27 PM

Pretty sure this is T175933.

I have the same problem on my wiki. Most Abusefilter logs are missing the added_lines and removed_lines variables, on filters that actually matched the edit, prevented it and are using those variables to trigger the action. This makes examining log entries hard to debug, because since those variables are not present, the same filter doesn't match.

In other cases, the match was for a filter that doesn't use added_lines or removed_lines (it matched a page creation) but those variables are present...

Looking at the code, I think I see where's the problem: stashed edits.

See AbuseFilter::filterAction

When you start editing, there are asynchronous requests to stashedit. It runs the filters and populates dynamically all required lazy load variables, but only stores which filters matched the action.

When you actually save the page, the cache contains which filters matched the action, and if the edit is the same as the stashed one (no further modifications to wikicode since the stash), it gets the matched filters from the stash, but the lazy loaded variables aren't populated (because filters don't run again to populate them, and they aren't stored/retrieved to/from the cache). So we end with only the basic variables populated.

This means the problem can be reproduced sometimes: it depends if the edit was stashed before being published or not. Stashed edits don't have lazy loaded variables populated.

I've disabled JavaScript and did an edit that would match a filter. Since JavaScript is disabled, there are no stashed edits, and the AbuseFilter log did display correctly the variables used by the filter.

He7d3r added a subscriber: He7d3r.Oct 28 2017, 3:06 PM
Daimona triaged this task as High priority.Mar 23 2018, 8:33 PM

Like T175933, this is something really wrong that we should fix.

@Ciencia_Al_Poder regarding your last comment, do you think T175933 may have a similar cause as well? I'll try to take some time to thoroughly investigate this issue and try to find a solution, although I'm not so sure I'll be able to find one.

@Ciencia_Al_Poder regarding your last comment, do you think T175933 may have a similar cause as well? I'll try to take some time to thoroughly investigate this issue and try to find a solution, although I'm not so sure I'll be able to find one.

I can't say for sure, but when I examined the code, I didn't find anything suspicious that could cause some variables to not match during stash edit that would otherwise match in a normal save. You can look at the examples of filters that didn't match but should have stopped the edit, and see if there are missing variables like this task, to see if the issue happens on stashed edits or not.

Thanks for the clarification. I asked that since I thought T175933 may have its root in a flaky handling of asynchronous requests. I'm afraid I won't be able to solve this task alone, but I'd be happy to give any help.

Daimona moved this task from Backlog to Internal bugs on the AbuseFilter board.Apr 3 2018, 12:19 PM
Daimona added a comment.EditedApr 3 2018, 3:37 PM

I'm not able to reproduce this on my local wiki (suggestions are welcome), although I found several examples on itwiki. Trying to see if I can fix it anyway.

EDIT: Of course I couldn't, I didn't have caching enabled :-) Now I can clearly see and confirm what Ciencia_Al_Poder said above. Also, it's quite easy to make it send a stash request: all you have to do is click on the summary field (ref). This way you can control whether to publish the stashed version or not and this becomes easy to reproduce intentionally.

Change 423885 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Save computed variables for stashed edits

https://gerrit.wikimedia.org/r/423885

MusikAnimal renamed this task from Variables constantly don't appears in abuse filter log to Variables constantly don't appear in abuse filter log.Apr 4 2018, 4:30 PM

I have another idea which would easily solve both this task and T191430. The idea is: if the edit was stashed, then after having executed filter actions, run again checkAllFilters asynchronously (no performance impact) in order to both populate lazy variables and save profiling stats. We may need to add a parameter for this kind of function call (probably we don't need everything), but it may be of great help. Thoughts?

Change 423978 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Re-execute checkAllFilters if the edit was stashed

https://gerrit.wikimedia.org/r/423978

Daimona claimed this task.Apr 4 2018, 6:07 PM

Change 423885 abandoned by Daimona Eaytoy:
Save computed variables for stashed edits

Reason:
Per above, will restore if necessary.

https://gerrit.wikimedia.org/r/423885

Daimona moved this task from Backlog to Under review on the User-Daimona board.

Here is a recent example of this problem:
https://pt.wikipedia.org/wiki/Special:AbuseFilter/examine/log/4358048
https://pt.wikipedia.org/wiki/Tabata_Amaral?diff=53895124
Variables added_lines and removed_lines are missing, and this (or something else which I can't test due to this bug) caused a false positive for filter 105.

@He7d3r Yeah, I'm well aware of the problem, I also constantly face it when I need to examine AbuseLog entries in order to tune filters. The patch above would fix the problem, but unfortunately it's not optimal - actually, it's pretty hacky. A nicer solution will need more thought, but I'll hopefully get on it after holidays.

@He7d3r Yeah, I'm well aware of the problem, I also constantly face it when I need to examine AbuseLog entries in order to tune filters. The patch above would fix the problem, but unfortunately it's not optimal - actually, it's pretty hacky. A nicer solution will need more thought, but I'll hopefully get on it after holidays.

It's so "pretty hacky" like this? The task is open more than a year and is a problem which really difficult to debug falses positives in filters.
I think that could be patched and posteriorly make a "nicer solution".

Daimona added a comment.EditedDec 27 2018, 4:59 PM

@Silent Yes, it is. I know how nasty this problem is, but we cannot risk introducing new problems with this fix. I'll take a look again right now.

UPDATE: OK, so I think we should first add some unit tests. There are several patches under review adding more unit tests which could be really helpful here, and I'm also going to upload more. BTW, @He7d3r, this bug doesn't cause any false positive, it's just annoying because it doesn't let you examine certain entries, but variables are computed correctly when running filters.

In these two days I improved unit tests and added tests to cover this particular issue. I also determined that an alternative solution would require a bigger change and the hacky one is actually fine for now. So https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/423978/ is the temporary solution (note that it has several depends-on).

Change 423978 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Re-execute checkAllFilters if the edit was stashed

https://gerrit.wikimedia.org/r/423978

Daimona closed this task as Resolved.Jan 23 2019, 9:11 PM
Daimona removed a project: Patch-For-Review.
Daimona moved this task from Under review to Done on the User-Daimona board.

This is finally done, with a big thank you to Huji for the review. The patch introduced a minor tech debt, but it's already planned to handle it.

Endless thanks to @Daimona and @Huji ! 🎉

Huji added a comment.Jan 25 2019, 2:41 AM

Happy to be of service!