Page MenuHomePhabricator

Move AbuseFilter slow filters data from Logstash to per-filter profiling
Open, Needs TriagePublic

Description

If we implement T177017: Re-enable per-filter profiling on wikis where it was disabled (currently blocked by T179323) we should move the work from T174205 which is currently logging to LogStash to the per-filter profiles.

Event Timeline

Daimona changed the task status from Stalled to Open.Apr 4 2018, 12:28 PM

I think the time has come to think about this, and I wonder how we should display the data. Maybe in the per-filter stats we whould add a line like "xxx actions took too much time"? Together with adding a log somewhere? This thing is specific for edit AND for filter, so I can't easily spot the right place to show it.

Change 429381 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Move slow filters data to dedicated log

https://gerrit.wikimedia.org/r/429381

In order to have a separate log for slow runs I think we have to add a table to store var_dumps. As you can see from the scratch it's not a big table: it'll only have 2-3 columns (afs_time is currently unused) and won't have many rows, but I think we still need an approval.

To be honest, the log implementation isn't good. First, it uses two tables to store stuff. But, most important, we don't have handy ways to access logs. For instance, we want filter slow runs for each AbuseFilter. This would require tons of queries to logging table, which isn't good. What we need is a place where we can store log entries, which in turn would have a unique ID with a link to /examine, and a place to save the amount of slow runs for each filter; for instance, every filter's hit count is stored in af_hit_count on abuse_filter table, and we basically need something like that. I'll try to improve the code in the next days, but I'd be glad to hear suggestions about this.

The table is now prettier, and so is data storing. However, there's a doubt for which I'd like to hear some thoughts. In https://gerrit.wikimedia.org/r/#/c/429731/ I'm adding to the log the actions which hit the conditions limit, so that in this case the log will show

User Foo reached the limit of xxx conditions with action "edit" on Bar

And to display the conditions limit ("xxx" in the sample above) I take it from the global variable AbuseFilterConditionLimit. However, this means that if such variable is changed, old logs will display the new number. The solution would be to store the conditions limit in the database like we do for errors and runtime, but this means adding a column that is highly unlikely to have different values, since AFAICS in almost 10 years the conds limit was changed only once for commonswiki. The question is: should I add a (likely almost-unuseful) column to store it for every entry, or just retrieve when showing log and peace to everyone?

jcrespo subscribed.

Sorry, DBA was added, but I don't see clear actionables for us. We are happy to help with all database-related tasks, but we need 1) context (normally a problem statement or a task #) 2) explicit question or wanted help you need from us. We are extremely busy, and any help you can give us to streamline our workflow will also benefit you too, by us being able to respond quicker :-)

If this is not a task you explicitly want us to do anything, just for us to be aware of it, you can add us into the "blocked-external/not-db-team" column as I did here and we will still have a quick look.

@jcrespo Thanks, and sorry :-) This patch will need approval since it adds a new table, but it's not ready for review, since this task is blocked by several others. As this will be ready for review I'll reach up to you again and provide the exact context.

@Daimona Absolutely no problem, I just wanted to set expectations clear that we were not actively working on this (sometimes misunderstandings happen where people think they are blocked on us, but we didn't know that). When you are ready for us to help, add us as reviewers and leaves us here a note explicitly asking for help with what you want us to do. :-)

Change 489186 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add the abuse_filter_stats table

https://gerrit.wikimedia.org/r/489186

Change 489186 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add the abuse_filter_stats table

https://gerrit.wikimedia.org/r/489186

Change 429381 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Move slow filters data to dedicated log

https://gerrit.wikimedia.org/r/429381

Removing task assignee due to inactivity, as this open task has been assigned to the same person for more than two years (see the emails sent to the task assignee on Oct27 and Nov23). Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome.
(See https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator.)

LSobanski subscribed.

Removing the DBA tag and subscribing myself instead. Once there are specific actions for DBA please re-add us and/or @mention me.