
Remove average run time and consumed conditions to make editing faster
Closed, Resolved (Public)

Description

Currently, each filter has statistics like this:

"On average, its run time is $4 ms, and it consumes $5 conditions of the condition limit."

Computing these statistics is very slow, as there can easily be hundreds of filters, each doing 6 memcached queries. Xenon flamegraphs show a lot of time spent in this method, slowing down editing.
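
For illustration only, here is a rough sketch of the access pattern being described; the function name, key names, and array layout are invented for the example and are not the actual removed code:

```php
// Hypothetical sketch of per-filter profiling via $wgMemc: every filter
// checked on every edit triggers several cache round trips.
function recordFilterProfileNaively( BagOStuff $cache, array $filterResults ) {
	foreach ( $filterResults as $filterId => $result ) {
		$countKey = wfMemcKey( 'abusefilter', 'profile', $filterId, 'count' );
		$timeKey = wfMemcKey( 'abusefilter', 'profile', $filterId, 'time' );
		$condKey = wfMemcKey( 'abusefilter', 'profile', $filterId, 'conds' );

		// Three reads per filter...
		$count = (int)$cache->get( $countKey );
		$time = (float)$cache->get( $timeKey );
		$conds = (int)$cache->get( $condKey );

		// ...and three writes per filter: six memcached queries each,
		// multiplied by hundreds of filters on every edit.
		$cache->set( $countKey, $count + 1, 3600 );
		$cache->set( $timeKey, $time + $result['timeMs'], 3600 );
		$cache->set( $condKey, $conds + $result['conds'], 3600 );
	}
}
```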

Event Timeline

He7d3r assigned this task to aaron.
He7d3r raised the priority of this task to Needs Triage.
He7d3r updated the task description.
He7d3r added a subscriber: He7d3r.

Change 211951 merged by jenkins-bot:
Removed filter profiling using $wgMemc

https://gerrit.wikimedia.org/r/211951

I wonder if this could backfire. Without information on slowness, admins might unwittingly make filters slower than they currently are. Perhaps stats should be calculated every now and then.

It will backfire.
It makes our work of debugging filters harder than it already is.

Aren't those stats broken anyway? I think it's better if those are fixed before they're restored. There's also an open patch to improve them.

Just going to write this here for reference... On enwiki we had our friend [[User:Dragons flight]] explain best practices to reduce runtime and condition count; as long as you follow those and combine filters when possible, I think you should be OK without the stats.

Here's the link: https://en.wikipedia.org/wiki/Wikipedia_talk:Edit_filter/Archive_6#Condition_limit_3

> Just going to write this here for reference... On enwiki we had our friend [[User:Dragons flight]] explain best practices to reduce runtime and condition count; as long as you follow those and combine filters when possible, I think you should be OK without the stats.

> Here's the link: https://en.wikipedia.org/wiki/Wikipedia_talk:Edit_filter/Archive_6#Condition_limit_3

Thanks. Best practices should be documented in the https://www.mediawiki.org/wiki/Extension:AbuseFilter subpages. :)

> Aren't those stats broken anyway? I think it's better if those are fixed before they're restored. There's also an open patch to improve them.

What patch? Is this tracked somewhere?

Ah yes, I actually forgot I had already added it there; it's at https://www.mediawiki.org/wiki/Extension:AbuseFilter/Conditions

Notice that Extension:AbuseFilter/Conditions#General counting just transcludes the table I created after carefully examining the stats which were removed.
I would not have documented that in the first place if these statistics were not available.

> Ah yes, I actually forgot I had already added it there; it's at https://www.mediawiki.org/wiki/Extension:AbuseFilter/Conditions

> Notice that Extension:AbuseFilter/Conditions#General counting just transcludes the table I created after carefully examining the stats which were removed.
> I would not have documented that in the first place if these statistics were not available.

Well, we've got it documented now :) And kudos for making that table! As long as we're disciplined about authoring the filters, we should be fine until better stats are available.

We might even consider having an on-demand statistics tool. Maybe I could tick a checkbox to "track performance of this filter" and, once I see that it is doing well, remove the performance tracking. There would be some notice making it clear that the tracking itself adds overhead when saving edits, so that authors know not to leave it on indefinitely.

> We might even consider having an on-demand statistics tool. Maybe I could tick a checkbox to "track performance of this filter" and, once I see that it is doing well, remove the performance tracking.

This is an interesting idea. The tracking could disable itself after a certain number of hits (or checked actions). Care to file a feature request?
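
A minimal sketch of how such self-disabling, opt-in tracking might look; the per-filter checkbox, the disable callback, the key name, and the 10000-action cutoff are all assumptions for illustration, not anything that exists in the extension:

```php
// Sketch only: opt-in profiling that switches itself off after a fixed
// number of checked actions.
function recordOnDemandProfile( BagOStuff $cache, $filterId, $runTimeMs, callable $disableTracking ) {
	$maxTrackedActions = 10000; // after this many samples, stop tracking

	$key = wfMemcKey( 'abusefilter', 'ondemand-profile', $filterId );
	$data = $cache->get( $key );
	if ( !is_array( $data ) ) {
		$data = [ 'actions' => 0, 'totalTimeMs' => 0.0 ];
	}

	$data['actions']++;
	$data['totalTimeMs'] += $runTimeMs;
	$cache->set( $key, $data, 86400 );

	if ( $data['actions'] >= $maxTrackedActions ) {
		// Enough data collected; untick the hypothetical checkbox so the
		// extra memcache traffic does not stay enabled indefinitely.
		$disableTracking( $filterId );
	}
}
```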

The ability to have it run somehow, somewhere, to get an idea of the efficacy of your filter is useful. Just turning it off is not helpful for those of us who write filters but are not developers and just follow others' examples as a means to stop the crap.

I believe the initial version of this (now removed) was reading three entries from memcache and then setting three updated entries per filter (as well as a few global keys that were updated per edit rather than per filter).

I already had a version in gerrit that reduced this to one read and one write per filter. If memcache requests are the bottleneck, we could easily go further and reduce it to one read and one write of structured data to memcache per edit rather than per filter. Aside from reading/writing memcache, I rather doubt that any of the rest of the internal profiling steps are generating meaningful lag. Counting conditions is already required for the condition limiter, and reading the system clock to track run time is pretty trivial, so I assume that nearly all of the delay is associated with the way memcache was being used. Though if someone wants to argue otherwise, I'd be curious to see the evidence.
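
For concreteness, a rough sketch of what "one read and one write of structured data per edit" could look like; the key name and array layout are assumptions, not the gerrit patch, and it also keeps a per-filter maximum run time, which relates to the point about occasionally-slow filters below:

```php
// Sketch only: fold all per-filter counters into a single structured value,
// so each edit costs one memcache read and one write in total, regardless of
// how many filters ran.
function recordEditProfile( BagOStuff $cache, array $runTimes, array $condsUsed ) {
	$key = wfMemcKey( 'abusefilter', 'profile-data' );

	$stats = $cache->get( $key );
	if ( !is_array( $stats ) ) {
		$stats = [ 'edits' => 0, 'filters' => [] ];
	}
	$stats['edits']++;

	foreach ( $runTimes as $filterId => $timeMs ) {
		if ( !isset( $stats['filters'][$filterId] ) ) {
			$stats['filters'][$filterId] = [
				'count' => 0,
				'totalTimeMs' => 0.0,
				'maxTimeMs' => 0.0,
				'conds' => 0,
			];
		}
		$f =& $stats['filters'][$filterId];
		$f['count']++;
		$f['totalTimeMs'] += $timeMs;
		$f['maxTimeMs'] = max( $f['maxTimeMs'], $timeMs ); // surfaces rare slow runs
		$f['conds'] += $condsUsed[$filterId];
	}
	unset( $f );

	// Note: this read-modify-write can race with concurrent edits; a sketch only.
	$cache->set( $key, $stats, 3600 );
}
```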

Personally, I think removing all AF profiling is a very bad idea. We already have trouble with people understanding the performance of the filters, which was not aided by the profiling being buggy. In fact, the present profiling doesn't go far enough. We've had filters deployed (and subsequently removed) that had good average run times, but which could literally produce tens of seconds of lag against certain types of edits. We need to track not just average run time but also identify filters that occasionally produce extremely slow run times. (I also had a version in gerrit that added tracking of max run time among recent edits, but that depended upon the profiling existing.) For the same reason, I think on-demand profiling is probably not a good approach if we want to catch filters that are only occasionally slow. At least some level of performance capture needs to be always on or admins are going to be poking about blindly trying to find problems.

I'd suggest that reducing the memcache hits to once per edit (rather than several per filter) and possibly sampling (e.g. only profiling every Nth edit) would eliminate most of the lag while also being more useful and effective than on demand profiling.
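
The sampling part is simple to sketch; the 1-in-50 rate is an arbitrary example, and the matches() call is an invented stand-in for however the filters are actually evaluated:

```php
// Sketch only: profile roughly one edit in N, so the cost of timing and the
// memcache write is paid rarely while long-run averages stay meaningful.
function shouldProfileThisEdit( $sampleRate = 50 ) {
	return mt_rand( 1, $sampleRate ) === 1;
}

// Example use around a (hypothetical) per-filter matching loop:
$profileThisEdit = shouldProfileThisEdit();
$runTimes = [];
foreach ( $filters as $filterId => $filter ) {
	$start = $profileThisEdit ? microtime( true ) : 0;
	$matched = $filter->matches( $vars ); // stand-in for the real evaluation
	if ( $profileThisEdit ) {
		$runTimes[$filterId] = ( microtime( true ) - $start ) * 1000; // in ms
	}
}
```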

I could work on those things myself down the line, but I'm busy in the near term. And frankly, given that all my previous AF edits in gerrit were ignored for months I'm not feeling much enthusiasm for working on this.

> I'd suggest that reducing the memcache hits to once per edit (rather than several per filter) and possibly sampling (e.g. only profiling every Nth edit) would eliminate most of the lag while also being more useful and effective than on demand profiling.

This seems very sensible. @ori and @aaron, are you open to such a "partial revert" of your flat-out removal?

>> I'd suggest that reducing the memcache hits to once per edit (rather than several per filter) and possibly sampling (e.g. only profiling every Nth edit) would eliminate most of the lag while also being more useful and effective than on demand profiling.

> This seems very sensible. @ori and @aaron, are you open to such a "partial revert" of your flat-out removal?

That's basically a suggestion I made in an email as a possibility (also using DeferredUpdates). I'd still *prefer* on-demand though.
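
For reference, a bare-bones sketch of the DeferredUpdates variant; the key name and the shape of $profileData follow the earlier sketches and are assumptions, not the actual proposal:

```php
// Sketch only: gather the numbers while the filters run, then push the
// memcache read-modify-write out of the critical path so it happens after
// the response is sent, rather than while the user waits for the edit.
DeferredUpdates::addCallableUpdate( function () use ( $profileData ) {
	global $wgMemc;
	$key = wfMemcKey( 'abusefilter', 'profile-data' );

	$stats = $wgMemc->get( $key );
	if ( !is_array( $stats ) ) {
		$stats = [ 'edits' => 0, 'filters' => [] ];
	}
	// ... merge $profileData into $stats as in the earlier sketch ...
	$wgMemc->set( $key, $stats, 3600 );
} );
```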

We really do need some sort of profiling, especially with some of the very complex filters.