Measure AbuseFilter runtime
Closed, Resolved · Public · 5 Story Points

Description

The AbuseFilter extension limits the number of conditions that all active filters combined can use on each publish event. The current threshold for ENWP is 1,000 conditions. This limit exists for performance reasons, but it was set years ago, and we may be able to raise it safely with little or no actual performance cost.
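To make the shared limit concrete, here is a minimal Python sketch of a condition budget spread across all filters for one edit. It assumes each filter has a fixed, precomputed condition cost and that filters past the budget are simply skipped for that edit. Both are simplifications: the real extension counts conditions as it evaluates each rule. The `Filter` tuple, `run_with_condition_budget` and the example costs are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model of a filter: (filter_id, condition_cost, predicate).
# Real AbuseFilter counts conditions while evaluating each rule; a fixed,
# precomputed cost per filter is a simplification for illustration.
Filter = Tuple[int, int, Callable[[dict], bool]]


def run_with_condition_budget(
    filters: Sequence[Filter], edit: dict, limit: int = 1000
) -> Tuple[List[int], int, bool]:
    """Evaluate filters in order against one edit under a shared condition budget.

    Returns (matched filter ids, conditions used, whether the limit was hit).
    """
    used = 0
    matched: List[int] = []
    for filter_id, cost, predicate in filters:
        if used + cost > limit:
            # Assumed behaviour: once the budget would be exceeded,
            # the remaining filters are skipped for this edit.
            return matched, used, True
        used += cost
        if predicate(edit):
            matched.append(filter_id)
    return matched, used, False


filters = [
    (1, 400, lambda e: "spam" in e["text"]),
    (2, 500, lambda e: e["bytes_added"] > 5000),
    (3, 300, lambda e: e["user_is_new"]),
]
edit = {"text": "buy spam now", "bytes_added": 120, "user_is_new": True}
print(run_with_condition_budget(filters, edit, limit=1000))  # ([1], 900, True)
print(run_with_condition_budget(filters, edit, limit=2000))  # ([1, 3], 1200, False)
```

With these made-up costs, a 1,000-condition limit never reaches filter 3, while a 2,000 limit lets it run at the price of 300 more conditions per edit. That extra evaluation work is exactly what the runtime measurements are meant to quantify.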

Questions

  • How much of a performance impact does AbuseFilter currently cause on ENWP?
  • How much of a performance impact does AbuseFilter currently cause on other communities?
  • To what degree would increasing the condition count (e.g. to 2,000) result in a slower publish process for the end user?

To accomplish this, we will measure the runtime of the AbuseFilter feature on all wikis via stats-d. We will not measure the runtime of individual filters. We will measure:

  • Runtime
  • Number of conditions executed
  • Number of filters executed
  • Number of times a filter becomes disabled (stretch goal)
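A minimal sketch of what these per-edit metrics could look like on the wire, using the plain-text StatsD line protocol (`<name>:<value>|<type>`) over UDP. The `abusefilter.<wiki>.*` metric names, the use of counters for conditions and filters, and the localhost:8125 endpoint are assumptions for illustration and may not match what the patch actually emits.

```python
import socket
from typing import List

# Hypothetical metric namespace; the names used in production may differ.
METRIC_PREFIX = "abusefilter"


def format_statsd_lines(wiki: str, runtime_ms: float, conditions: int, filters: int) -> List[str]:
    """Build StatsD plain-text lines ("<name>:<value>|<type>") for one edit.

    The wiki's database name is part of the key so data can be split by wiki.
    """
    base = f"{METRIC_PREFIX}.{wiki}"
    return [
        # "ms" is the StatsD timer type; the server aggregates timers into
        # mean/percentile statistics per flush interval.
        f"{base}.runtime:{round(runtime_ms)}|ms",
        # "c" is a counter; the server sums these per flush interval.
        f"{base}.conditions:{conditions}|c",
        f"{base}.filters:{filters}|c",
    ]


def send_statsd(lines: List[str], host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send all lines in one newline-separated UDP datagram (fire and forget)."""
    payload = "\n".join(lines).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


lines = format_statsd_lines("enwiki", runtime_ms=12.6, conditions=340, filters=55)
print("\n".join(lines))
# abusefilter.enwiki.runtime:13|ms
# abusefilter.enwiki.conditions:340|c
# abusefilter.enwiki.filters:55|c
```

UDP keeps the cost to the edit small: if the StatsD server is unavailable, the datagram is dropped instead of delaying the save.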

Hyperlinks

Restricted Application added a subscriber: Aklapper. · Mar 21 2017, 11:15 PM

@MusikAnimal @kaldari — Please review this ticket and please edit as you see fit!

Huji added a subscriber: Huji. Mar 22 2017, 11:13 PM
DannyH triaged this task as Normal priority. Apr 4 2017, 11:30 PM
kaldari set the point value for this task to 8. Apr 4 2017, 11:51 PM
kaldari moved this task from Sprint planning/estimation to Backlog on the Community-Tech board.
TBolliger updated the task description. May 9 2017, 11:39 PM
Zache added a subscriber: Zache. May 12 2017, 1:12 PM

Quick update: After a brief discussion, we decided that 'AbuseFilter performance' actually means two things:

  1. The actual performance impact of AF on production wikis.
    • Runtime is measured per filter in core AbuseFilter, but this measurement has been disabled on ENWP (T101648).
    • I will reach out to @aaron & @He7d3r to discuss what we can do to fix the (ironically) sluggish performance measurement.
  2. A reproducible, consistent testing framework for our code changes.
    • We may use Selenium (or similar) to create an environment for testing changes we make to AbuseFilter, including new filters we may want to introduce.
    • Depending on what Aaron and Helder share, we may be able to use Arc Lamp.
dbarratt claimed this task. Jun 5 2017, 5:06 PM
dbarratt removed dbarratt as the assignee of this task. Jun 5 2017, 6:03 PM
dbarratt added a subscriber: dbarratt.
DannyH moved this task from Backlog to Anti-Harassment Tools on the Community-Tech board.

During a meeting with Max, Leon, Dayllan, David, Caroline and Trevor:

  • At this stage, we don't need to measure each filter's performance, just the entire runtime.
  • Use stats-d
  • Depending on how much data we're putting in, we may need to talk to the Analytics team to understand whether their infrastructure can handle it.
  • We'll measure on all wikis with AF enabled, so we'll need to split the data by wiki.

Things to measure:

  • Runtime
  • Number of conditions executed
  • Number of filters executed
  • Number of times a filter becomes disabled (rare, so stretch goal)
TBolliger renamed this task from Investigate AbuseFilter performance to Measure AbuseFilter runtime. Jul 14 2017, 9:35 PM
TBolliger updated the task description.
TBolliger changed the point value for this task from 8 to 5. Aug 17 2017, 7:16 PM

Also: meet or talk with the Analytics team to make sure they are prepared to store this data.

dmaza added a subscriber: dmaza. Aug 18 2017, 8:30 PM

@kaldari Which team or person can I speak to about stats-d? Mostly to ask whether they have the capacity for this, and to get some help getting it running locally.

kaldari added a comment (edited). Aug 18 2017, 8:37 PM

@dmaza: You'll want to ask the Performance team, probably @aaron. See also https://wikitech.wikimedia.org/wiki/Statsd. Also would be good to ask @MaxSem about it.

kaldari added a subscriber: MaxSem. Aug 18 2017, 8:39 PM
dbarratt assigned this task to dmaza. Mon, Aug 21, 7:30 PM
dmaza added a comment. Tue, Aug 22, 2:49 PM

I have a couple of questions:

Runtime

I assume we only want the filter matching runtime. Do we care about the time it takes to create the abuselog or to execute the actions as part of it?

Also, do we want a config flag to enable/disable this?

> I assume we only want the filter matching runtime. Do we care about the time it takes to create the abuselog or to execute the actions as part of it?

On a per-filter basis, the "matching" time sounds good, but for this task I was hoping to get total run time, start to finish, for an edit after it goes through all filters on a wiki. In other words, the period of time the user has to wait before their edit is saved. This may or may not include things like writing to the abuselog, not sure.

> Also, do we want a config flag to enable/disable this?

I think that makes sense, yes. It's good to be able to turn it off via a SWAT deploy should things go haywire.

dmaza added a comment. Tue, Aug 22, 4:56 PM

@MusikAnimal

> I was hoping to get total run time, start to finish, for an edit after it goes through all filters on a wiki. In other words, the period of time the user has to wait before their edit is saved. This may or may not include things like writing to the abuselog, not sure.

That's what I'm doing. My question was whether I should also include the time it takes to write to the abuselog and to execute the triggered actions.

I think we should.
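A minimal sketch of the "start to finish" measurement agreed on here: one timer wraps filter matching, the abuse log write and the triggered actions, so the number reflects what the user actually waits for. The three callables are hypothetical stand-ins for the extension's real steps. `time.perf_counter()` is used because it is monotonic and unaffected by wall-clock adjustments.

```python
import time
from typing import Callable, List, Tuple


def run_filters_timed(
    match_filters: Callable[[], List[int]],
    write_abuselog: Callable[[List[int]], None],
    take_actions: Callable[[List[int]], None],
) -> Tuple[List[int], float]:
    """Run the whole AbuseFilter pass for one edit and time it end to end.

    Returns (ids of matched filters, elapsed milliseconds). The timer covers
    matching, abuse log writes and actions, i.e. the user-visible delay.
    """
    start = time.perf_counter()
    matched = match_filters()
    if matched:
        # Logging and actions only happen when at least one filter matched,
        # but their cost is still part of what the user waits for.
        write_abuselog(matched)
        take_actions(matched)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return matched, elapsed_ms


events: List[str] = []
matched, elapsed = run_filters_timed(
    lambda: [3, 7],
    lambda ids: events.append(f"log {ids}"),
    lambda ids: events.append(f"act {ids}"),
)
print(matched, events)  # [3, 7] ['log [3, 7]', 'act [3, 7]']
```

If an action can raise (for example, to abort the save), wrapping the timed body in `try`/`finally` would keep those edits in the data as well.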

Change 373561 had a related patch set uploaded (by Dmaza; owner: Dmaza):
[mediawiki/extensions/AbuseFilter@master] Add runtime metrics to statsd

https://gerrit.wikimedia.org/r/373561

Change 373561 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add runtime metrics to statsd

https://gerrit.wikimedia.org/r/373561

dbarratt closed this task as Resolved. Wed, Aug 30, 3:01 PM

Change 375072 had a related patch set uploaded (by Dmaza; owner: Dmaza):
[operations/mediawiki-config@master] Enable AbuseFilter runtime profile

https://gerrit.wikimedia.org/r/375072

dmaza added a comment. Thu, Aug 31, 8:39 PM

Initially, performance will be measured on the following wikis:

'commonswiki',
'enwiki',
'mediawikiwiki',
'metawiki',
'testwiki',
'wikidatawiki'
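This rollout can be pictured as a per-wiki boolean setting: off by default, on for the six listed wikis, and easy to switch back off in a SWAT deploy if things go haywire. The sketch below models it in Python as a 'default' value plus per-wiki overrides, the style used in wmf-config's InitialiseSettings.php. The setting name `RUNTIME_PROFILE` and the helper `setting_for` are hypothetical, and the real configuration is PHP.

```python
from typing import Dict

# Hypothetical per-wiki setting modelled on wmf-config's
# "default plus per-wiki overrides" style; the real setting name may differ.
RUNTIME_PROFILE: Dict[str, bool] = {
    "default": False,
    "commonswiki": True,
    "enwiki": True,
    "mediawikiwiki": True,
    "metawiki": True,
    "testwiki": True,
    "wikidatawiki": True,
}


def setting_for(wiki: str, setting: Dict[str, bool]) -> bool:
    """Resolve a per-wiki value: an explicit override wins, else 'default'."""
    return setting.get(wiki, setting["default"])


print(setting_for("enwiki", RUNTIME_PROFILE))  # True
print(setting_for("dewiki", RUNTIME_PROFILE))  # False
```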

Change 375072 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable AbuseFilter runtime profile

https://gerrit.wikimedia.org/r/375072

Mentioned in SAL (#wikimedia-operations) [2017-09-05T23:39:49Z] <reedy@tin> Synchronized wmf-config/InitialiseSettings.php: Enable AbuseFilter runtime profile T161059 (duration: 00m 49s)