Page MenuHomePhabricator

Performance review of ext:StopForumSpam
Closed, ResolvedPublic

Description

Description

I have been working on a refactor of the StopForumSpam extension, with the goal of eventually deploying it to beta (T181217) and then some set of smaller projects which suffer from an onslaught of spam. Once the primary change set for this work has been merged, the plan is to deploy the extension to beta sometime prior to the end of Q2 (December 31st, 2020).

Preview environment

If the changes cannot be hosted on Beta Cluster, explain why and provide links to an alternate public environment instead where the Performance Team can also SSH into. Links to code only is insufficient for a performance review.

The extension is not yet within has been deployed to the beta cluster , but should be deployed there prior to the end of Q2 (December 31st, 2020). I have created this review task a bit early with the hope that it can be placed upon the Performance-Team 's docket for Q3. Further, the extension doesn't feature much of a front-end and is fairly trivial to build out and test within a local MediaWiki installation.

Which code to review

The main refactor change set is here and very close to being has been merged: https://gerrit.wikimedia.org/r/630298 (n.b. Timo had a perf-related comment here)

Performance assessment

Please initiate the performance assessment by answering the below:

  • What work has been done to ensure the best possible performance of the feature?
    • Basic coding best practices and guidelines for MediaWiki extension development.
  • What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?
    • The extension relies heavily upon MediaWiki's WANObjectCache for storage and retrieval of various IPv4 and IPv6 addresses, similar to TorBlock.
    • The DenyListUpdate class now supports fetching remote files from stopforumspam.com via MediaWiki's HttpRequestFactory.
    • The primary way the extension triggers a block on a write-based action is with the onGetUserPermissionsErrorsExpensive hook.
    • The updateDenyList.php maintenance script could be a DoS and/or data corruption vector if it was run concurrently at high frequency.
  • Are there potential optimisations that haven't been performed yet?
    • Not that I am aware of.
  • Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the Performance Team for advice: performance-team@wikimedia.org.
    • No extensive performance-testing has been completed thus far.

Event Timeline

sbassett renamed this task from Performance review of <Insert name of feature/service> to Performance review of ext:StopForumSpam.Oct 30 2020, 8:12 PM
sbassett added a project: user-sbassett.
sbassett moved this task from Backlog to Waiting on the user-sbassett board.
Gilles changed the task status from Open to Stalled.Nov 2 2020, 12:38 PM
Gilles claimed this task.

All good for Q2 or Q3 review of this. Set the status back to Open once there is something available for us to review, thanks!

Gilles triaged this task as Medium priority.Nov 2 2020, 12:38 PM
Gilles moved this task from Inbox to Next: Goal-oriented on the Performance-Team board.

Update: The extension went to beta on 2020-01-11, but had a few bugs. We're working through those issues now (namely T271756 and T271757) and I'm optimistic the extension will be re-enabled within a week or so.

Thanks for the update, let me know when it's back on Beta.

@Gilles - @Reedy and I just worked through a few of the SFS bugs and re-enabled the extension on beta. Seems to be working fine now - should have some report-only data in logstash-beta within a day or so, I'd guess.

Hello @Gilles and @aaron - I just wanted to see if there was an updated estimated completion date for this review. I wouldn't say that this review is urgent, but if it might not be completed this quarter or next, then I'll want to plan around that information. Thanks.

About how large is the IP list that will be stored in cache?

Change 670987 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/StopForumSpam@master] Move code from DenyListUpdate to DenyListManager and clarify logic

https://gerrit.wikimedia.org/r/670987

About how large is the IP list that will be stored in cache?

The default SFS file for the extension is the 90-day IPv4 and IPv6 combined summary file. This is also the one we'd likely use in production as it strikes a happy medium for all of the available summary files. It tends to average around 7.7 Mb of data, uncompressed,.

Also - thanks for the cleanup/refactor patch above.

While we gzip memcached values, it would still be larger than the 1 mb limit. Even if I let WANCache use BagOStuff::WRITE_SEGMENTABLE, that is still a lot of I/O (even with "pcTTL" enabled).

It looks like the extension code only stores the IPs (not the other CSV fields), so the size in cache for isted_ip_90_ipv46_all.gz would be comparable to isted_ip_90_ipv46.gz if stored as a flat string (~640KB as of today). I don't see many IPv6 entries, but that will probably grow in the future.

Currently the code stores arrays, which will end up as fairly bloated serialized PHP. Maybe the smallest format would be to pack the ipv4 ips as 4 byte ints and the ipv6 ones as four 4 byte ints, all in a flat string. I serialized array of (v4 => packed IP list, v6 => packed IP list), once gzipped is 136K. key to packed That said, even just storing a string (which WanCache/BagOStuff serializes, adding a small number of bytes) of the list is 717K, and goes down to 176K after gzip, compared to a serialized/gzipped array at 333K. A flat gzipped string of hex-form IPs is 157K, and much simpler than the pack/unpack tricks.

@aaron - makes sense, thanks for the thorough analysis here and the additional performance refactor (admittedly, I don't have expansive knowledge of the intricacies of and optimizations for the wancache.) I'll try to get that tested locally soon and merge the change set, unless you'd like to leave it open a bit longer for further review.

I rebased the patch above. Once this is merged, I can consider this task closed.

I rebased the patch above. Once this is merged, I can consider this task closed.

Sounds good. Thanks for all of your work on this. I think I'd like to test the ext with this change set locally a little bit more and then @Krinkle had a couple of comments on the change set. Once those two issues are addressed, I'm happy to merge the change set.

aaron removed aaron as the assignee of this task.May 5 2021, 10:35 PM
aaron added a subscriber: aaron.

Unassigned, unless there is a clear maintainer to review that patch and do any future upkseep.

Change 670987 merged by jenkins-bot:

[mediawiki/extensions/StopForumSpam@master] Move code from DenyListUpdate to DenyListManager and clarify logic

https://gerrit.wikimedia.org/r/670987

sbassett assigned this task to aaron.