Page MenuHomePhabricator

Fix issues with StopForumSpam's doUpdate function
Closed, ResolvedPublic

Description

After the work on T227454, StopForumSpam was pushed to the beta cluster today (T181217), which caused T271740. @Reedy and I troubleshot the issues for a while, and while https://gerrit.wikimedia.org/r/655499 needed to be addressed, it appears that there are still issues with doUpdate, likely around the way the WanCache is being used. This will require further troubleshooting of the code and possibly some improved logging and exception-handling within the extension (see also: T271756). The extension is currently disabled on the beta cluster (https://gerrit.wikimedia.org/r/655503).

Event Timeline

sbassett triaged this task as Medium priority.Jan 11 2021, 10:25 PM
sbassett updated the task description. (Show Details)
		// ungzip and process vendor file
		$fileDataProcessed = explode( "\n", gzdecode( $fileData ) );
		array_walk( $fileDataProcessed, function ( &$item, $key ) {
			global $wgSFSValidateIPList, $wgSFSIPThreshold;
			$ipData = str_getcsv( $item );
			if ( $wgSFSValidateIPList
				&& IPUtils::sanitizeIP( (string)$ipData[0] ) === null ) {
				$item = '';
				return;
			}
			if ( !( $ipData[1] ) && $ipData[1] < $wgSFSIPThreshold ) {
				$item = '';
				return;
			}
			$item = !( $ipData[0] ) ? $ipData[0] : '';
		} );
		return array_filter( $fileDataProcessed );

$fileDataProcessed before the array_walk looks good, after is just an array of empty strings, which then get filterd

array(168749) {
  [0] =>
  string(37) ""1.0.130.3","3","2020-10-19 13:29:50""
  [1] =>
  string(38) ""1.0.130.52","3","2020-10-29 00:50:29""
  [2] =>
  string(38) ""1.0.130.76","2","2020-11-04 13:34:37""
  [3] =>
  string(38) ""1.0.130.79","1","2020-11-19 01:09:39""
  [4] =>
  string(38) ""1.0.130.88","1","2020-11-05 03:45:59""
  [5] =>
  string(39) ""1.0.130.127","9","2020-11-08 18:06:20""
  [6] =>
  string(39) ""1.0.130.159","1","2020-10-30 10:36:30""
  [7] =>
  string(39) ""1.0.130.187","7","2020-11-10 18:15:21""
  [8] =>
  string(39) ""1.0.130.188","5","2020-10-21 01:53:19""
  [9] =>
  string(39) ""1.0.130.199","6","2020-10-25 17:32:57""
  [10] =>
  string(39) ""1.0.130.204","9","2020-11-19 13:02:35""
  [11] =>
  string(38) ""1.0.131.56","1","2020-11-20 23:40:01""
  [12] =>
  string(39) ""1.0.131.197","3","2020-11-23 09:55:39""
  [13] =>
  string(41) ""1.0.132.165","105","2021-01-11 18:59:24""
  [14] =>
  string(39) ""1.0.132.249","2","2020-11-09 14:16:48""
  [15] =>
  string(40) ""1.0.133.84","127","2021-01-09 08:23:27""
  [16] =>
  string(38) ""1.0.133.89","8","2021-01-08 08:40:25""
  [17] =>
  string(40) ""1.0.133.100","89","2021-01-10 12:56:09""
  [18] =>
  string(37) ""1.0.134.3","3","2020-11-19 00:10:31""
  [19] =>
  string(37) ""1.0.134.6","1","2020-11-13 16:48:52""
  [20] =>
  string(37) ""1.0.134.8","8","2021-01-11 20:36:50""
  [21] =>
  string(38) ""1.0.134.13","1","2020-12-05 07:10:17""
  [22] =>
  string(38) ""1.0.134.29","5","2021-01-04 21:21:13""
  [23] =>
  string(38) ""1.0.134.30","1","2020-12-07 09:51:09""
  [24] =>
  string(38) ""1.0.134.35","3","2020-11-29 12:06:30""
  [25] =>
  string(38) ""1.0.134.49","2","2020-12-03 12:19:16""
  [26] =>
  string(38) ""1.0.134.57","1","2020-12-24 11:35:55""
  [27] =>
  string(38) ""1.0.134.62","4","2020-12-06 15:36:40""
  [28] =>
  string(38) ""1.0.134.63","2","2020-11-21 12:23:47""
  [29] =>
  string(38) ""1.0.134.72","1","2020-11-15 12:49:25""
  [30] =>
  string(38) ""1.0.134.80","3","2020-11-21 12:21:45""
  [31] =>
  string(38) ""1.0.134.83","1","2020-12-22 14:11:47""
  [32] =>
  string(39) ""1.0.134.100","5","2020-12-07 15:24:24""
  [33] =>
  string(39) ""1.0.134.105","1","2020-11-16 11:22:07""
  [34] =>
  string(39) ""1.0.134.112","6","2020-12-05 17:24:47""
  [35] =>
  string(39) ""1.0.134.113","3","2020-12-07 18:36:22""
  [36] =>
  string(39) ""1.0.134.114","1","2020-11-17 06:02:46""
  [37] =>
  string(39) ""1.0.134.115","5","2020-10-15 16:49:03""
  [38] =>
  string(39) ""1.0.134.120","6","2020-12-04 14:04:31""
  [39] =>
  string(39) ""1.0.134.123","2","2020-11-20 12:21:15""
  [40] =>
  string(39) ""1.0.134.124","3","2020-12-08 04:21:42""
  [41] =>
  string(39) ""1.0.134.147","5","2020-12-06 20:07:11""
  [42] =>
  string(39) ""1.0.134.150","1","2020-12-03 13:10:22""
  [43] =>
  string(39) ""1.0.134.155","2","2020-12-04 16:57:13""
  [44] =>
  string(40) ""1.0.134.168","10","2020-11-17 07:03:22""
  [45] =>
  string(39) ""1.0.134.182","1","2020-11-26 16:47:13""
  [46] =>
  string(39) ""1.0.134.200","1","2020-11-17 02:30:46""
  [47] =>
  string(39) ""1.0.134.206","1","2020-11-18 00:21:51""
  [48] =>
  string(39) ""1.0.134.207","3","2021-01-11 03:33:33""
  [49] =>
  string(39) ""1.0.134.208","1","2020-11-20 06:09:34""
  [50] =>
  string(39) ""1.0.134.216","3","2020-12-06 08:02:21""
  [51] =>
  string(39) ""1.0.134.217","2","2020-11-23 12:52:08""
  [52] =>
  string(39) ""1.0.134.223","3","2020-12-02 12:52:19""
  [53] =>
  string(39) ""1.0.134.230","2","2020-11-14 09:54:49""
  [54] =>
  string(39) ""1.0.134.233","1","2020-11-16 01:48:12""
  [55] =>
  string(39) ""1.0.134.235","9","2020-11-22 02:01:39""
  [56] =>
  string(39) ""1.0.134.239","1","2020-12-04 23:20:05""
  [57] =>
  string(37) ""1.0.135.2","1","2020-12-04 16:45:14""
  [58] =>
  string(38) ""1.0.135.32","6","2020-10-16 21:16:38""
  [59] =>
  string(38) ""1.0.135.62","1","2020-10-29 15:11:35""
  [60] =>
  string(38) ""1.0.135.94","2","2020-10-13 07:35:36""
  [61] =>
  string(39) ""1.0.135.153","1","2020-10-21 11:46:24""
  [62] =>
  string(39) ""1.0.135.177","1","2020-10-27 12:13:10""
  [63] =>
  string(39) ""1.0.135.179","3","2020-11-02 09:44:43""
  [64] =>
  string(39) ""1.0.135.183","6","2020-10-23 19:54:17""
  [65] =>
  string(39) ""1.0.135.229","1","2020-10-17 04:05:36""
  [66] =>
  string(37) ""1.0.136.8","1","2020-10-22 02:00:20""
  [67] =>
  string(38) ""1.0.136.16","5","2020-11-05 22:28:33""
  [68] =>
  string(39) ""1.0.136.19","10","2020-11-20 02:08:31""
  [69] =>
  string(38) ""1.0.136.51","8","2020-11-16 13:25:32""
  [70] =>
  string(39) ""1.0.136.133","1","2020-10-15 00:48:31""
  [71] =>
  string(39) ""1.0.136.138","1","2020-11-11 23:25:21""
  [72] =>
  string(39) ""1.0.136.155","2","2020-10-13 12:35:22""
  [73] =>
  string(39) ""1.0.136.176","2","2020-11-06 08:05:58""
  [74] =>
  string(39) ""1.0.136.183","1","2020-12-28 05:25:31""
  [75] =>
  string(39) ""1.0.136.208","1","2020-11-08 20:55:02""
  [76] =>
  string(39) ""1.0.140.171","3","2020-10-30 06:05:00""
  [77] =>
  string(38) ""1.0.141.90","1","2020-11-01 19:47:37""
  [78] =>
  string(38) ""1.0.142.16","3","2020-11-16 11:42:52""
  [79] =>
  string(38) ""1.0.142.20","4","2020-11-24 08:14:45""
  [80] =>
  string(38) ""1.0.142.39","2","2020-11-11 14:34:35""
  [81] =>
  string(38) ""1.0.142.69","7","2020-11-16 10:58:58""
  [82] =>
  string(39) ""1.0.142.99","13","2020-11-13 16:56:16""
  [83] =>
  string(39) ""1.0.142.104","1","2020-11-20 05:26:39""
  [84] =>
  string(39) ""1.0.142.105","4","2020-11-23 16:40:08""
  [85] =>
  string(39) ""1.0.142.180","1","2020-11-18 04:46:04""
  [86] =>
  string(39) ""1.0.142.234","5","2020-11-14 14:53:17""
  [87] =>
  string(38) ""1.0.144.28","2","2020-11-20 17:34:32""
  [88] =>
  string(38) ""1.0.144.45","7","2020-12-30 11:55:12""
  [89] =>
  string(38) ""1.0.144.52","4","2020-11-09 15:49:56""
  [90] =>
  string(39) ""1.0.144.106","4","2020-12-15 10:04:53""
  [91] =>
  string(39) ""1.0.144.212","3","2020-11-03 18:05:17""
  [92] =>
  string(38) ""1.0.145.89","2","2020-11-21 01:21:23""
  [93] =>
  string(39) ""1.0.145.191","1","2020-11-30 16:19:21""
  [94] =>
  string(39) ""1.0.145.224","1","2020-10-13 10:11:38""
  [95] =>
  string(37) ""1.0.146.3","1","2020-12-06 06:15:11""
  [96] =>
  string(38) ""1.0.146.63","1","2020-11-03 07:58:29""
  [97] =>
  string(39) ""1.0.146.150","2","2020-12-18 21:45:07""
  [98] =>
  string(38) ""1.0.147.10","3","2020-12-21 16:40:04""
  [99] =>
  string(38) ""1.0.147.35","3","2020-10-20 06:21:09""
  [100] =>
  string(39) ""1.0.147.134","1","2020-10-18 07:52:58""
  [101] =>
  string(38) ""1.0.148.37","7","2020-11-26 02:42:09""
  [102] =>
  string(38) ""1.0.148.79","2","2020-11-09 15:57:44""
  [103] =>
  string(39) ""1.0.148.98","11","2020-11-18 15:35:57""
  [104] =>
  string(39) ""1.0.148.159","6","2020-12-21 16:20:26""
  [105] =>
  string(39) ""1.0.148.241","3","2021-01-01 19:26:20""
  [106] =>
  string(39) ""1.0.148.243","2","2020-11-24 05:41:41""
  [107] =>
  string(39) ""1.0.149.157","3","2021-01-09 19:35:39""
  [108] =>
  string(39) ""1.0.149.174","1","2020-12-17 03:03:54""
  [109] =>
  string(38) ""1.0.150.33","5","2020-12-29 15:42:34""
  [110] =>
  string(39) ""1.0.150.110","3","2020-11-24 17:59:33""
  [111] =>
  string(39) ""1.0.150.164","1","2021-01-02 05:13:40""
  [112] =>
  string(38) ""1.0.151.28","7","2021-01-04 07:48:15""
  [113] =>
  string(39) ""1.0.151.193","2","2020-11-24 00:01:05""
  [114] =>
  string(38) ""1.0.152.57","3","2020-12-01 17:34:52""
  [115] =>
  string(38) ""1.0.152.81","1","2021-01-10 20:19:59""
  [116] =>
  string(39) ""1.0.152.180","9","2020-11-21 11:48:13""
  [117] =>
  string(38) ""1.0.154.14","2","2021-01-11 14:06:26""
  [118] =>
  string(38) ""1.0.154.88","1","2020-12-03 19:05:05""
  [119] =>
  string(39) ""1.0.154.141","3","2020-10-23 21:28:51""
  [120] =>
  string(39) ""1.0.154.211","7","2020-11-15 07:44:34""
  [121] =>
  string(39) ""1.0.154.212","6","2021-01-05 21:39:10""
  [122] =>
  string(39) ""1.0.154.241","1","2020-11-24 08:33:29""
  [123] =>
  string(38) ""1.0.155.82","2","2021-01-09 13:29:21""
  [124] =>
  string(39) ""1.0.155.160","1","2020-12-28 04:32:46""
  [125] =>
  string(39) ""1.0.155.227","5","2020-12-17 20:27:35""
  [126] =>
  string(40) ""1.0.155.254","14","2020-12-09 05:29:05""
  [127] =>
  string(38) ""1.0.156.12","2","2020-11-18 02:10:23""

  (more elements)...
}
/vagrant/mediawiki/extensions/StopForumSpam/includes/DenyListUpdate.php:228:
array(168749) {
  [0] =>
  string(0) ""
  [1] =>
  string(0) ""
  [2] =>
  string(0) ""
  [3] =>
  string(0) ""
  [4] =>
  string(0) ""
  [5] =>
  string(0) ""
  [6] =>
  string(0) ""
  [7] =>
  string(0) ""
  [8] =>
  string(0) ""
  [9] =>
  string(0) ""
  [10] =>
  string(0) ""
  [11] =>
  string(0) ""
  [12] =>
  string(0) ""
  [13] =>
  string(0) ""
  [14] =>
  string(0) ""
  [15] =>
  string(0) ""
  [16] =>
  string(0) ""
  [17] =>
  string(0) ""
  [18] =>
  string(0) ""
  [19] =>
  string(0) ""
  [20] =>
  string(0) ""
  [21] =>
  string(0) ""
  [22] =>
  string(0) ""
  [23] =>
  string(0) ""
  [24] =>
  string(0) ""
  [25] =>
  string(0) ""
  [26] =>
  string(0) ""
  [27] =>
  string(0) ""
  [28] =>
  string(0) ""
  [29] =>
  string(0) ""
  [30] =>
  string(0) ""
  [31] =>
  string(0) ""
  [32] =>
  string(0) ""
  [33] =>
  string(0) ""
  [34] =>
  string(0) ""
  [35] =>
  string(0) ""
  [36] =>
  string(0) ""
  [37] =>
  string(0) ""
  [38] =>
  string(0) ""
  [39] =>
  string(0) ""
  [40] =>
  string(0) ""
  [41] =>
  string(0) ""
  [42] =>
  string(0) ""
  [43] =>
  string(0) ""
  [44] =>
  string(0) ""
  [45] =>
  string(0) ""
  [46] =>
  string(0) ""
  [47] =>
  string(0) ""
  [48] =>
  string(0) ""
  [49] =>
  string(0) ""
  [50] =>
  string(0) ""
  [51] =>
  string(0) ""
  [52] =>
  string(0) ""
  [53] =>
  string(0) ""
  [54] =>
  string(0) ""
  [55] =>
  string(0) ""
  [56] =>
  string(0) ""
  [57] =>
  string(0) ""
  [58] =>
  string(0) ""
  [59] =>
  string(0) ""
  [60] =>
  string(0) ""
  [61] =>
  string(0) ""
  [62] =>
  string(0) ""
  [63] =>
  string(0) ""
  [64] =>
  string(0) ""
  [65] =>
  string(0) ""
  [66] =>
  string(0) ""
  [67] =>
  string(0) ""
  [68] =>
  string(0) ""
  [69] =>
  string(0) ""
  [70] =>
  string(0) ""
  [71] =>
  string(0) ""
  [72] =>
  string(0) ""
  [73] =>
  string(0) ""
  [74] =>
  string(0) ""
  [75] =>
  string(0) ""
  [76] =>
  string(0) ""
  [77] =>
  string(0) ""
  [78] =>
  string(0) ""
  [79] =>
  string(0) ""
  [80] =>
  string(0) ""
  [81] =>
  string(0) ""
  [82] =>
  string(0) ""
  [83] =>
  string(0) ""
  [84] =>
  string(0) ""
  [85] =>
  string(0) ""
  [86] =>
  string(0) ""
  [87] =>
  string(0) ""
  [88] =>
  string(0) ""
  [89] =>
  string(0) ""
  [90] =>
  string(0) ""
  [91] =>
  string(0) ""
  [92] =>
  string(0) ""
  [93] =>
  string(0) ""
  [94] =>
  string(0) ""
  [95] =>
  string(0) ""
  [96] =>
  string(0) ""
  [97] =>
  string(0) ""
  [98] =>
  string(0) ""
  [99] =>
  string(0) ""
  [100] =>
  string(0) ""
  [101] =>
  string(0) ""
  [102] =>
  string(0) ""
  [103] =>
  string(0) ""
  [104] =>
  string(0) ""
  [105] =>
  string(0) ""
  [106] =>
  string(0) ""
  [107] =>
  string(0) ""
  [108] =>
  string(0) ""
  [109] =>
  string(0) ""
  [110] =>
  string(0) ""
  [111] =>
  string(0) ""
  [112] =>
  string(0) ""
  [113] =>
  string(0) ""
  [114] =>
  string(0) ""
  [115] =>
  string(0) ""
  [116] =>
  string(0) ""
  [117] =>
  string(0) ""
  [118] =>
  string(0) ""
  [119] =>
  string(0) ""
  [120] =>
  string(0) ""
  [121] =>
  string(0) ""
  [122] =>
  string(0) ""
  [123] =>
  string(0) ""
  [124] =>
  string(0) ""
  [125] =>
  string(0) ""
  [126] =>
  string(0) ""
  [127] =>
  string(0) ""

  (more elements)...
}

Change 655526 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/extensions/StopForumSpam@master] Fix csv parsing

https://gerrit.wikimedia.org/r/655526

Some other stuff still going on...

With the patch above, and the caching stuff basically comment out, I get success. Put the caching back in, it fails.... Progress anyway

@Reedy - Thanks for having a look. My next best guess for the culprit is that maybe some of the WanCache gets should instead be getWithSetCallback, particularly the example that Timo called out during CR. I didn't think the way I had done it wouldn't even work in beta/production, as it worked fine locally, but maybe it doesn't.

Change 655526 merged by jenkins-bot:
[mediawiki/extensions/StopForumSpam@master] DenyListUpdate: Cast values from CSV input before using them

https://gerrit.wikimedia.org/r/655526

vagrant@mediawiki-vagrant:/vagrant/mediawiki/extensions/StopForumSpam$ php maintenance/updateDenyList.php 
Starting update of SFS deny list in cache...
Done! Loaded 49278 IPs.}
Took 8.4193689823151 seconds
vagrant@mediawiki-vagrant:/vagrant/mediawiki/extensions/StopForumSpam$ git diff
diff --git a/includes/DenyListUpdate.php b/includes/DenyListUpdate.php
index 5865f26..8aac20e 100644
--- a/includes/DenyListUpdate.php
+++ b/includes/DenyListUpdate.php
@@ -75,23 +75,23 @@ class DenyListUpdate implements DeferrableUpdate {
         */
        public static function loadDenyListIPs() {
                global $wgSFSDenyListCacheDuration, $wgSFSDenyListKey;
-               $wanCache = MediaWikiServices::getInstance()->getMainWANObjectCache();
+/*             $wanCache = MediaWikiServices::getInstance()->getMainWANObjectCache();
                return $wanCache->getWithSetCallback(
                        $wanCache->makeGlobalKey( $wgSFSDenyListKey ),
                        $wgSFSDenyListCacheDuration,
                        function () {
-                               global $wgSFSIPListLocation;
+*/                             global $wgSFSIPListLocation;
                                $IPs = is_file( $wgSFSIPListLocation ) ?
                                        self::fetchDenyListIPsLocal() : self::fetchDenyListIPsRemote();
                                return $IPs;
-                       },
+/*                     },
                        [
                                'lockTSE' => $wgSFSDenyListCacheDuration,
                                'staleTTL' => $wgSFSDenyListCacheDuration,
                                'busyValue' => []
                        ]
                );
-       }
+*/     }
 
        /**
         * Fetch gunzipped/unzipped SFS deny list from local file
vagrant@mediawiki-vagrant:/vagrant/mediawiki/extensions/StopForumSpam$ git checkout includes/DenyListUpdate.php
vagrant@mediawiki-vagrant:/vagrant/mediawiki/extensions/StopForumSpam$ php maintenance/updateDenyList.php 
Starting update of SFS deny list in cache...
Failed!

vagrant@mediawiki-vagrant:/vagrant/mediawiki/extensions/StopForumSpam$

The cache key being 'sfs-denylist-set' rather than "sfs:denylist:set" seems to solve the problem

Change 655616 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/extensions/StopForumSpam@master] Change $wgSFSDenyListKey from : to -

https://gerrit.wikimedia.org/r/655616

Does this really need to be a config variable? Or can we just turn it into a const/similar elsewhere?

AFAIK, other extensions don't usually make this "user configurable"

Does this really need to be a config variable? Or can we just turn it into a const/similar elsewhere?

No, I was just trying to abstract it a bit and reduce magic numbers in the code. It was previously encapsulated within a function in the old code. Also discussed here during the CR. I'm happy to revert that change by re-introducing DenyListManager::getDenyListKey() if that would be best. The key name in the test needs updating as well.

Change 655616 merged by jenkins-bot:
[mediawiki/extensions/StopForumSpam@master] Fix StopForumSpam WanCache issues

https://gerrit.wikimedia.org/r/655616

sbassett moved this task from In Progress to Done on the MediaWiki-extensions-StopForumSpam board.
sbassett moved this task from In Progress to Done on the user-sbassett board.