Spurious MD5 errors ("SFS IP file contents and file md5 do not match!")
Open, Needs Triage · Public

Description

Ever since I added the StopForumSpam extension to Patchdemo (https://github.com/MatmaRex/patchdemo/pull/241), various operations on the wiki have been intermittently failing with the exception: "SFS IP file contents and file md5 do not match!" (see reports: T276393#6951820, T279395).

I tried debugging this, and it turns out that the file contents and file md5 really do not match sometimes. Fetching them from the URLs in the default configuration can give one of two possible results:

$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
4606a1f7bad999ce8d9028f4b6098e5f
$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
4606a1f7bad999ce8d9028f4b6098e5f
$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
4606a1f7bad999ce8d9028f4b6098e5f
$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
d44309a15179f0616711d77c825fdd2e
$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
d44309a15179f0616711d77c825fdd2e
$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
d44309a15179f0616711d77c825fdd2e
$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
d44309a15179f0616711d77c825fdd2e
$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
4606a1f7bad999ce8d9028f4b6098e5f
$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
4606a1f7bad999ce8d9028f4b6098e5f
$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
4606a1f7bad999ce8d9028f4b6098e5f
$ curl https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz.md5
4606a1f7bad999ce8d9028f4b6098e5f

$ curl -s https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz | md5sum
d44309a15179f0616711d77c825fdd2e *-

$ curl -s https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz | md5sum
d44309a15179f0616711d77c825fdd2e *-

$ curl -s https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz | md5sum
4606a1f7bad999ce8d9028f4b6098e5f *-

$ curl -s https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz | md5sum
4606a1f7bad999ce8d9028f4b6098e5f *-

$ curl -s https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz | md5sum
d44309a15179f0616711d77c825fdd2e *-

Either the service needs to be fixed so that it stops serving a checksum that doesn't match the file (I'm not familiar with it), or, if this behavior is expected, the extension should not perform this check.
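For anyone who wants to reproduce this without pasting curl commands by hand, here is a minimal Python sketch of the same comparison. The names `http_get` and `check_pair` are made up for illustration and are not part of the extension; the URL is the one from the transcript above. The `fetch` parameter exists only so the check can be exercised without network access.

```python
import hashlib
import urllib.request
from typing import Callable, Tuple

# URL taken from the transcript above; the ".md5" suffix gives the checksum file.
BASE_URL = "https://www.stopforumspam.com/downloads/listed_ip_90_ipv46_all.gz"


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def check_pair(
    fetch: Callable[[str], bytes] = http_get,
    base_url: str = BASE_URL,
) -> Tuple[str, str, bool]:
    """Download the checksum file and the archive once each.

    Returns (published_md5, computed_md5, match).
    """
    fields = fetch(base_url + ".md5").decode("ascii", errors="replace").split()
    # An empty checksum response is treated as "no published hash".
    published = fields[0].lower() if fields else ""
    computed = hashlib.md5(fetch(base_url)).hexdigest()
    return published, computed, published == computed
```

Calling `check_pair()` a few times in a row should show whether the published and computed hashes disagree on a given pair of requests.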

Event Timeline

I disabled the extension on Patchdemo on all existing wikis, and by default for newly created wikis (you can still enable it for testing if you want): https://github.com/MatmaRex/patchdemo/issues/284

Ok, sorry about the trouble. I'm working on a quick patch that introduces a new global that guards the md5 file validation and is false by default. Should be up soon.

Also, it's worth noting that removing this functionality was discussed in a recent performance review patch set, so it may not exist in future versions anyway.

How is it possible though that stopforumspam.com is serving two different versions of that file, seemingly randomly? This is definitely not increasing my trust in them. Anyway…

Change 677303 had a related patch set uploaded (by SBassett; author: SBassett):

[mediawiki/extensions/StopForumSpam@master] Guard md5 SFS file validation with new global variable

https://gerrit.wikimedia.org/r/677303

How is it possible though that stopforumspam.com is serving two different versions of that file, seemingly randomly? This is definitely not increasing my trust in them. Anyway…

I agree that's a bit disturbing and probably requires further investigation into them as a data provider. Unless this is a weird bug with how the extension is fetching those files and validating the hash, which I don't think it is.

Multiple requests in a row yielding two different outputs looks like it might be related to cache invalidation, or just generally due to them using Cloudflare in front...
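If the mismatch really is a transient caching effect, one possible mitigation (not something the extension is known to do) would be to retry a bounded number of times until a checksum and archive that agree are received. A rough Python sketch follows; `fetch_verified` and its parameters are hypothetical names, and `fetch` stands for any callable that returns the response body for a URL.

```python
import hashlib
import time
from typing import Callable, Optional


def fetch_verified(
    fetch: Callable[[str], bytes],
    base_url: str,
    attempts: int = 5,
    delay_seconds: float = 2.0,
) -> Optional[bytes]:
    """Return archive bytes whose MD5 matches the published one, or None.

    Each attempt re-downloads both the ".md5" file and the archive, since
    either of them may be the stale copy.
    """
    for attempt in range(attempts):
        fields = fetch(base_url + ".md5").decode("ascii", errors="replace").split()
        published = fields[0].lower() if fields else ""
        data = fetch(base_url)
        if hashlib.md5(data).hexdigest() == published:
            return data
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return None
```

A `None` result means no consistent pair was seen within the allowed attempts, so the caller still has to decide what to do in that case.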

Could the extension have a local cached copy of the last file that passed validation, and fall back to that? (Whether it's worth the effort is a separate question.)

Probably. I think the current change set is still the quicker approach to addressing this issue, and the config is possibly worth retaining even if a more advanced fallback mechanism were introduced later.

Could the extension have a local cached copy of the last file that passed validation, and fall back to that? (Whether it's worth the effort is a separate question.)

In theory, yeah... And in a single server environment that'd probably work. But when you've got > 1... It's a case of "who has the file", or where do you store it so the server that does the update can push it in...
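For the single-server case, the fallback idea could look roughly like the Python sketch below. It assumes a local file is an acceptable place for the last copy that passed validation, which is exactly the part that doesn't carry over to multiple servers. The function names and the cache path are hypothetical, not anything from the extension.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional


def store_if_valid(data: bytes, published_md5: str, cache_path: Path) -> bool:
    """Save data as the last-known-good copy if its MD5 matches.

    The write goes to a temporary file first and is then renamed, so a
    reader never sees a half-written cache file.
    """
    if hashlib.md5(data).hexdigest() != published_md5.strip().lower():
        return False
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(cache_path.parent))
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_name, cache_path)
    return True


def load_list(data: bytes, published_md5: str, cache_path: Path) -> Optional[bytes]:
    """Return the fresh download if valid, else the cached copy, else None."""
    if store_if_valid(data, published_md5, cache_path):
        return data
    if cache_path.exists():
        return cache_path.read_bytes()
    return None
```

With more than one server, `cache_path` would have to live on shared storage or be replaced by some shared cache, which is the open question raised above.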

Change 677303 merged by jenkins-bot:

[mediawiki/extensions/StopForumSpam@master] Guard md5 SFS file validation with new global variable

https://gerrit.wikimedia.org/r/677303