Page MenuHomePhabricator

More extensive unit testing for AntiSpoof
Closed, ResolvedPublic3 Estimated Story Points

Description

Right now there is only very limited testing of AntiSpoof data. We should make this much more extensive and comprehensive.

Acceptance criteria

  • Propose to AHT and CommTech what the unit tests should cover
  • Incorporate any feedback from other devs
  • Implement the tests

Tests

  • Testing whatever function(s) are in the library for doing string comparison
  • Making sure we test at least one equivalency that involves recursive mapping, e.g. Θ -> 0 -> O
  • Making sure we test at least one equivalency that involves case change
  • Testing a few random spoof strings, e.g. j1mmy w4l35 (no idea if that one even works)
  • Sequence processing like vv -> w and rn -> m that we should not forget to include
  • All the different badCharErr fail situations (?)

Event Timeline

@kaldari Here are my thoughts:

  • We should probably add tests cases to the new library we create in T174197, which is what @Huji asked for anyways.
  • We could use the AntiSpoof logs to get examples of when the filter was tripped. This would create regression tests for us. I'm not sure how helpful that would be because I can't see us removing characters, but I guess it protects against regressions.
  • As far as getting examples of where AntiSpoof was not tripped, but it should have. I'm not sure how we'd do this other than soliciting community feedback? I suppose we could query for all the usernames, but I'm not sure how we'd determine which ones should have been flagged. I guess we could run them all through Spoofchecker::isSuspicious and that would at least give us a list to take a look at.

What do you think?

@dbarratt: I don't think we need to go crazy with testing the equivalencies. Basically, I just want to make sure we have the following covered:

  • Testing whatever function(s) are in the library for doing string comparison
  • Making sure we test at least one equivalency that involves recursive mapping, e.g. Θ -> 0 -> O
  • Making sure we test at least one equivalency that involves case change
  • Testing a few random spoof strings, e.g. j1mmy w4l35 (no idea if that one even works)

Other things to test for:

  1. Sequence processing like vv -> w and rn -> m that we should not forget to include
  2. All the different badCharErr fail situations (?)
dbarratt moved this task from In progress to Ready on the Anti-Harassment (AHT Sprint 7) board.
  • Making sure we test at least one equivalency that involves recursive mapping, e.g. Θ -> 0 -> O

So I'm adding a test for this case, but it doesn't actually look like we do recursive mapping, we actually have both variations in the mapping:
https://github.com/wikimedia/Equivset/blob/5dcb3d77a6eaf9cb65ed45b547ddc839c246a49c/data/equivset.in#L311
and
https://github.com/wikimedia/Equivset/blob/5dcb3d77a6eaf9cb65ed45b547ddc839c246a49c/data/equivset.in#L329

dbarratt updated the task description. (Show Details)

Change 385392 had a related patch set uploaded (by Dbarratt; owner: Dbarratt):
[mediawiki/extensions/AntiSpoof@master] Add more unit tests.

https://gerrit.wikimedia.org/r/385392

Sequence processing like vv -> w and rn -> m that we should not forget to include

This is a little weird... we are doing the sequence processing in AntiSpoof (not the equivset library)
https://phabricator.wikimedia.org/diffusion/EANS/browse/master/AntiSpoof_body.php;31ff54dfbedcf49b8cde9ec8bd53f4f79a5f194e$382

It looks like tests already exist for that (since I had to remove them for equivset), and it seems a little half baked. I wonder if we should add sequence processing to equivset? or should that be a new library?

It looks like AbuseFilter was never doing sequence processing since it's outside of the normalizeString method.

Regardless, I will leave the tests in AntiSpoof for sequence processing.

Considering that Equivset is just a mapping of "similar" looking characters I think sequence processing should not be part of it.

All the different badCharErr fail situations (?)

It took me a while to find a bad character that would trip the error, but I found one and included it in T177667

Change 385392 merged by jenkins-bot:
[mediawiki/extensions/AntiSpoof@master] Add more unit tests.

https://gerrit.wikimedia.org/r/385392