Background & Problem to solve
As discussed on T29987, the current practice of running ccnorm on one string and then comparing the result with the alleged canonical form of another string is not viable.
Tim proposed something like:
That's OK, but I (@Nemo_bis) think a more sensible syntax would be something like:
cclike(added_lines, testing) || cclike(added_lines, vandalizing)
That is, a single function should take two strings and tell us whether, once canonicalised in whatever way the implementation chooses, they are the same, i.e. whether they are confusable.
This is nothing special: it's the approach followed by the standard API to ICU data; see uspoof_areConfusable in https://ssl.icu-project.org/apiref/icu4c/uspoof_8h.html#ac96fdf642bfd9efcd0d9956bd76cadaa, which I found via the documents mentioned in T65217. Nikerabbit pointed me to UTS #36 and UTS #39, which were still drafts when AntiSpoof was created. Now we have better tools.
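A minimal sketch of the proposed semantics, written in Python and assuming a UTS #39-style "skeleton" comparison (the idea behind uspoof_areConfusable): normalise both strings to NFD, map each character to a prototype, normalise again, and compare. The CONFUSABLES table, skeleton() and this cclike() are hypothetical illustrations, not AbuseFilter or ICU code; the table holds only a handful of Cyrillic lookalikes, whereas a real implementation would use the full confusables.txt data or call ICU directly.

```python
import unicodedata

# Hypothetical toy confusables table (illustration only): maps a few Cyrillic
# letters to the Latin letters they resemble. A real implementation would load
# the full UTS #39 confusables.txt data or delegate to ICU's uspoof_areConfusable.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}


def skeleton(text: str) -> str:
    """UTS #39-style skeleton: NFD, map each char to its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def cclike(a: str, b: str) -> bool:
    """Hypothetical cclike(): True if a and b share a skeleton (are confusable).

    How the strings are canonicalised stays an internal detail; callers only
    ask whether the two inputs are the same once canonicalised.
    """
    return skeleton(a) == skeleton(b)


if __name__ == "__main__":
    # Cyrillic 'а' (U+0430) used in place of Latin 'a'
    added_lines = "v\u0430nd\u0430lizing"
    print(cclike(added_lines, "testing") or cclike(added_lines, "vandalizing"))
```

The filter author never sees or writes a canonical form; the canonicalisation can be changed or upgraded (e.g. to newer Unicode data) without breaking existing filters. Note that this sketch does no case folding.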
- Build a function in AbuseFilter that compares two strings after canonicalising them, i.e. tells whether they are confusable