Author: delbu9c1
Description:
While analyzing a false positive, I tried to track down why my regex debugger reports that a regex does not match while the abuse filter reports that it does. I eventually found what appears to be a good lead on the issue.
Details of the incorrect match are here: http://test.wikipedia.org/w/index.php?title=Special:AbuseLog&details=1784
It looks like the é (which appears to be encoded in UTF-8) is mishandled when the text is tested against the regex. The regex engine seems to treat it as a non-word character, which creates a word boundary just before the "r", so the match succeeds (specifically, "\brence\b" matches "conférence").
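The suspected mechanism can be reproduced outside the abuse filter. The following is a minimal sketch in Python, not PHP, and it rests on an unconfirmed assumption: that the filter's regex engine examines the UTF-8 text byte by byte with ASCII-only word characters. The variable names are illustrative only and do not come from the AbuseFilter code.

```python
import re

text = "conférence"
pattern = r"\brence\b"

# Unicode-aware matching on a str: "é" counts as a word character,
# so there is no word boundary between "é" and "r".
unicode_match = re.search(pattern, text)

# Byte-wise matching on the UTF-8 encoding: "é" becomes the two bytes
# 0xC3 0xA9, neither of which is an ASCII word character, so a word
# boundary appears just before "r".
utf8_bytes = text.encode("utf-8")
byte_match = re.search(pattern.encode("ascii"), utf8_bytes)

print(utf8_bytes)                              # b'conf\xc3\xa9rence'
print("unicode-aware:", unicode_match is not None)  # unicode-aware: False
print("byte-wise:", byte_match is not None)         # byte-wise: True
```

If the same split shows up in PHP between a Unicode-aware and a byte-wise match, that would support the theory that the text is not being matched in a UTF-8-aware mode.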
Hopefully there's a way to correct this, and it isn't a problem in PHP itself.
Please let me know if you need any additional information.
- Shirik @ enwiki
Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=46773