
Abuse filter hit fails the "Examine" interface for the text of the filter
Closed, Resolved · Public

Description

On English Wikipedia, Abuse Filter number 58 hasn't been updated since April 22nd. However, when I use the "Examine" interface on hits from April 29-30, with the filter text loaded through "Load filter ID" for number 58, I get the message "The filter did not match this change".

Event Timeline

OdMishehu created this task. · May 5 2015, 9:33 AM
OdMishehu raised the priority of this task to Needs Triage.
OdMishehu updated the task description.
OdMishehu added a project: AbuseFilter.
OdMishehu added a subscriber: OdMishehu.
Restricted Application added a subscriber: Aklapper. · May 5 2015, 9:33 AM
Aklapper triaged this task as Low priority. · May 21 2015, 2:37 PM
matej_suchanek changed the task status from Open to Stalled. · Sep 5 2017, 4:20 PM
matej_suchanek added a subscriber: matej_suchanek.

Given that the filter is private, it's very difficult for us to debug.

Daimona moved this task from Backlog to Internal bugs on the AbuseFilter board. · Apr 24 2018, 4:44 PM
Daimona changed the task status from Stalled to Open. · May 9 2018, 7:45 PM
Daimona added a subscriber: Daimona.

I took a quick look. Some useful links:

Now, some considerations. Although it's not easy to test, the problem most likely comes from one of three places: "stringy", the first variable used, or the function at line 13 (sorry for being cryptic). However, I don't know which of these is the real cause. Also, /examine seems to show the right variables.

It may be useful to see the actual var_dump, again using fetchText.php. The old_ids of three randomly chosen affected entries are 665112684, 665041142 and 665166487.

Daimona closed this task as Resolved. · Jul 17 2018, 1:30 PM
Daimona claimed this task.

Thanks to T193903, I could finally test directly on enwiki, and I found several abuselog entries with the described problem, for instance this one. You can see that the added text (i.e. added_lines) almost matches a piece of stringy (it's on the second row; it takes a little while to find). I say "almost" because there is one difference, which I'll state explicitly since revealing it is not a big deal: the character "5" in stringy is an "S" in the added text, and applying normalization to stringy would likewise transform that "5" into an "S". This is probably because an older version of equivset mapped every "S" to "5", so the normalized text contained stringy and the filter matched; the current version no longer does that, so there is no match.
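To illustrate the suspected mechanism, here is a minimal Python sketch. It assumes the filter checks whether the normalized added_lines contains stringy verbatim; since the real filter is private, that condition, the one-entry character maps and the sample strings are all hypothetical. `normalize`, `filter_matches`, `OLD_EQUIVSET` and `NEW_EQUIVSET` are illustrative names, not AbuseFilter or equivset code.

```python
# Hypothetical, heavily simplified equivalence maps. The real equivset has
# many entries; only the S/5 pair discussed above is modelled here.
OLD_EQUIVSET = {"S": "5"}  # assumed old behaviour: "S" folded to "5"
NEW_EQUIVSET = {"5": "S"}  # assumed current behaviour: "5" folded to "S"


def normalize(text: str, equivset: dict) -> str:
    """Replace each character that has an entry in the map; keep the rest."""
    return "".join(equivset.get(ch, ch) for ch in text)


def filter_matches(added_lines: str, stringy: str, equivset: dict) -> bool:
    """Assumed filter shape: normalized added text contains the literal stringy."""
    return stringy in normalize(added_lines, equivset)


# Placeholder values; the real filter text and log entries are private.
stringy = "FOO5BAR"
added_lines = "some text FOOSBAR more text"

# Old map turns the added "S" into "5", so the literal "FOO5BAR" is found.
print(filter_matches(added_lines, stringy, OLD_EQUIVSET))  # True
# New map leaves "S" alone, so the literal "FOO5BAR" is no longer found.
print(filter_matches(added_lines, stringy, NEW_EQUIVSET))  # False
# Normalizing stringy too ("5" -> "S") restores the match under the new map.
print(normalize(stringy, NEW_EQUIVSET) in normalize(added_lines, NEW_EQUIVSET))  # True
```

The last line reflects the observation that normalizing stringy would turn its "5" into an "S": when both sides go through the same map, a change in equivset affects them consistently, so a filter written that way would not silently stop matching for this reason.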