Page MenuHomePhabricator

AbuseFilterCachingParser fails to parse filters with trailing comma in function calls (works in old parser)
Open, HighPublic

Description

AbuseFilterCachingParser never matches certain filters, which match with the old parser. See T153217 for the WMF side of the issue.

For example, copy Commons' filter 31 to your wiki and try uploading the attached file.

I don't know what is causing this.


Trace from T224746:

message
PHP Fatal Error: Argument 1 passed to AbuseFilterCachingParser::evalNode() must be an instance of AFPTreeNode, null given
trace
#0 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/parser/AbuseFilterCachingParser.php(123): array_map(array, array)
#1 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/parser/AbuseFilterCachingParser.php(230): AbuseFilterCachingParser->evalNode(AFPTreeNode)
#2 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/parser/AbuseFilterCachingParser.php(66): AbuseFilterCachingParser->evalNode(AFPTreeNode)
#3 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/parser/AbuseFilterParser.php(190): AbuseFilterCachingParser->intEval(string)
#4 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilter.php(541): AbuseFilterParser->parse(string)
#5 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilter.php(699): AbuseFilter::checkConditions(string, AbuseFilterVariableHolder, boolean, string)
#6 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilter.php(600): AbuseFilter::checkFilter(stdClass, AbuseFilterVariableHolder, Title, string, string)
#7 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilter.php(1195): AbuseFilter::checkAllFilters(AbuseFilterVariableHolder, Title, string, string)
#8 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilterHooks.php(832): AbuseFilter::filterAction(AbuseFilterVariableHolder, Title, string, User)
#9 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilterHooks.php(744): AbuseFilterHooks::filterUpload(string, UploadFromFile, User, array, NULL, NULL, NULL)
#10 /srv/mediawiki/php-master/includes/Hooks.php(174): AbuseFilterHooks::onUploadStashFile(UploadFromFile, User, array, NULL)
#11 /srv/mediawiki/php-master/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#12 /srv/mediawiki/php-master/includes/upload/UploadBase.php(1081): Hooks::run(string, array)
#13 /srv/mediawiki/php-master/includes/upload/UploadBase.php(1061): UploadBase->runUploadStashFileHook(User)
#14 /srv/mediawiki/php-master/includes/api/ApiUpload.php(315): UploadBase->tryStashFile(User, boolean)
#15 /srv/mediawiki/php-master/includes/api/ApiUpload.php(162): ApiUpload->performStash(string, array)
#16 /srv/mediawiki/php-master/includes/api/ApiUpload.php(135): ApiUpload->getStashResult(array)
#17 /srv/mediawiki/php-master/includes/api/ApiUpload.php(104): ApiUpload->getContextResult()
#18 /srv/mediawiki/php-master/includes/api/ApiMain.php(1595): ApiUpload->execute()
#19 /srv/mediawiki/php-master/includes/api/ApiMain.php(531): ApiMain->executeAction()
#20 /srv/mediawiki/php-master/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#21 /srv/mediawiki/php-master/api.php(87): ApiMain->execute()
#22 /srv/mediawiki/w/api.php(3): include(string)
#23 {main}

Event Timeline

matmarex created this task.Dec 14 2016, 9:05 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 14 2016, 9:05 PM
vvv added a comment.Dec 18 2016, 9:09 AM

Okay, the answer is simple and somewhat disturbing: the new parser just failed to parse the filter in question with the syntax error, so it presumably never got executed. I've expected this to fail more loudly than silently failing open, but I am not sure how the filter edit hooks actually handle this.

The specific parsing failure here is that in the old parser both of the following worked:

contains_any('foobar', 'foo', 'bar')
contains_any('foobar', 'foo', 'bar',)

Now, those are not strictly speaking equivalent. f(x,y) calls a function with two arguments, and f(x,y,) calls it with three arguments, last being null. Most functions would fail due to argument count mismatch, but
contains_any is vararg, so null gets turned into "", which then gets treated as no-op (see T62203), unless you have fss_* functions, in which case I have no idea what it actually does, but presumably the same thing, since this actually works in production.

I'll have to figure out how to reproduce this behavior, but I first have to reason carefully about what other funny comma placement can happen here.

$ python -c 'import this' | grep implicit
Explicit is better than implicit.

I'd suggest that a lone comma before the parenthesis should either fail loudly, or be parsed as if the comma weren't there. I'm not sure about php, but python 3 ((lambda *args: print(args))(1,2,3,)) treats the comma as non-existent and javascript / ecmascript 6 (((...args) => console.log(args))(1,2,3,);) treats it as a syntax error.

contains_any is vararg, so null gets turned into "", which then gets treated as no-op (see T62203), unless you have fss_* functions, in which case I have no idea what it actually does, but presumably the same thing, since this actually works in production.

Wikimedia uninstalled fss a while back. It's no longer promoted for PHP 5.5+ installs (so MW 1.27+) since the implementation in PHP is much faster. See https://www.mediawiki.org/wiki/FastStringSearch.

I'd suggest that a lone comma before the parenthesis should either fail loudly, or be parsed as if the comma weren't there.

Given that these filters exist already, and there is no mechanism to warn the users of incompatible filters when upgrading or changing config settings, I think ignoring the trailing comma would be a reasonable option. There are 53 affected filters on 32 Wikimedia wikis (T153217#2888136; 73 filters on 33 wikis, if you could the disabled/deleted ones).

Krinkle renamed this task from AbuseFilterCachingParser never matches certain filters, which match with the old parser to AbuseFilterCachingParser fails to parse filters with trailing comma in function calls (works in old parser).Aug 17 2017, 9:56 PM
Daimona moved this task from Backlog to Internal bugs on the AbuseFilter board.Apr 24 2018, 4:49 PM

So what should we do? My proposal is to throw an error if the function is not variadic, and take a null argument otherwise. At any rate, this would take a great benefit from the runtime error page introduced in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/429381/ (failures would be listed there, and not be silent).

Krinkle added a subscriber: Krinkle.EditedJan 10 2019, 7:26 PM

I think both either of these are good ideas for moving forward.

  1. Ignore all trailing commas (unconditionally, not function-dependant). This would change behaviour from today, but keep the filters enabled (today they cause an additional null/empty string parameter, possibly leading to false positive/negative outcome of filters).
  2. Disallow trailing commas in the syntax. If I understand correctly, this would lead to the filter effectively being auto-disabled, with AF-admins being able to inspect them on the AF-Problems page.

I slightly lean in favour of option 1, because it would likely be easier to implement, with lower risk/impact. And thus get us to resolving the parent task sooner, which is more important in my opinion. We can always integrate this with the AF-Problems page at a later time, after we get caching working.

The problem here is the usual one. Option A is less sane, but should be totally back-compat (unless a function call relied on this feature to pass a null parameter, but that would be crazy). Option B is the one I like most, since it helps having a saner syntax. However, it would break any filter using trailing commas. Unfortunately, filter editor won't have any way to know that, as filter failures are only sent to logstash. The patch for it is https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/429381/ (and related ones), which adds a new page for filter failures at runtime.

However, I also agree with the last two lines. Specifically, we can first start ignoring trailing commas (and maybe send a debug log whenever we find them), and then completely disallow them as we'll have an on-wiki page to monitor filter failures.

Change 483707 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Ignore trailing commas in function calls

https://gerrit.wikimedia.org/r/483707

Change 483707 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Ignore trailing commas in function calls

https://gerrit.wikimedia.org/r/483707

Change 483707 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Ignore trailing commas in function calls

https://gerrit.wikimedia.org/r/483707

matmarex closed this task as Resolved.May 27 2019, 2:04 PM
matmarex assigned this task to Daimona.
Daimona reopened this task as Open.May 27 2019, 2:31 PM
Daimona removed a project: Patch-For-Review.

Actually, I think caching parser still fails, that was just the first step.

So, now wmf.7 is everywhere and it includes the logging. The patch I wrote sends data to debug files, so I cannot see it on logstash. @Krinkle could you please grep the log files to see how many entries we have for "Found null param for function"? Ideally, we'd need it on a per-filter basis. However, the Parser is currently unaware of the filter it is parsing (although it will know in the future). Thus, I guess we're fine with a per-wiki count. Thanks!