Page MenuHomePhabricator

AbuseFilterCachingParser fails to parse filters with trailing comma in function calls (works in old parser)
Closed, ResolvedPublic

Description

AbuseFilterCachingParser never matches certain filters, which match with the old parser. See T153217 for the WMF side of the issue.

For example, copy Commons' filter 31 to your wiki and try uploading the attached file.

I don't know what is causing this.

Chrysanthemum.jpg (768×1 px, 858 KB)


Trace from T224746:

message
PHP Fatal Error: Argument 1 passed to AbuseFilterCachingParser::evalNode() must be an instance of AFPTreeNode, null given
trace
#0 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/parser/AbuseFilterCachingParser.php(123): array_map(array, array)
#1 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/parser/AbuseFilterCachingParser.php(230): AbuseFilterCachingParser->evalNode(AFPTreeNode)
#2 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/parser/AbuseFilterCachingParser.php(66): AbuseFilterCachingParser->evalNode(AFPTreeNode)
#3 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/parser/AbuseFilterParser.php(190): AbuseFilterCachingParser->intEval(string)
#4 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilter.php(541): AbuseFilterParser->parse(string)
#5 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilter.php(699): AbuseFilter::checkConditions(string, AbuseFilterVariableHolder, boolean, string)
#6 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilter.php(600): AbuseFilter::checkFilter(stdClass, AbuseFilterVariableHolder, Title, string, string)
#7 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilter.php(1195): AbuseFilter::checkAllFilters(AbuseFilterVariableHolder, Title, string, string)
#8 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilterHooks.php(832): AbuseFilter::filterAction(AbuseFilterVariableHolder, Title, string, User)
#9 /srv/mediawiki/php-master/extensions/AbuseFilter/includes/AbuseFilterHooks.php(744): AbuseFilterHooks::filterUpload(string, UploadFromFile, User, array, NULL, NULL, NULL)
#10 /srv/mediawiki/php-master/includes/Hooks.php(174): AbuseFilterHooks::onUploadStashFile(UploadFromFile, User, array, NULL)
#11 /srv/mediawiki/php-master/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#12 /srv/mediawiki/php-master/includes/upload/UploadBase.php(1081): Hooks::run(string, array)
#13 /srv/mediawiki/php-master/includes/upload/UploadBase.php(1061): UploadBase->runUploadStashFileHook(User)
#14 /srv/mediawiki/php-master/includes/api/ApiUpload.php(315): UploadBase->tryStashFile(User, boolean)
#15 /srv/mediawiki/php-master/includes/api/ApiUpload.php(162): ApiUpload->performStash(string, array)
#16 /srv/mediawiki/php-master/includes/api/ApiUpload.php(135): ApiUpload->getStashResult(array)
#17 /srv/mediawiki/php-master/includes/api/ApiUpload.php(104): ApiUpload->getContextResult()
#18 /srv/mediawiki/php-master/includes/api/ApiMain.php(1595): ApiUpload->execute()
#19 /srv/mediawiki/php-master/includes/api/ApiMain.php(531): ApiMain->executeAction()
#20 /srv/mediawiki/php-master/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#21 /srv/mediawiki/php-master/api.php(87): ApiMain->execute()
#22 /srv/mediawiki/w/api.php(3): include(string)
#23 {main}

Plan as of 15 Sept

  • When wmf.23 reaches production, check if there are filters with dangling commas
  • Fix them
  • Close this task and proceed with the plan outlined at T156096

Event Timeline

Okay, the answer is simple and somewhat disturbing: the new parser just failed to parse the filter in question with the syntax error, so it presumably never got executed. I've expected this to fail more loudly than silently failing open, but I am not sure how the filter edit hooks actually handle this.

The specific parsing failure here is that in the old parser both of the following worked:

contains_any('foobar', 'foo', 'bar')
contains_any('foobar', 'foo', 'bar',)

Now, those are not strictly speaking equivalent. f(x,y) calls a function with two arguments, and f(x,y,) calls it with three arguments, last being null. Most functions would fail due to argument count mismatch, but
contains_any is vararg, so null gets turned into "", which then gets treated as no-op (see T62203), unless you have fss_* functions, in which case I have no idea what it actually does, but presumably the same thing, since this actually works in production.

I'll have to figure out how to reproduce this behavior, but I first have to reason carefully about what other funny comma placement can happen here.

$ python -c 'import this' | grep implicit
Explicit is better than implicit.

I'd suggest that a lone comma before the parenthesis should either fail loudly, or be parsed as if the comma weren't there. I'm not sure about php, but python 3 ((lambda *args: print(args))(1,2,3,)) treats the comma as non-existent and javascript / ecmascript 6 (((...args) => console.log(args))(1,2,3,);) treats it as a syntax error.

contains_any is vararg, so null gets turned into "", which then gets treated as no-op (see T62203), unless you have fss_* functions, in which case I have no idea what it actually does, but presumably the same thing, since this actually works in production.

Wikimedia uninstalled fss a while back. It's no longer promoted for PHP 5.5+ installs (so MW 1.27+) since the implementation in PHP is much faster. See https://www.mediawiki.org/wiki/FastStringSearch.

I'd suggest that a lone comma before the parenthesis should either fail loudly, or be parsed as if the comma weren't there.

Given that these filters exist already, and there is no mechanism to warn the users of incompatible filters when upgrading or changing config settings, I think ignoring the trailing comma would be a reasonable option. There are 53 affected filters on 32 Wikimedia wikis (T153217#2888136; 73 filters on 33 wikis, if you could the disabled/deleted ones).

Krinkle renamed this task from AbuseFilterCachingParser never matches certain filters, which match with the old parser to AbuseFilterCachingParser fails to parse filters with trailing comma in function calls (works in old parser).Aug 17 2017, 9:56 PM

So what should we do? My proposal is to throw an error if the function is not variadic, and take a null argument otherwise. At any rate, this would take a great benefit from the runtime error page introduced in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/429381/ (failures would be listed there, and not be silent).

I think both either of these are good ideas for moving forward.

  1. Ignore all trailing commas (unconditionally, not function-dependant). This would change behaviour from today, but keep the filters enabled (today they cause an additional null/empty string parameter, possibly leading to false positive/negative outcome of filters).
  2. Disallow trailing commas in the syntax. If I understand correctly, this would lead to the filter effectively being auto-disabled, with AF-admins being able to inspect them on the AF-Problems page.

I slightly lean in favour of option 1, because it would likely be easier to implement, with lower risk/impact. And thus get us to resolving the parent task sooner, which is more important in my opinion. We can always integrate this with the AF-Problems page at a later time, after we get caching working.

The problem here is the usual one. Option A is less sane, but should be totally back-compat (unless a function call relied on this feature to pass a null parameter, but that would be crazy). Option B is the one I like most, since it helps having a saner syntax. However, it would break any filter using trailing commas. Unfortunately, filter editor won't have any way to know that, as filter failures are only sent to logstash. The patch for it is https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/429381/ (and related ones), which adds a new page for filter failures at runtime.

However, I also agree with the last two lines. Specifically, we can first start ignoring trailing commas (and maybe send a debug log whenever we find them), and then completely disallow them as we'll have an on-wiki page to monitor filter failures.

Change 483707 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Ignore trailing commas in function calls

https://gerrit.wikimedia.org/r/483707

Change 483707 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Ignore trailing commas in function calls

https://gerrit.wikimedia.org/r/483707

Change 483707 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Ignore trailing commas in function calls

https://gerrit.wikimedia.org/r/483707

matmarex assigned this task to Daimona.
Daimona removed a project: Patch-For-Review.

Actually, I think caching parser still fails, that was just the first step.

So, now wmf.7 is everywhere and it includes the logging. The patch I wrote sends data to debug files, so I cannot see it on logstash. @Krinkle could you please grep the log files to see how many entries we have for "Found null param for function"? Ideally, we'd need it on a per-filter basis. However, the Parser is currently unaware of the filter it is parsing (although it will know in the future). Thus, I guess we're fine with a per-wiki count. Thanks!

Logstash shows many occurrencies of channel:AbuseFilter AND "Empty operand of type function". Like for T156096#5407058, we have to add the filter ID.

Change 529474 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add the filter ID to empty operand logging

https://gerrit.wikimedia.org/r/529474

Change 529474 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add the filter ID to empty operand logging

https://gerrit.wikimedia.org/r/529474

List of broken filters, to be announced together with T156096#5464310:

  • ruwiki 14
  • enwiktionary 82
  • zhwiki 6, 31, 38, 50, 53, 55, 64
  • cswiki 2, 74
  • commonswiki 31, 206
  • jawiki 18, 80, 105, 106
  • fiwiki 124
  • fawiki 3, 9, 19, 33
  • nowiki 4
  • zh_yuewiki 3, 20
  • kkwiki 3
  • dawiki 11
  • elwiki 3
  • hiwiki 43, 61, 80
  • mlwiki 3
  • nnwiki 5
  • mrwiki 21
  • zh_classicalwiki 18, 19
  • jawikisource 1, 2, 4
  • jawikibooks 2
  • fawiktionary 9
  • eswiktionary 6
  • fawikiquote 2, 25, 26
  • jawikiquote 1
  • fawikivoyage 2
  • fawikinews 23
  • fawikibooks 10
  • zhwikinews 12

Given the high amount of affected filters, I'd like to wait a bit. We could also tolerate dangling commas in variadic functions. I'd like to take some time to discuss whether that's a best option, and maybe implement it.

Change 534842 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Allow dangling commas in variargs

https://gerrit.wikimedia.org/r/534842

Change 534842 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Allow dangling commas in variargs

https://gerrit.wikimedia.org/r/534842

@Daimona Looks like this is resolved?

Not yet... Ideally, we'd need to:

The rest can be done as part of T156096.

Ah, actually that was handled separately for func args in https://gerrit.wikimedia.org/r/#/c/534842/. So only checking & fixing on-wiki filters is left to do.

Linked version of T153251#5464352:

I'm going to sample-check some of them to see how many of those are for variadic functions (which will be allowed since this week).

wmf.22 is now everywhere, so I checked logstash. We don't have any filter using trailing commas for non-variadic functions. Hence, I'm going to close this task per plan in the description.

Daimona updated the task description. (Show Details)