Page MenuHomePhabricator

Deprecate and then reject empty operators
Open, Needs TriagePublic

Description

AbuseFilterParser supports the following empty operators, whereas the CachingParser does not:

  • (foo|) is equivalent to foo
  • (foo&) is equivalent to false
  • (foo&!) is equivalent to foo
  • (!) is equivalent to true
  • if () ...
  • if ( cond ) then () ...
  • if ( cond ) then ( ... ) else () end
  • var :=
  • ()

and also others, see AbuseFilterParserTest::testEmptyOperands.

In all cases, the missing argument of a boolean operator is treated as null.
With this fact, people could inadvertently break filters if they leave a leading operator, so we should really stop allowing it. We should first emit debug notices, then fix filters for WMF prod, and then drop support.

Plan as of 15 Sept

  • Formally deprecate empty operands with logging (with a clearer message) in MW 1.34
  • Change logEmptyOperand to be e.g. throwEmptyOperand and throw in MW 1.35 or 1.36

Details

Related Gerrit Patches:
mediawiki/extensions/AbuseFilter : masterHard-deprecate empty operands
mediawiki/extensions/AbuseFilter : masterAlign arg counting between the parsers
operations/mediawiki-config : masterMove the remaining wikis to AbuseFilterCachingParser
operations/mediawiki-config : masterEnable AbuseFilterCachingParser for (almost) all wikis
mediawiki/extensions/AbuseFilter : masterThrow AFPUserVisibleExceptions for empty operands in CachingParser
mediawiki/extensions/AbuseFilter : masterLog more empty operands
mediawiki/extensions/AbuseFilter : masterAdd the filter ID to empty operand logging
mediawiki/extensions/AbuseFilter : masterFurther deprecation for empty conditions
mediawiki/extensions/AbuseFilter : masterAdd a new data type for non-initialized stuff

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I looked at this but it's not that easy, as it requires some rewriting of the parser. Instead, it's easy to determine why this happens: when we reach a logical operator (but this also happens in other cases, for instance with arithmetical operations etc.), we instantiate a NULL AFPData, and pass by reference to various levels for computing. However, if no level populates the object, we still end up with a NULL AFPData, which is exactly a null value, and this is why missing stuff is treated as null. As I was saying, this is also true for arithmetic, array elements ([1,] === [1,null]) and function parameters (any_function(foo,) === any_function(foo,null)) - parameters count was addressed in rEABF32718888c0038103c7d2f9b647ddeee056957fd8 - and we should fix them all at once.

Nice catch. These seem like shortcuts that don't provide value to the rule syntax (they can all be done with other syntax). They were likely supported only by accident as part of making the PHP code simpler. If they have any usage its either by accident (causing problems already) or from someone trying to be clever, for both we should help users find it and then either migrate to the normal form or remove/disable.

In order to tell whether an AFPData is NULL because it's really null, or just because there was nothing to parse we'd need something like https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/478424/. Which is old, WIPpy, etc.

Change 512135 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add a new data type for non-initialized stuff

https://gerrit.wikimedia.org/r/512135

Change 512135 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add a new data type for non-initialized stuff

https://gerrit.wikimedia.org/r/512135

Change 527845 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Further deprecation for empty conditions

https://gerrit.wikimedia.org/r/527845

Uhhmmmm, so... Ideally this task should be easy to tackle with the new DNONE concept. For logic operators, function arguments, etc., the Parser does:

$rhs = new AFPData( AFPData::DNONE );
$this->doLevelXXX( $rhs );
// And here $rhs has been populated via pass-by-ref

So my idea was to change the above into the following:

$rhs = new AFPData( AFPData::DNONE );
$this->doLevelXXX( $rhs );
if ( $rhs->getType() === AFPData::DNONE ) {
  // We didn't touch the $rhs, thus it's empty - emit a warning or throw
}

but the solution above won't work: DNONE is also used (for instance) for non-available variables. So if you have condition1 & unavailable_var, this will empty the warning-branch above, despite the rhs not being empty.
So, either we find an alternative approach, or I'm thinking to:

  • Rename DNONE to DUNDEFINED
  • Add a DEMPTY type
  • Use DUNDEFINED for skipped and unavailable variables, keeping the exact behaviour that DNONE has right now
  • Use DEMPTY when initializing AFPData's to be later populated by reference (as above); if a DEMPTY is found at runtime, we should emit a warning in the deprecation phase, but eventually cast it as null, and throw a syntax error when all filters will have been fixed.

I believe this is pretty solid, and specializes the data types to a single task each. So I'm going to send a POC.

Change 527845 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Further deprecation for empty conditions

https://gerrit.wikimedia.org/r/527845

There are various occurrencies of channel:AbuseFilter AND "Empty operand of type" on logstash. Probably, we'd need to add the ID of the filter being parsed, in order to know exactly where to look.

Change 529474 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add the filter ID to empty operand logging

https://gerrit.wikimedia.org/r/529474

Daimona claimed this task.Aug 12 2019, 5:47 PM
Daimona moved this task from Future to Under review on the User-Daimona board.Aug 13 2019, 3:20 PM

Change 529474 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add the filter ID to empty operand logging

https://gerrit.wikimedia.org/r/529474

Daimona renamed this task from Deprecate and remove wild syntax to Deprecate and then reject empty operators.Aug 20 2019, 4:24 PM
Daimona updated the task description. (Show Details)
Daimona updated the task description. (Show Details)Aug 20 2019, 5:03 PM
Daimona updated the task description. (Show Details)Aug 20 2019, 5:23 PM

Change 531289 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Log more empty operands

https://gerrit.wikimedia.org/r/531289

Change 531289 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Log more empty operands

https://gerrit.wikimedia.org/r/531289

So, now we have full logging data, and here is a list of affected filters:

  • jawiki filter 4 - empty boolean operand
  • thwiki filter 25 - empty boolean operand
  • plwiki filter 23 - empty boolean operand
  • fawiki filter 119 - empty boolean operand
  • nlwiki filter 57 - empty boolean operand
  • enwikivoyage filter 24 - empty sum operand
  • hewikisource filter 2 - empty boolean operand
  • simplewiki filter 72 - empty boolean operand
  • dawiki filters 20, 24 and 33 - empty boolean operand
  • alswiki filter 11 - empty boolean operand
  • shwiki filter 11 - empty boolean operand
  • hewiki filter 81 - empty boolean operand
  • jawikisource filters 1, 2 and 4 - empty boolean operand
  • jawikibooks filter 2 - empty boolean operand
  • jawikiquote filter 1 - empty boolean operand
  • skwiki filter 10 - empty boolean operand
  • lmowiki filter 1 - empty boolean operand
  • rmwiki filter 1 - empty boolean operand
  • eswiki filter 51 - empty boolean operand

And given that the RFC is still far from conclusion, we should announce these in Tech News, together with the list that I'm going to add to T153251.

@Daimona Please could you suggest an appropriate way to phrase this in Tech News? Ideally a 1 (or 2) sentence description that is easily translatable. Thanks!

@Quiddity sure... Something like

Soon, the AbuseFilter will recognize new syntax errors. Specifically, all new errors are about empty operands, see a list of examples on [[phab:T156096|phabricator]]. Any active filter with such an error will stop working; hence, please take a look at the [[phab:T156096#5464310|list]] of affected filters, and fix the ones that you can fix.

And, at the risk of sounding repetitive, I'd re-add:

Note that there's an ongoing [[:m:Requests_for_comment/Creating_abusefilter-manager_global_group|RFC]] on meta about the creation of a new abusefilter-manager global group. If it passes, this kind of AbuseFilter maintenance will be performed by members of this group.

I'm sorry if that's not short, or easily translatable, but I don't know how to simplify it. The message is addressed to AF editors, which are usually tech-savvy people, so I'm unsure about how much we can simplify.

Daimona updated the task description. (Show Details)Sep 8 2019, 6:29 PM
Daimona updated the task description. (Show Details)Sep 9 2019, 12:00 PM

Change 535169 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Throw AFPUserVisibleExceptions for empty operands in CachingParser

https://gerrit.wikimedia.org/r/535169

Change 535169 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Throw AFPUserVisibleExceptions for empty operands in CachingParser

https://gerrit.wikimedia.org/r/535169

So, now this is again the only blocker for the new parser (together with T153251, but they're actually the same thing). The tech news issue asking people to fix filters was published 6 days ago. While it's not that much time, I went to logstash to check how we're doing. And... We're not doing good. Here is the trend over the last 7 days [1]; we can say it has remained constant. If it keeps staying constant, I wonder what we could do to get things done quicker. The outcome of the meta RFC is distant and uncertain, hence it's not something we'll be able to use for this task anyway. I wonder, is the sysadmin group usable for this?

[1] - The empty operand's type uses the "type" key, which is overwritten by "mediawiki" and hence is not usable :-/

Daimona added a comment.EditedSep 15 2019, 4:19 PM

Linked list:

Huji added a subscriber: Huji.EditedSep 15 2019, 4:37 PM

@Daimona @Krinkle on one hand, we have https://meta.wikimedia.org/wiki/Requests_for_comment/Creating_abusefilter-manager_global_group in which there is ongoing debate about whether a central user should be allowed to modify filters on different wikis; and as you know, I got Global Sysop rights explicitly to do that but it is limited to wikis that have not opted out.

On the other hand, I see in the post above (from less than 20 minutes ago) that a filter on fawiki needs a change and before I (a local admin) can get to it, I see Krinkle has fixed it. This is contradictory. If we already have a user (Krinkle) who is allowed to do this for wikis like fawiki (which has a large, active admin group and is in the opt-out list for Global Sysop), then why the RfC? If we think we need permission from the community and hence the RfC, then why is Krinkle allowed to do those edits?

PS: Krinkle, which permission are you using to make these edits? Global edit interface is not it, and I don't think you are a global sysop.

Mentioned in SAL (#wikimedia-operations) [2019-09-15T16:51:23Z] <Krinkle> Fixed a dozen abuse filtes, listed at https://phabricator.wikimedia.org/T156096#5494060. The trailing pipe character was removed from filters that had it which is no longer supported in a future version of AbuseFilter.

@Daimona @Krinkle on one hand, we have RFC in which there is ongoing debate about whether a central user should be allowed to modify filters on different wikis; and as you know, I got Global Sysop rights explicitly to do that but it is limited to wikis that have not opted out.
On the other hand, I see in the post above (from less than 20 minutes ago) that a filter on fawiki needs a change and before I (a local admin) can get to it, I see Krinkle has fixed it. This is contradictory. [..]

I upgraded the filter syntax as system administrator, and asked Daimona to help me by providing a link to each filter. Phabricator was a convenient place to share them with me, sorry about the confusion.

These are solely syntactical changes needed before a software update to keeping the existing filters functional. If this specific change were needed regularly, we could have developed a maintenance script instead. The change consist of removing a single redundant character from the filter syntax, example at skwiki filter 10 diff 299. For anything less trivial than this, I would not have offered my help as sysadmin. Such changes would require a decision to be made on how the filter should behave. I cannot decide on that.

The number of filters that exist and their complexity is in the hands of the community. As such, I believe the community should also have the responsibility and capability to maintain these on their own, in a timely manner. If the community chooses not to accept this RFC, or is unable to agree on other ways to ensure timely upgrade of abuse filters, then the result will likely be that unmaintained filters get disabled after some time due to invalid syntax. I support Daimona's RFC because I do not wish that to be the case. For this once, given the simplicity of the change, and the small number of filters that required changes, I took two hours of my weekend to solve the problem.

Huji added a comment.Sep 15 2019, 5:28 PM

My intent was not to criticize you, personally, @Krinkle. What I am trying to figure out is why in that RFC we never mentioned that there exists a group of sysadmins, yourself included, that already have the rights and permissions to modify filters for technical reasons only. I think the responses to that RFC would have differed if the participants knew that a group like this already exists. Right now, their point of reference is the Global Sysop group (which is significantly restricted by the opt-out process).

Would you be okay if I clarify this on the RFC? And would you have a preference on the wording used to describe the rights and permissions of the sysadmins, as it comes to trivial, necessary, technical changes to abuse filters?

If we already have a user (Krinkle) who is allowed to do this for wikis like fawiki (which has a large, active admin group and is in the opt-out list for Global Sysop), then why the RfC?

All sysadmins can do this, and they've always been able to. That's more or less what happened with the abusefilter-manager group a few years ago. The idea behind the RFC was to make this process entirely managed on-wiki, instead of recurring to backend "hacks" like the sysadmin group. The RFC has been there for more than one year, and based on the last comments, it's not even clear what the outcome could be. Until the RFC is closed, IMHO it's OK to keep doing as before, i.e., ask sysadmins.

As I said above, the tech news issue asking to fix those filters went out roughly a week ago. Tomorrow a new one will be published, hence the chances of people noticing the call to action will go down. As I said, in the timespan of a week, no filter was fixed. And since things could only get worse, I've asked about sysadmins.

The number of filters that exist and their complexity is in the hands of the community. As such, I believe the community should also have the responsibility and capability to maintain these on their own, in a timely manner. If the community chooses not to accept this RFC, or is unable to agree on other ways to ensure timely upgrade of abuse filters, then the result will likely be that unmaintained filters get disabled after some time due to invalid syntax.

Yeah, which is the reason why the RFC was created. As a filter maintainer in my home wiki, I would be upset if a filter stopped working just because of a backend change, hence I don't want it to happen. But as I said, if the communities don't agree on the RFC and/or deem it as useless, sysadmins are the only way to go. Unless the communities also agree to let old filters die whenever we change the backend, but that'd be crazy.

(Uh-oh, edit conflict)

What I am trying to figure out is why in that RFC we never mentioned that there exists a group of sysadmins, yourself included, that already have the rights and permissions to modify filters for technical reasons only.

Sysadmins can technically do everything. But IMHO that's a hacky solution, and not suitable for the long-term. It's also something to be used in exceptional cases. Having a group would delegate everything to the community control, and avoid such exceptional cases altogether.

Would you be okay if I clarify this on the RFC? And would you have a preference on the wording used to describe the rights and permissions of the sysadmins, as it comes to trivial, necessary, technical changes to abuse filters?

(Not directed to me, but:) Personally I'm OK. But I'd also explicate that currently, fixing filters is NOT in the scope of sysadmins. And that granting sysadmin rights goes beyond the way things are done on-wiki (I've learned the right is assigned via shell). *If* people think it could become standard practice to perform technical AF maintenance using the sysadmin group, that's OK. The important is to know what the trade-offs are.

Little update: production filters now seem to be OK. Next steps here are:

  • Formally deprecate empty operands with logging (either via wfDeprecated, or stating it clearly in the message)
  • Before switching to CachingParser, re-check logstash to ensure no new problems (hence leaving open as subtask)
  • Change logEmptyOperand to be e.g. throwEmptyOperand and throw [in a future MW version]
Daimona updated the task description. (Show Details)Sep 15 2019, 6:10 PM
Daimona updated the task description. (Show Details)Sep 15 2019, 6:12 PM

Change 537074 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Hard-deprecate empty operands

https://gerrit.wikimedia.org/r/537074

Daimona updated the task description. (Show Details)Sep 16 2019, 1:56 PM
Daimona moved this task from Blocker to Deprecate/Remove on the MW-1.34-release board.

The deprecation should happen in 1.34.

Note that the following production filters are still broken - probably unnoticed because on small wikis:

Note that the following production filters are still broken - probably unnoticed because on small wikis:

And also plwiki filter 45.

Change 540647 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser for (almost) all wikis

https://gerrit.wikimedia.org/r/540647

Change 540647 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser for (almost) all wikis

https://gerrit.wikimedia.org/r/540647

Note that the following production filters are still broken - probably unnoticed because on small wikis:

And also plwiki filter 45.

Adding svwiki 115 to the list. This filter is now erroring out since svwiki is already using the new parser.

Nirmos added a subscriber: Nirmos.Oct 5 2019, 4:39 PM

Adding svwiki 115 to the list. This filter is now erroring out since svwiki is already using the new parser.

I have informed Svensson1, Larske and JohanahoJ about this at https://sv.wikipedia.org/wiki/Wikipediadiskussion:Redigeringsfilter#Filter_115

Adding svwiki 115 to the list. This filter is now erroring out since svwiki is already using the new parser.

I have informed Svensson1, Larske and JohanahoJ about this at https://sv.wikipedia.org/wiki/Wikipediadiskussion:Redigeringsfilter#Filter_115

Thanks! I've also notified the other wikis: plwiki, viwiki, and enwikivoyage.

Nirmos added a comment.Oct 5 2019, 4:59 PM

@Daimona: Larske has now removed a dangling | at the end of https://sv.wikipedia.org/wiki/Special:Missbruksfilter/115

Can you confirm that the error is fixed?

@Daimona: Larske has now removed a dangling | at the end of https://sv.wikipedia.org/wiki/Special:Missbruksfilter/115
Can you confirm that the error is fixed?

Totally, thanks a lot!

FTR, enwikivoyage and viwiki done. Only plwiki is left!

Change 541028 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Align arg counting between the parsers

https://gerrit.wikimedia.org/r/541028

FTR, enwikivoyage and viwiki done. Only plwiki is left!

... And also ckbwiki filter 7, just discovered.

FTR, enwikivoyage and viwiki done. Only plwiki is left!

... And also ckbwiki filter 7, just discovered.

I'd like to ask sysops to fix it, but unfortunately gtranslate doesn't translate Sorani... Actually, trying to translate this page yields a pretty funny result.
Anyway, I've tried.

ckbwiki was fixed. So, again, only plwiki is left, and I've CC'ed the filter author. @Krinkle wanna deploy the change tomorrow?

Quiddity removed a subscriber: Quiddity.Oct 6 2019, 9:22 PM

plwiki fixed. So we can now enable the new parser on all wikis!

Change 541026 had a related patch set uploaded (by Krinkle; owner: Daimona Eaytoy):
[operations/mediawiki-config@master] Move the remaining wikis to AbuseFilterCachingParser

https://gerrit.wikimedia.org/r/541026

Change 541026 merged by jenkins-bot:
[operations/mediawiki-config@master] Move the remaining wikis to AbuseFilterCachingParser

https://gerrit.wikimedia.org/r/541026