Inconsistent treatment of multiline strings in AbuseFilter matching edits and diagnostic tools
Open, Needs Triage | Public | BUG REPORT

Description

Condition:

"foo\nbar" == "foo
bar"

Evaluates to true in /tools, /test, and /examine pages, but if saved as a filter, it matches no edit.

Conversely, the condition:

"foo\r\nbar" == "foo
bar"

Evaluates to false in /tools, /test, and /examine pages, but if saved as a filter, it matches all edits.

This has been observed in both Linux + Firefox and Windows + Edge (Chromium).

Original report:

Steps to replicate the issue (include links if applicable):

What happens?:
The filter recognizes an edit on the examine page but not when saving that edit.

What should have happened instead?:
The behavior for examine and during save should be the same. (The intention here was that the edit should be recognized by the filter).

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia): plwikiquote

Other information (browser name/version, screenshots, etc.):

Event Timeline

I can see two possibilities. First, the text was pre-save transformed in a way that caused only new_wikitext_pst to match and not new_wikitext. Unfortunately, when examining past edits, new_wikitext contains the value of new_wikitext_pst, and there's not much that can be done to fix that; the original, untransformed text is discarded on save.

Alternatively, this is a caching issue. According to https://pl.wikiquote.org/w/api.php?action=query&list=logevents, you updated the filter at 2024-04-27T19:55:16Z and made that edit at 2024-04-27T19:55:51Z, 35 seconds later. Maybe your edit was tested against the old version of the filter. Can you try again? If that doesn't work, try using new_wikitext_pst.

I've just copied the content of Użytkownik:Swam_pl/brundopis3 to Użytkownik:Msz2001/brudnopis (examine) and the result is the same: the edit is recognized in examine mode but not while saving.

I'll ask Swam pl, who made the filter, to use new_wikitext_pst just as a test, and we will see...

If that doesn't work, try using new_wikitext_pst.

We tried this approach, but actually, there's no variable new_wikitext_pst (documentation).

Sorry, I meant new_pst. But I've already tried that on testwiki, and it doesn't help. This is actually a problem with line endings and not related to T102944. Multiline strings are apparently interpreted differently when evaluating an edit vs. in /tools and /examine.

This evaluates to "true" in /tools, /test, and /examine:

"foo\nbar" == "foo
bar"

But put the same text in a filter, and it matches no edits.

This evaluates to "false" in /tools, /test, and /examine:

"foo\r\nbar" == "foo
bar"

But it matches all edits once saved in a filter.
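Both observations are consistent with the filter source reaching the parser with LF line breaks in the diagnostic tools and with CRLF line breaks once saved. The Python sketch below only models that hypothesis; it is not AbuseFilter's code. `decode_literal` is a hypothetical stand-in for a tokenizer that decodes backslash escapes and keeps every other character, including raw line breaks, exactly as received.

```python
def decode_literal(body: str) -> str:
    """Decode \\n, \\r, \\t and \\\\ escapes; keep all other characters verbatim.

    Hypothetical model of a string-literal tokenizer, for illustration only.
    """
    escapes = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in escapes:
            out.append(escapes[body[i + 1]])
            i += 2
        else:
            out.append(ch)  # raw CR/LF characters pass through untouched
            i += 1
    return "".join(out)


# Literal bodies as they would appear in the filter source text.
escaped_lf = r"foo\nbar"        # backslash + "n" typed by the user
escaped_crlf = r"foo\r\nbar"    # backslash + "r", backslash + "n"
multiline_lf = "foo\nbar"       # raw line break stored as LF
multiline_crlf = "foo\r\nbar"   # raw line break stored as CRLF


def results(multiline: str) -> tuple:
    """Evaluate both conditions from the report against one multiline literal."""
    return (
        decode_literal(escaped_lf) == decode_literal(multiline),
        decode_literal(escaped_crlf) == decode_literal(multiline),
    )


print(results(multiline_lf))    # (True, False)
print(results(multiline_crlf))  # (False, True)
```

Under this model, the first tuple matches what /tools, /test, and /examine report, and the second matches the behavior of the saved filter.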

This might depend on what browser or operating system you are using when visiting /tools. I am using Firefox on Linux.

I can replicate those on Windows 11 + Edge 124, so the result seems to be independent of the OS newline conventions. Tested /tools on plwiki and both /tools and a real filter on a local installation of MW 1.40.

Msz2001 renamed this task from "AbuseFilter doesn't hit while saving edit, recognizes when examined manually" to "Inconsistent treatment of multiline strings in AbuseFilter matching edits and diagnostic tools". Sun, Apr 28, 8:00 PM
Msz2001 updated the task description.

First, we need to decide which is preferred, i.e., whether the multiline string should be interpreted as with or without the carriage return, or depend on the platform. (I think it's without.)
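If the answer is "without", one possible approach is to normalize raw line breaks in the filter source to LF before it is stored or parsed. The following is a minimal sketch of that idea in Python. `normalize_newlines` is a hypothetical helper, and where such a step would belong in AbuseFilter is not settled here.

```python
def normalize_newlines(source: str) -> str:
    """Convert raw CRLF and lone CR line breaks to LF.

    Only real control characters are touched; an escape sequence such as
    backslash-r backslash-n written inside a literal is left alone.
    """
    return source.replace("\r\n", "\n").replace("\r", "\n")


# Filter text as it might arrive with CRLF line breaks (assumed input).
submitted = '"foo\\nbar" == "foo\r\nbar"'
normalized = normalize_newlines(submitted)

print(repr(normalized))  # '"foo\\nbar" == "foo\nbar"'
```

With this normalization, a multiline literal would always contain a bare LF, so `"foo\nbar"` would match it on every code path, while an explicit `"foo\r\nbar"` would keep its carriage return.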

I took a quick look at the code, and the tokenizer seems to have no special handling for multiline strings; it just reads whatever it finds between a pair of quotes. However, there seems to be no test case for multiline strings: https://gerrit.wikimedia.org/g/mediawiki/extensions/AbuseFilter/+/master/tests/parserTests/string.t.

There are potentially many code paths to look at. AbuseFilterViewEdit and the other views reading the POST'd data, FilterStore/FilterLookup which are proxies to the database, parsing stuff, and all the caching layers involved.