Page MenuHomePhabricator

Some filter hits have old_wikitext === new_wikitext
Closed, ResolvedPublic

Description

Create this filter, on an active wiki:

action == "edit" &
page_id !== 0 &
length(added_lines + removed_lines) == 0

The actual result was 100 hits in only 5 hours.

The expected result is no matches; null edits shouldn't even be tested against any filter.

Saving a null edit in the 2017 wikitext editor causes this, reliably. But most of those hits can't be explained by that; logged out editors don't even have that option available. Null edits in the old wikitext editor, VisualEditor, or the mobile editor don't seem to cause this.

Some other hits might be explained by two editors saving the same content at almost exactly the same time; for example we have multiple bots that fix double-redirects.

But whatever the cause, getEditVars() should have bailed out before anything was tested against the filter.

Event Timeline

null edits shouldn't even be tested against any filter.

Usually, they aren't. The only exception being when we'd need a pre-save-transform to actually see that there's no difference with the old text. (See https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/443854/)

Saving a null edit in the 2017 wikitext editor causes this, reliably

This is not true for me locally. I've had some troubles in setting up VE, but managed to get the 2017 editor working, and your filter doesn't match my null edits.

Also, I have no clue why this is happening.

Unable to reproduce, tested that at test.wikipedia, after importing Jayakumar Jitendrasinh Rawal from enwiki with all templates, modules etc it uses.

And unable to reproduce on itwiki either, see https://it.wikipedia.org/wiki/Speciale:FiltroAntiAbusi/467. No hits in 20 minutes. Seems something enwiki-specific.

Daimona triaged this task as Unbreak Now! priority.Dec 8 2019, 7:04 PM

I just realized that almost all of those edits were saved, and _lines variables are just unavailable. This may be a serious bug, hence setting UBN until we can know exactly what is causing it, how it can be fixed, and what its consequences are. However, I won't be able to do that before tomorrow morning my time.

Of note, the itwiki filter has matched once.

I just realized that almost all of those edits were saved, and _lines variables are just unavailable.

I'm not sure. I think it's just linking to the wrong diff. My edit on testwiki was not saved; I got the expected behavior (apart from the filter log) for a null edit.

I just realized that almost all of those edits were saved, and _lines variables are just unavailable.

I'm not sure. I think it's just linking to the wrong diff.

Hm, you're right. So it would be similar to T198651. I still cannot reproduce on testwiki, on your test page. Could you please write down detailed reproduction steps?

User agent: Mozilla/5.0 (X11; Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0

(1) Create a new account
(2) On testwiki, in your preferences, select "New wikitext mode". Do not change anything else.
(3) Go to your sandbox, and click "Edit source"
(4) Add the following

foo
bar

(5) Publish the page
(6) Open the editor again
(7) Change the first line from "foo" to "fooo"
(8) Change the first line from "fooo" back to "foo" (steps 7 and 8 are needed to make the Publish button clickable)
(9) Publish the page

This does not work if the page has only one line instead of two. This, combined with the fact that I'm using Linux, suggests it's a line-ending issue (CRLF vs LF).

From my account, it doesn't work.

I've then created a clean account, followed your steps exactly (at least five times), but still couldn't trigger the filter.

At this point, I wonder whether we should just agree on a date & time and test this on testwiki, along the lines of:

Having at least an idea of what's going wrong could help me to reproduce the bug locally.

Could varnish be doing something with the line endings? I'm in the US, so we are probably connecting to different datacenters.

Could varnish be doing something with the line endings? I'm in the US, so we are probably connecting to different datacenters.

I'm out of ideas for now... I wouldn't even be sure about line endings. Given that this is a "simple" null edit (i.e. no PST is needed), we shouldn't even be filtering it (code). We need to understand why the content are deemed different. It could also have to do with MCR or the like, although in that case I'd expect to be able to reproduce the bug easily.

Our best bet is (probably) live testing as I said above. Unless you have a local wiki and you're able to reproduce the issue there, in which case I could tell you what to do there.

Haven't tried to set up the 2017 editor locally. That means installing VE first, yes?

On testwiki, I can trigger it with the API, using CRLFs:

(new mw.Api()).post({
	"action": "edit",
	"format": "json",
	"title": "User:Suffusion of Yellow/sandbox",
	"token": /* removed */,
	"text": "foo\r\nbar"
});

or

(new mw.Api()).post({
	"action": "visualeditoredit",
	"format": "json",
  "veaction": "save",
	"page": "User:Suffusion of Yellow/sandbox",
	"token": /* removed */,
	"wikitext": "foo\r\nbar"
});

Replacing foo\r\nbar with foo\nbar doesn't trigger this.

Doesn't prove that's what's going on with the 2017 editor of course. FWIW the HAR log from the Firefox network monitor, shows \r\ns in every multipart/form-data; request, including the visualeditoredit requests from the 2017 editor.

I can reproduce with Firefox on Linux from the US (User agent: "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0", Firefox on Android ("Mozilla/5.0 (Android 10; Mobile; rv:68.0) Gecko/68.0 Firefox/68.0"), but not Chrome on Linux ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"2), or Chrome on Android ("Mozilla/5.0 (Linux; Android 10; Pixel 3a) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Mobile Safari/537.36"). Seems like a Firefox bug.

Haven't tried to set up the 2017 editor locally. That means installing VE first, yes?

Yes, but you don't need parsoid for that. I didn't want to install the parsoid server, but managed to get the 2017 editor working locally by following these instructions.

On testwiki, I can trigger it with the API, using CRLFs:

YAY! I managed to reproduce it by using your first snippet! Later, I'll replicate the setup on my local wiki and see if I can replicate it there as well.

Now it's also pretty clear that it could be linefeed-related. I thought we had fixed this back in rEABFef489d7ab5872d36cfc0f2a04214b5057f4ce167, but maybe the fix wasn't complete. Again, I'll let you know the results of my local investigation.

I can reproduce with Firefox [...] but not Chrome [...] Seems like a Firefox bug.

Ah, this would explain why I couldn't reproduce the issue -- I use chrome.

Ftr, I managed to reproduce&testwiki as well.

Aye, I've been able to reproduce it locally via API. I still have to determine why this issue is happening, but aside from the line endings, it's now pretty clear that it's now 2017WE's fault. I mean, it's probably messing with newlines, but not the root cause.

I finally managed to understand what's going on. First of all, yes, it has to do with line endings. The 2017 editor is probably just a way to reliably inject carriage returns, but whatever adds them will work.

Now, take a look at this code: what it does is:

  • If we have a Content object for the previous revision, compare the Content objects with their native methods
  • If we don't have a Content object, compare the strings.

As I've already said, we do normalize line endings when retrieving the content of the edit, but we only do that on the stringified content. Hence, if:

  • We have a Content object for the prev revision
  • The new revision has an extra carriage return

We compare the Contents (which are different) and don't even worry about the stringified texts. Hence, we think that this is not a null edit.

The best thing to do is probably collapse the if branches. Aside from that, it'd be cool if we could stop misattributing AbuseLog entries...

Change 556208 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Strengthen the check for null edits

https://gerrit.wikimedia.org/r/556208

Aside from that, it'd be cool if we could stop misattributing AbuseLog entries...

Done as well, in the same patch.

it's now pretty clear that it's no[t] 2017WE's fault. I mean, it's probably messing with newlines, but not the root cause.

2017WTE injects a terminal CR into the content to replicate what EditPage.php does in the textarea ([[https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/master/includes/EditPage.php#4554|addNewLineAtEnd]])… but there's some magic code for stripping that terminal CR in EditPage.php (I think) which plausibly doesn't get run on API edits, because MediaWiki sucks. Nice find.

Change 556208 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Strengthen the check for null edits

https://gerrit.wikimedia.org/r/556208

Daimona removed a project: Patch-For-Review.

it's now pretty clear that it's no[t] 2017WE's fault. I mean, it's probably messing with newlines, but not the root cause.

2017WTE injects a terminal CR into the content to replicate what EditPage.php does in the textarea ([[https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/master/includes/EditPage.php#4554|addNewLineAtEnd]])…

Hah, and this explains the 2017WTE part.

but there's some magic code for stripping that terminal CR in EditPage.php (I think) which plausibly doesn't get run on API edits, because MediaWiki sucks. Nice find.

Indeed, this seems to be the case.

Calling this resolved, thanks everyone!

Certainly seems resolved on enwiki. I re-enabled the filter in the task, and there were no hits in about a day. Thanks!

Certainly seems resolved on enwiki. I re-enabled the filter in the task, and there were no hits in about a day. Thanks!

Great! I should note here as well, that my change took away the capability to filter content model changes (T240485). However, I don't think that's a widely-used feature.