Page MenuHomePhabricator

AbuseFilter MCR diff is comparing old value of one slot with the new value from another, not the old whole page with the new whole page
Closed, ResolvedPublic

Description

Discovered during SDC deployment.
E.g.: https://commons.wikimedia.org/wiki/Special:AbuseLog/4475237

Event Timeline

Daimona triaged this task as Unbreak Now! priority.EditedJan 10 2019, 4:46 PM
Daimona subscribed.

Diffs are computed basing on old_wikitext and new_wikitext, which are the only base variables used to compute edit_delta, added_lines, edit_diff, new_size etc. This means that filters are very likely to explode.

Note that these variables, are defined here on edits. Will investigate what's wrong there. In case you want to join (and possibly help with MCR), you can reach out to me on IRC (Daimona)

I see that the wrong one is old_wikitext, while new_wikitext is correct (although it contains 'M60452106' and I'm unsure if it should).

Yeah, old_wikitext and new_wikitext and not MCR-ready, despite what https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/master/includes/AbuseFilter.php#3162 says. old_wikitext is set via:

$oldcontent = $revision->getContent( Revision::RAW );
$oldtext = AbuseFilter::contentToString( $oldcontent );

Revision->getContent only fetches the MAIN slot, it's not MCR-ready.

new_wikitext is correct

I don't think it is; AF is written with assumptions that are based on old_revision<->new_revision comparisons, but what is shown there is just the change on the slot(s) changed, not the resulting page following the edit.

(although it contains 'M60452106' and I'm unsure if it should).

It's sub-optimal, but it only occurs on the first time a File page has a caption/statement added.

@Jdforrester-WMF So getContent should be replaced. Is there something which allows getting the content for a certain slot from a revision, and a way to extract the current slot from the hook params?

new_wikitext is correct

I don't think it is; AF is written with assumptions that are based on old_revision<->new_revision comparisons, but what is shown there is just the change on the slot(s) changed, not the resulting page following the edit.

Ah... Still better than a totally different text though.

It's sub-optimal, but it only occurs on the first time a File page has a caption/statement added.

Definitely not a big deal.

So the issue is inboth:

  • AbuseFilterHooks::onEditFilterMergedContent - which gets $text only for the content of the slot that is actually being changed
  • AbuseFilterHooks::filterEdit - which then only gets the main slot content for the previous revision

So the content on the old side will always only be the main slot and the content on the new side will always only be the changed thing.

Getting the whole old revision in the form expected is easy, AbuseFilter::revisionToString
We need to get from the Content / string text to the new revision though. So we need to figure out what slot has been edited so that we can make a "pretend" whole revision to use for extracting the whole text, or just have that whole revision passed all the way down, instead of just the content / text.

Looks like in rEABF688eccea477a: Expose text from all slots to AbuseFilter @daniel fixed loading of old_wikitext and new_wikitext for the code path that goes via AFComputedVariable, e.g. for AbuseFilter::getEditVarsFromRCRow(). But that change overlooked where they get set in AbuseFilterHooks::newVariableHolderForEdit(), which is probably being called here via the 'EditFilterMergedContent' hook,[1] which doesn't seem to even get passed the slot.

[1]: Probably from Wikibase\Repo\EditEntity\MediawikiEditFilterHookRunner

So currently there's no way to retrieve the slot being edited via parameters passed to the EditFilterMergedContent hook, is that right?

Getting the whole old revision in the form expected is easy, AbuseFilter::revisionToString
We need to get from the Content / string text to the new revision though. So we need to figure out what slot has been edited so that we can make a "pretend" whole revision to use for extracting the whole text, or just have that whole revision passed all the way down, instead of just the content / text.

Just always use the whole revision, the real one. It's much harder to get stuff to work on individual slots.

Just always use the whole revision, the real one. It's much harder to get stuff to work on individual slots.

There isn't a saved revision yet.
There isn't even a complete revision object yet.
This code is called in MediawikiEditEntity in Wikibase. All it knows about is the new entity to be saved, and shortly after it saves it into the entity store.

Change 483471 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] XXX: Throw a slot role into EditFilterMergedContent

https://gerrit.wikimedia.org/r/483471

Change 483472 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/AbuseFilter@master] XXX: AF onEditFilterMergedContent for MCR & Wikibase

https://gerrit.wikimedia.org/r/483472

Those 2 patches show roughly the evilness I was thinking about in T213453#4870354. They are here to get some more eyes and thoughts (untested right now).
We will have to fix this properly in the near future, but this hack may make sodc stay enabled for now?
Of course there are other options than trying to rush a hack like this in.

My "purist" view is that we should replace the EditFilterMergedContent hook for AbuseFilter with something from PageUpdater that emits a full picture of the world (with all the slots of the new to-be Revision filled in, etc.) populated by EditFilterMergedContent calls from ApiEditPage, Wikibase, or wherever.

Those 2 patches show roughly the evilness I was thinking about in T213453#4870354. They are here to get some more eyes and thoughts (untested right now).

They seem like hacks that'll fix things until stuff is rewritten to have more-MCR-oriented hooks. Probably the core calls to the hook should be updated to pass SlotRecord::MAIN, instead of having a Wikibase-only parameter.

My "purist" view is that we should replace the EditFilterMergedContent hook for AbuseFilter with something from PageUpdater that emits a full picture of the world (with all the slots of the new to-be Revision filled in, etc.) populated by EditFilterMergedContent calls from ApiEditPage, Wikibase, or wherever.

I don't see any existing hook like that, the only other hook seems to be 'PageContentSave' which has the same issues. So something would have to be added. I don't know if there's a reason things use EditFilterMergedContent instead of already using PageContentSave.

Also, BTW, I still think PageUpdater should be replaced with other pieces as described in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/460436/6//COMMIT_MSG.

Those 2 patches show roughly the evilness I was thinking about in T213453#4870354. They are here to get some more eyes and thoughts (untested right now).

They seem like hacks that'll fix things until stuff is rewritten to have more-MCR-oriented hooks. Probably the core calls to the hook should be updated to pass SlotRecord::MAIN, instead of having a Wikibase-only parameter.

I'll clean up these once I have eaten and we can look at backporting them.

Change 483472 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Use slot in onEditFilterMergedContent

https://gerrit.wikimedia.org/r/483472

Change 483528 had a related patch set uploaded (by Jforrester; owner: Addshore):
[mediawiki/extensions/AbuseFilter@wmf/1.33.0-wmf.12] Use slot in onEditFilterMergedContent

https://gerrit.wikimedia.org/r/483528

Change 483603 had a related patch set uploaded (by Jforrester; owner: Addshore):
[mediawiki/extensions/Wikibase@wmf/1.33.0-wmf.12] Pass slotrole into EditFilterMergedContent hook

https://gerrit.wikimedia.org/r/483603

Change 483603 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.33.0-wmf.12] Pass slotrole into EditFilterMergedContent hook

https://gerrit.wikimedia.org/r/483603

Change 483528 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@wmf/1.33.0-wmf.12] Use slot in onEditFilterMergedContent

https://gerrit.wikimedia.org/r/483528

Mentioned in SAL (#wikimedia-operations) [2019-01-10T22:57:42Z] <jforrester@deploy1001> Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/includes/EditEntity/MediawikiEditFilterHookRunner.php: T213453: Pass slotrole into EditFilterMergedContent hook in Wikibase repo (duration: 00m 47s)

Change 483629 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/AbuseFilter@master] Pass MCR AF text into newVariableHolderForEdit

https://gerrit.wikimedia.org/r/483629

Change 483635 had a related patch set uploaded (by Jforrester; owner: Addshore):
[mediawiki/extensions/AbuseFilter@wmf/1.33.0-wmf.12] Pass MCR AF text into newVariableHolderForEdit

https://gerrit.wikimedia.org/r/483635

Change 483471 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Pass slotrole into EditFilterMergedContent hook

https://gerrit.wikimedia.org/r/483471

Change 483635 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@wmf/1.33.0-wmf.12] Pass MCR AF text into newVariableHolderForEdit

https://gerrit.wikimedia.org/r/483635

Change 483629 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Pass MCR AF text into newVariableHolderForEdit

https://gerrit.wikimedia.org/r/483629

Mentioned in SAL (#wikimedia-operations) [2019-01-11T00:18:34Z] <jforrester@deploy1001> Synchronized php-1.33.0-wmf.12/extensions/AbuseFilter/includes/AbuseFilterHooks.php: T213453: Use slot in onEditFilterMergedContent and newVariableHolderForEdit in AbuseFilter (duration: 00m 47s)

Jdforrester-WMF assigned this task to Addshore.
Jdforrester-WMF removed a project: Patch-For-Review.

With massive thanks to Addshore, this is now done in production.

Huh, I lost my train... @Addshore is the change stable, and are we fine with it for now?

Huh, I lost my train... @Addshore is the change stable, and are we fine with it for now?

Yes. We should create a Tech Debt ticket to re-do this, blocked by the MCR re-write of EditPage/PageUpdater/whatever. Addshore, do you know what that task would look like?

@Jdforrester-WMF I'd create the task myself, although I'm not into MCR so I don't really know what's left to do here. Should the EditFilterMergedContent always pass a slot, or what else? Ping @Addshore.