Page MenuHomePhabricator

AbuseFilter *_links variables incorrect on item edits.
Open, LowPublicBUG REPORT

Description

For each change that is submitted, AbuseFilter provides variables that can be used to write filters, typically allowing a comparison of the current revision with the one to be submitted. For example the size of the page (before and after the edit) can be found in the variables old_size and new_size.

For edits to Wikibase entities the AbuseFilter variables added_links (removed_links) don't seem to be properly populated, they are always empty. They should contain a list of the external links that were added (removed) in the edit to be submitted.

Also the variable all_links seems to always equal old_links, while it should in fact contain all links in the revision to be submitted.

This is Wikibase specific, as it works with Wikitext pages.

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 3:21 AM
bzimport set Reference to bz63632.
bzimport added a subscriber: Unknown Object (MLST).

This can be fixed by either making Wikibase hook into AbuseFilter's "AbuseFilter-interceptVariable" hook (that might get messy). Or by doing some more work on ContentHandler support in AbuseFilter (which is also going to be a pain, given the way AbuseFilter "works").

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
Lydia_Pintscher lowered the priority of this task from High to Low.Apr 23 2017, 2:53 PM

I'm sorry, I'm pretty ignorant about wikidata. How exactly am I supposed to reproduce the problem? I tried here, but I'm unsure if that's right. If it is, then indeed we don't get _links variables.

Update: well, TBH I see some checks for ContentModel === Wikitext in the code used to retrieve links...

Examples can still be seen in this abuse log. I think T242249 has brought this closer to fixing. T264104 is related.

Change 819484 had a related patch set uploaded (by Matěj Suchánek; author: Matěj Suchánek):

[mediawiki/extensions/AbuseFilter@master] Retrieve external links from PreparedUpdate

https://gerrit.wikimedia.org/r/819484

matej_suchanek changed the task status from Open to In Progress.Jan 14 2023, 9:32 AM
matej_suchanek claimed this task.
matej_suchanek changed the subtype of this task from "Task" to "Bug Report".

Change 819484 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Retrieve external links from PreparedUpdate

https://gerrit.wikimedia.org/r/819484

matej_suchanek changed the task status from In Progress to Open.Jun 13 2023, 2:14 PM
matej_suchanek removed matej_suchanek as the assignee of this task.

Fixed for live edits but not for the "examine past edits" feature.
This should not impact Wikidata since the only four filters using these variables check non-item pages.

Change 929657 had a related patch set uploaded (by Ladsgroup; author: Matěj Suchánek):

[mediawiki/extensions/AbuseFilter@wmf/1.41.0-wmf.13] Retrieve external links from PreparedUpdate

https://gerrit.wikimedia.org/r/929657

Change 929657 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@wmf/1.41.0-wmf.13] Retrieve external links from PreparedUpdate

https://gerrit.wikimedia.org/r/929657

Mentioned in SAL (#wikimedia-operations) [2023-06-13T18:32:36Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:929657|Retrieve external links from PreparedUpdate (T65632 T264104)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-13T18:34:10Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:929657|Retrieve external links from PreparedUpdate (T65632 T264104)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-13T18:44:55Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:929657|Retrieve external links from PreparedUpdate (T65632 T264104)]] (duration: 12m 18s)

Fixed for live edits but not for the "examine past edits" feature.

I'm not sure it's fixed. Maybe I'm missing something, https://test.wikidata.org/wiki/Special:AbuseFilter/19 is not applied on https://test.wikidata.org/w/index.php?title=Q45948&action=history

In my test environment, it logs:

my_wiki: Loading all_links from DB
my_wiki: Loading old_links from DB

This is wrong, and it also means the PreparedUpdate is not available because Wikimedia\Assert\PreconditionException is thrown where we catch it and fall back to the legacy code. But it does work for regular wikitext pages.

I suspect MediawikiEditFilterHookRunner::getContextForEditFilter does not carry over the "original" WikiPage instance (used as the "carrier" of the PreparedUpdate).

I suspect MediawikiEditFilterHookRunner::getContextForEditFilter does not carry over the "original" WikiPage instance (used as the "carrier" of the PreparedUpdate).

That could explain why we have so much duplicate parses in production. Wanna take a look? I can take a look a bit later today.

I suspect MediawikiEditFilterHookRunner::getContextForEditFilter does not carry over the "original" WikiPage instance (used as the "carrier" of the PreparedUpdate).

This seems correct: instantiating a DerivativeContext does not set the wikipage field, which is lazy-loaded in getWikipage(), and created from scratch if a Title is available. I'd be a bit scared changing this in any way though (e.g., persisting the WikiPage object), because of the potential side effects it could have. OTOH, I don't think there's a task for adding PreparedUpdate to the hook parameters. Should we just do that instead? I think that would mean creating a new hook; it might be a good occasion to address some of the issues reported in T304238.

Change 930939 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/Wikibase@master] EditEntity: Pass around WikiPage in MediawikiEditFilterHookRunner

https://gerrit.wikimedia.org/r/930939

That didn't work in mwdebug and from what I'm seeing we need to set WikiPage in the context somewhere being passed around from ModifyEntity -> EntitySavingHelper -> MediaWikiEditEntityFactory -> MediaWikiEditEntity -> MediaWikiEditFilterHookRunner. But can't figure out where.

Change 930939 abandoned by Ladsgroup:

[mediawiki/extensions/Wikibase@master] EditEntity: Pass around WikiPage in MediawikiEditFilterHookRunner

Reason:

https://gerrit.wikimedia.org/r/930939

Change 932225 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] EXPERIMENT: Introduce EditFilterOutcomeHook

https://gerrit.wikimedia.org/r/932225

The problem here is that Wikibase calls the EditFilterMergedContent hook before preparing the actual update, thus the WikiPage wont have a PageUpdater available (besides the WikiPage not being passed on to the new context).

The following, extremely ugly, PoC patch works for me:

Note: Given this moves the edit filters to be run much later, right before the edit happens, this might cause a regression similar to T232620: Ids are assigned but the entity wont be created.

The above patch wont help solving in case of inspecting past edits, though… this will only help for actual submissions (and probably not all of them, we would need to audit all places using MediaWikiEditFilterHookRunner).

In T65632#9008027, @hoo wrote:

Note: Given this moves the edit filters to be run much later, right before the edit happens, this might cause a regression similar to T232620: Ids are assigned but the entity wont be created.

See also T264451: Entity ID should not be assigned if disallowed by AbuseFilter