Page MenuHomePhabricator

Wikidata edits are not tagged by AbuseFilter on item creations
Open, Needs TriagePublic

Event Timeline

matej_suchanek raised the priority of this task from to Needs Triage.
matej_suchanek updated the task description. (Show Details)
Restricted Application added subscribers: Luke081515, Aklapper. · View Herald Transcript

I will have a look at this today or in the next couple of days, but I can't promise anything. AbuseFilter is extremely hard to modify messy legacy code, thus if this is not fixable in Wikibase, it could be to hard to reasonably fix this.

So I looked into this and found the root cause.

When Wikibase calls EditFilterMergedContent it uses NewItem (or Item:NewItem, depending on the namespace items are in) as title, because we don't have the Item Id at that point (because we don't want to assign Ids we never use, as edits might still fail).

When AbuseFilter chooses to tag an edit it does so by setting AbuseFilter::$tagsToSet[$actionID] = array( /* TAGS */ ); where $actionID is TITLE-USER-edit. In AbuseFilterHooks::onRecentChangeSave that information is then used to save the change tags earlier assigned to that edit. That works by reproducing the above mentioned $actionID from the information present in the recent change entry.

Obviously that doesn't work for the Wikibase changes as the change tags are initially stored with the placeholder title, but the rc entries use the actual title.

How can I help?

Well, this needs to be triaged and the only (acceptable, not totally messy) way to fix this is by assigning actual ids earlier in the Entity creation process (which would lead to us "wasting" more entity ids, making the numbers less of a counter). I would like to hear your opinion on that.

Please note that this is a more general question and doesn't just apply to AbuseFilter (which we can't fully support anyway, given the state it is in). Other extensions with similar mechanisms will face the same problems.

So I looked into this and found the root cause.

When Wikibase calls EditFilterMergedContent it uses NewItem (or Item:NewItem, depending on the namespace items are in) as title, because we don't have the Item Id at that point (because we don't want to assign Ids we never use, as edits might still fail).

When AbuseFilter chooses to tag an edit it does so by setting AbuseFilter::$tagsToSet[$actionID] = array( /* TAGS */ ); where $actionID is TITLE-USER-edit. In AbuseFilterHooks::onRecentChangeSave that information is then used to save the change tags earlier assigned to that edit. That works by reproducing the above mentioned $actionID from the information present in the recent change entry.

Obviously that doesn't work for the Wikibase changes as the change tags are initially stored with the placeholder title, but the rc entries use the actual title.

So it looks like the title is probably only important for when used with stashedit?

Anyway:

		$actionID = implode( '-', [
			$title->getPrefixedText(), $recentChange->getAttribute( 'rc_user_text' ), $action
		] );

It would be cool if this could ust use a hash of the abusefilter vars? or the text of the change or something like that?

The system used for applying tags is not ideal (relies on global state etc.), but I'm unsure what an alternative could be, since tags cannot be applied when filters are executed (per design in core). To keep the process simple, we use an identifier which can be constructed with data available both when running filters (and we have plenty of data at that point), AND when saving the recent change. However, for the latter we only have a RecentChange object passed in by the hook, and not much info we can extract from there aside from user, action and title.