Page MenuHomePhabricator

Wikidata edits are not tagged by AbuseFilter on item creations
Open, Needs TriagePublic

Event Timeline

matej_suchanek raised the priority of this task from to Needs Triage.
matej_suchanek updated the task description. (Show Details)
Restricted Application added a project: Wikidata. · View Herald TranscriptJul 17 2015, 12:46 PM
Restricted Application added subscribers: Luke081515, Aklapper. · View Herald Transcript

Any update on this?

Restricted Application added a subscriber: StudiesWorld. · View Herald TranscriptJan 18 2016, 9:08 AM
hoo added a comment.Jan 18 2016, 9:46 AM

I will have a look at this today or in the next couple of days, but I can't promise anything. AbuseFilter is extremely hard to modify messy legacy code, thus if this is not fixable in Wikibase, it could be to hard to reasonably fix this.

hoo added a comment.EditedJan 19 2016, 10:44 PM

So I looked into this and found the root cause.

When Wikibase calls EditFilterMergedContent it uses NewItem (or Item:NewItem, depending on the namespace items are in) as title, because we don't have the Item Id at that point (because we don't want to assign Ids we never use, as edits might still fail).

When AbuseFilter chooses to tag an edit it does so by setting AbuseFilter::$tagsToSet[$actionID] = array( /* TAGS */ ); where $actionID is TITLE-USER-edit. In AbuseFilterHooks::onRecentChangeSave that information is then used to save the change tags earlier assigned to that edit. That works by reproducing the above mentioned $actionID from the information present in the recent change entry.

Obviously that doesn't work for the Wikibase changes as the change tags are initially stored with the placeholder title, but the rc entries use the actual title.

hoo set Security to None.Jan 19 2016, 10:47 PM
hoo added subscribers: daniel, Lydia_Pintscher.
hoo added a comment.EditedJan 20 2016, 10:46 PM

How can I help?

Well, this needs to be triaged and the only (acceptable, not totally messy) way to fix this is by assigning actual ids earlier in the Entity creation process (which would lead to us "wasting" more entity ids, making the numbers less of a counter). I would like to hear your opinion on that.

Please note that this is a more general question and doesn't just apply to AbuseFilter (which we can't fully support anyway, given the state it is in). Other extensions with similar mechanisms will face the same problems.

Daimona moved this task from Backlog to Internal bugs on the AbuseFilter board.Apr 24 2018, 4:46 PM

So I looked into this and found the root cause.
When Wikibase calls EditFilterMergedContent it uses NewItem (or Item:NewItem, depending on the namespace items are in) as title, because we don't have the Item Id at that point (because we don't want to assign Ids we never use, as edits might still fail).
When AbuseFilter chooses to tag an edit it does so by setting AbuseFilter::$tagsToSet[$actionID] = array( /* TAGS */ ); where $actionID is TITLE-USER-edit. In AbuseFilterHooks::onRecentChangeSave that information is then used to save the change tags earlier assigned to that edit. That works by reproducing the above mentioned $actionID from the information present in the recent change entry.
Obviously that doesn't work for the Wikibase changes as the change tags are initially stored with the placeholder title, but the rc entries use the actual title.

So it looks like the title is probably only important for when used with stashedit?

Anyway:

		$actionID = implode( '-', [
			$title->getPrefixedText(), $recentChange->getAttribute( 'rc_user_text' ), $action
		] );

It would be cool if this could ust use a hash of the abusefilter vars? or the text of the change or something like that?

The system used for applying tags is not ideal (relies on global state etc.), but I'm unsure what an alternative could be, since tags cannot be applied when filters are executed (per design in core). To keep the process simple, we use an identifier which can be constructed with data available both when running filters (and we have plenty of data at that point), AND when saving the recent change. However, for the latter we only have a RecentChange object passed in by the hook, and not much info we can extract from there aside from user, action and title.