Exclude bots (when filtering by Categories without Participants)
Open, Needs Triage (Public)

Description

Insofar as possible without heroic efforts, we should exclude bots and contributions by bots from all metrics.

Background

Previously in Grant Metrics, Participant filtering was required. But with the changes brought about by enabling Category filtering without Participants (T208546 and T205734), the system is likely to pick up bot contributions. We understand that excluding all bots would be impossible. But we will do what we can to eliminate obvious bots, and then see how big an issue is presented by any remaining automated contributors.

Event Timeline

Clarification: Excluding bots, in this ticket, means excluding edits that have the bot flag on them.

The concept of "bot edits" is only for recentchanges, right? https://www.mediawiki.org/wiki/Manual:Recentchanges_table#rc_bot

Isn't there a flag on the user?

Yes (or a former flag). I thought you were talking about edits marked as bot edits, which is a more exacting way to do it, but of course we'd have to use recent changes which won't work in all cases. I think joining on user_groups / user_former_groups will probably be slow... but certainly worth a try :)

Note that some communities, such as Spanish Wikipedia, will sometimes hand out the bot flag to human accounts for high-rate editing, yet not all of their edits are necessarily "bot" edits. Edge case for sure, just pointing out that there probably isn't a foolproof solution. Also, there may be drive-by edits by patrollers, AWB users, etc., that are not really part of the event, per se. I doubt it will skew the metrics too much. We'll find out!
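For illustration only, here is a rough sketch of the user-group approach discussed above: keep only revisions whose author neither holds nor formerly held the bot group. It runs against a toy in-memory SQLite database, not a real replica. The `user_groups` and `user_former_groups` tables and their `ug_*` / `ufg_*` columns follow the MediaWiki schema, but the cut-down `revision` table with a plain `rev_user` column, the sample rows, and the helper names `build_demo_db` and `non_bot_revisions` are all assumptions made for the example. It is not the query Grant Metrics uses.

```python
import sqlite3


def build_demo_db():
    """Create a toy database with simplified, assumed table layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER);
        CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT);
        CREATE TABLE user_former_groups (ufg_user INTEGER, ufg_group TEXT);

        -- User 10 is a human sysop, 20 is a current bot, 30 is a former bot.
        INSERT INTO revision VALUES (1, 10), (2, 20), (3, 30), (4, 10);
        INSERT INTO user_groups VALUES (20, 'bot'), (10, 'sysop');
        INSERT INTO user_former_groups VALUES (30, 'bot');
        """
    )
    return conn


def non_bot_revisions(conn):
    """Return (rev_id, rev_user) rows whose author has no current or former bot group."""
    query = """
        SELECT r.rev_id, r.rev_user
        FROM revision AS r
        WHERE NOT EXISTS (
            SELECT 1 FROM user_groups AS ug
            WHERE ug.ug_user = r.rev_user AND ug.ug_group = 'bot'
        )
        AND NOT EXISTS (
            SELECT 1 FROM user_former_groups AS ufg
            WHERE ufg.ufg_user = r.rev_user AND ufg.ufg_group = 'bot'
        )
        ORDER BY r.rev_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(non_bot_revisions(build_demo_db()))
```

The `NOT EXISTS` anti-joins treat former bots as bots, so they would also drop human accounts that were ever given the flag, which is the Spanish Wikipedia edge case mentioned above. Whether the join is fast enough on production-sized tables is exactly the open question and is not something this toy example can answer.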

So, @Mooeypoo, what is the upshot here? Do we still have a relatively easy way to filter out bots? Or does the removal of that flag make this unfeasible, given our time constraints?