Create an 'ARTICLE_LASTXH_CONTRIBUTORS' function
Open, Needs Triage | Public | 5 Estimated Story Points

Description

Hi!
The arbcom.de has recurring cases in which two user groups are in conflict. The conflict has usually become somewhat personal over the years. Both groups gang up on each other, leaving provocative remarks within minutes of one another. The conflict has usually also occupied our admins for years, has driven some productive authors away from the German-language Wikipedia, and prevents third-party editors from improving the disputed page(s), as they normally try not to get involved.

For this reason, we're opening this Phabricator task to create the variable 'ARTICLE_LASTXH_CONTRIBUTORS' for the edit filter (aka abuse filter). With this variable, one could define users (or user groups) who cannot edit an article or discussion page for x hours after the other user (or user group) has edited it, and vice versa. In effect, this would serve as a user (or user group) separation filter. Such a temporary mutual exclusion filter would give both users (or user groups) some time to reflect, would slow the rate of escalation, would probably allow third-party editors to edit the article, and would reduce the workload for the local administrators. Most importantly, it would be a less invasive measure than a user block. The filter option could be combined with bans imposed by an arbcom or by administrators as an additional tool.

If it helps, x could be any numeric value, but x could as well be limited to the 4 values 2h, 12h, 48h, or 168h.

Thank you very much and kind regards, ghilt

Event Timeline

Ghilt raised the priority of this task to Needs Triage.
Ghilt updated the task description.
Ghilt added subscribers: Ghilt, Schiedsgericht, Seewolf and 2 others.

Removed subscriber Ghilt due to double subscription (also in arbcom)

Good initiative by our German colleagues. At the root of many personal conflicts is the need for some users to reply to everything again and again as fast as possible, causing unnecessary escalation. Wikipedia is not a forum. Making it easier for arbiters to kill discussion and other forms of interaction between particular users should prevent escalation of conflicts and make Wikipedia a more enjoyable working environment for everyone. Cheers, Woudloper (Dutch arbcom member)

Can anyone tell me what this ticket is stuck on or if it's on any roadmap? @matej_suchanek for lack of knowing who to ping

@Charlie_WMDE: I don't see any signs of being stuck here. I guess someone needs to contribute the required code, as for most open tasks?
Also see T185154 about AbuseFilter maintainership in general.

Huji set the point value for this task to 5.

I am seeing this now for the first time, and here are my assessments:

  • This should not be a variable, but rather a function, such as page_recent_contributors(x) where x is time in hours. The output would be an array, and you can check to see if any of the users you are interested in are in that array.
    • One technical consideration here is that we are getting a list of the latest contributors to a page given some time constraint. That would translate to running a query on the revision table with a WHERE clause on the rev_timestamp field, which is already indexed.
    • Another technical consideration is that you would want to compare two arrays (one containing the list of recent contributors, the other containing a list of users); currently, operators like contains_any only allow comparing a string with an array, not an array with another array. So we need another function like overlap(a, b), which would return a list of all items that are present in both arrays a and b, and false if there is no overlap.
  • It would be a good idea to restrict the range of x as well. For instance, we should validate that it is an integer between 1 and 24.
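
To illustrate the intended behaviour of the two functions described in the list above, here is a minimal sketch. It is written in Python rather than in AbuseFilter's own code, it replaces the revision table with a hypothetical in-memory list of (page_id, user_name, timestamp) tuples, and the function names and the 1 to 24 hour range are just the suggestions made above. Nothing here is existing AbuseFilter code.

```python
from datetime import datetime, timedelta, timezone


def page_recent_contributors(revisions, page_id, hours, now):
    """Return the distinct users who edited page_id within the last `hours` hours.

    `revisions` is a hypothetical stand-in for the revision table: an iterable
    of (page_id, user_name, timestamp) tuples. A real implementation would
    instead query the database with a WHERE clause on rev_timestamp.
    """
    # Validate the range of x as suggested above (bool is rejected explicitly
    # because it is a subclass of int in Python).
    if isinstance(hours, bool) or not isinstance(hours, int) or not 1 <= hours <= 24:
        raise ValueError("hours must be an integer between 1 and 24")

    cutoff = now - timedelta(hours=hours)
    contributors = []
    for rev_page, rev_user, rev_time in revisions:
        if rev_page == page_id and rev_time >= cutoff and rev_user not in contributors:
            contributors.append(rev_user)
    return contributors


def overlap(a, b):
    """Return the items of `a` that also occur in `b`, or False if there are none."""
    b_items = set(b)
    common = [item for item in a if item in b_items]
    return common if common else False
```

Returning false instead of an empty list means the result of overlap() can be used directly as a condition in a rule, while a non-empty result still tells you which users matched.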

So from a development point of view, I think it is not a huge lift. I am assigning five story points and adding Anti-Harassment so that it could potentially be included in their next sprint.

The final rule would look like this pseudocode:

group_one = ['UserA', 'UserB'];
group_two = ['UserX', 'UserY'];

/* page_recent_contributors() and overlap() are the functions proposed above; neither exists yet. */
(
  username in group_one
  &
  overlap(group_two, page_recent_contributors(3))
)
|
(
  username in group_two
  &
  overlap(group_one, page_recent_contributors(3))
)

From a practical point of view, I don't know how useful it is. The two groups of DEWP users can always create new (or sleeper) accounts to evade a filter that uses this new function, so a combination of measures may need to be put in place to address those evasions. But I think it is best that we develop this feature and let the communities decide how to use it and how useful it is.

Daimona renamed this task from Create variable 'ARTICLE_LASTXH_CONTRIBUTORS' for the edit filter to Create an 'ARTICLE_LASTXH_CONTRIBUTORS' function. Apr 6 2018, 5:21 PM

This sounds like a very expensive way to run filters. How would it scale? Can someone show a mockup filter for a case with, say, 10 different pairs of these restrictions? Or would you implement a separate filter for each group? This also doesn't seem to address discussion-type pages. Would those need more filters?

The idea in T125723#3994140 seems like a solution, but it does not address evasive actions. A simple workaround is to enforce the use of registered and identified accounts, but that is a bit counterproductive, as it seriously limits who can contribute to an article.

For many purposes it could be possible to track changing user accounts, either through cookies or by some other means, and to replace the actual user names with some alternate identifier. If there are signs that a user comes from a conflicting group, responses can be slowed down.

If someone wants to harass a specific user, then the harasser would probably want that user to know how and why, and would therefore do it as her-/himself. Perhaps it isn't necessary to put too much work into tracking alternate user accounts.

Added note: Use of an alternate identifier could pose a privacy problem. One solution could be to add random delays to create plausible deniability, but I'm not sure other users will accept this.

On second thought, the idea in T125723#3994140 invites sloppy rules that are awfully slow. The reason is that the function call page_recent_contributors() is slow, as it will always trigger a hit on the database. It would be better to create a function has_conflicting_contributors(time, user, conflicting_users…), as that enforces an early check of whether it is necessary to query the database at all. The ellipsis shows where the other conflicting contributors go.

The user must be in the set of conflicting_users; the set is then filtered down to remove user, and the remaining set is used in a query against the database together with a time limit. It is not necessary to check more than one user, namely the current one, but it is necessary to check that user against the filtered set of other users. (There is an index page_user_timestamp.)
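
A rough sketch of that control flow, in Python for illustration only: the database query is represented by a hypothetical callable (recent_editor_lookup) passed in as the first argument, which is not part of the proposed signature, and the stub class exists only to count how often the "database" would be hit.

```python
def has_conflicting_contributors(recent_editor_lookup, hours, user, *conflicting_users):
    """Return True if `user` belongs to the conflicting set and another member
    of that set edited the page within the last `hours` hours.

    `recent_editor_lookup(candidates, hours)` is a hypothetical stand-in for
    the database query; it returns those candidates who edited recently.
    """
    # Early check: users outside the set never cause a database hit.
    if user not in conflicting_users:
        return False

    # Filter the set down to the other members.
    others = [name for name in conflicting_users if name != user]
    if not others:
        return False

    # Only now is the (expensive) lookup performed.
    return bool(recent_editor_lookup(others, hours))


class CountingLookup:
    """Stub lookup that records how many times it was called."""

    def __init__(self, recent_editors):
        self.recent_editors = set(recent_editors)
        self.calls = 0

    def __call__(self, candidates, hours):
        # `hours` is ignored in this stub; a real query would apply the time limit.
        self.calls += 1
        return [name for name in candidates if name in self.recent_editors]
```

With lookup = CountingLookup(['UserX']), a call for an uninvolved user such as has_conflicting_contributors(lookup, 3, 'UserC', 'UserA', 'UserX') returns False and leaves lookup.calls at 0, which is the point of the early check.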

There could be several sets of conflicting contributors, but it is not clear whether it is necessary (or important) to identify how the groups relate.