CopyPatrol: Whitelist for privileged users
Closed, Resolved | Public | 5 Estimated Story Points

Description

Huggle, STiki and other tools do the same, and I think we should too.

Amendment: I think we should have a fully protected on-wiki page that lists the sites and editors that should be whitelisted. We can format the page however we want and use API:Links to get the editor names and external links on the page. The results could be cached in Redis so we aren't parsing the page every time someone uses the tool. This on-wiki whitelist allows admins and interface editors to keep it up to date. Between us and Diannaa I think we'll quickly see better results in CopyPatrol.
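For illustration only, here is a minimal Python sketch of the lookup described above. It assumes the MediaWiki Action API is queried with `prop=links|extlinks` and that the JSON response has the usual `query.pages` shape; the function names `build_links_query` and `parse_whitelist` are hypothetical, and this is not the tool's actual code. It does not follow API continuation and leaves out the Redis caching step.

```python
from typing import Any


def build_links_query(title: str) -> dict[str, str]:
    """Build query parameters for fetching a page's user links and external links."""
    return {
        "action": "query",
        "format": "json",
        "prop": "links|extlinks",
        "titles": title,
        "plnamespace": "2",  # namespace 2 is "User:"
        "pllimit": "max",
        "ellimit": "max",
    }


def parse_whitelist(response: dict[str, Any]) -> tuple[set[str], set[str]]:
    """Return (editor names, external URLs) found in an API response.

    Assumption: pages arrive either as a dict keyed by page ID or as a list,
    and external links use either a "url" key or a "*" key.
    """
    editors: set[str] = set()
    sites: set[str] = set()
    pages = response.get("query", {}).get("pages", {})
    page_iter = pages.values() if isinstance(pages, dict) else pages
    for page in page_iter:
        for link in page.get("links", []):
            if link.get("ns") != 2:
                continue  # ignore anything that is not a user page
            _, _, name = link.get("title", "").partition(":")
            root = name.split("/", 1)[0]  # drop subpages such as "Foo/sandbox"
            if root:
                editors.add(root)
        for ext in page.get("extlinks", []):
            url = ext.get("url") or ext.get("*")
            if url:
                sites.add(url)
    return editors, sites
```

The parsed sets are what would be cached, so the whitelist page is only re-read when the cache expires.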

Note that there is already a blacklist for excluding certain sites from being added to the EranBot database: https://en.wikipedia.org/wiki/User:EranBot/Copyright/Blacklist

Event Timeline

We should ask Diannaa about this one. She might have experience with how often privileged users make copyvio edits.

Some major copyvio incidents have happened with users having such permissions, at least on enwiki. So I'd say the risk is fairly real.

Having this option is arguable, and revisiting the idea I mostly agree we don't want to auto-exclude these edits. The idea came up because with CopyPatrol I've repeatedly seen edits by users with tens of thousands of edits, and when I check them, each is either a false positive (no match at all) or a backwards copy (Wikipedia content on an external site). A checkbox, defaulted off, may still be useful.

I'll also admit that Huggle and STiki are not good tools to model after in this sense as they pertain to vandalism and we see very few experienced users inserting blatantly inappropriate material =P

I propose we show tags for the privileges associated with the editor (similar UI to WikiProject tags but smaller, possibly underneath the editor info), which would help the reviewer in the same way that edit count does. But not auto-exclude.

This could get messy, as some users have a lot of individual user rights, some with annoyingly long names ("pending changes reviewer", "extended confirmed", "extended pages mover"). We wouldn't want to show only certain rights, as that could misleadingly suggest those are the only rights they hold.

The better comparison would be Special:NewPagesFeed (which I believe @kaldari worked on!). There we can filter to new users only, which yields the pages more likely to be problematic. However, even admins sometimes create pages on non-notable subjects, completely unsourced, etc. I think that risk is similar to the risk of filtering out privileged users on CopyPatrol. What it boils down to, however, is that not all of us are comfortable with reverting copyrighted content inserted by an admin or experienced user and then confronting them about it. This involves drama and a level of involvement that your average patroller is going to avoid. So our privileged user filter will allow these folks to work on the easy ones and let the professionals (Diannaa) deal with the high-profile cases.

Here are some previous statements from an on-wiki conversation in April.

I had a conversation with Diannaa, and two of the items she brought up as being helpful were:

  • Whitelist trusted users who get a lot of false positives
  • Flagging people who have posted copyvio in the past

Doc James wrote:
"Agree flagging people who have had issues with copyvios in the past would be excellent.
We discussed whitelisting trusted users. I think if we only do this for people who have a lot of false positives that will be okay. We have had people who have made 40K edits before issues being found so edit count definately cannot be used. Doc James (talk · contribs · email) 15:15, 16 April 2016 (UTC)"

Diannaa:
"I had in mind a few specific trusted users who regularly get false positives, for various reasons: Charles Matthews, Rjensen, Gamaliel, Peter coxhead. Diannaa (talk) 01:18, 17 April 2016 (UTC)"

Doc James:
"Okay sounds reasonable. As long as we keep the bar for inclusion very high. Doc James (talk · contribs · email) 18:30, 18 April 2016 (UTC)"

So it sounds like she wanted a very specific whitelist for people who'd gotten a lot of false positives, not necessarily whitelisting a class of editors.

So sounds like we might want to go with a whitelist, then? The only issue is we'll have to regularly maintain it. We could integrate this into the interface somehow so "superusers" of CopyPatrol can selectively whitelist editors, storing the whitelist in our database. From there Community Tech bot could auto-review edits from those whitelisted editors. This is a really cool idea, and would satisfy the concern that even admins can sometimes be guilty of introducing copyright violations. However it would be a fair amount of work, and we need to define who our "superusers" are and what the bar for being one of those is.

I've amended my proposal in the task description. I'm !voting for an on-wiki trusted editor whitelist.

I like the idea of having the whitelist on-wiki. People would be able to update it as needed, and there would be a clear history of who added the names.

It's not likely to be edited that much, so having a few people with the page on their watchlist would be good enough to make sure nobody messes with it.

Yeah, it looks like the site blacklist is only semi-protected and no one has messed with it over the past 1.5 years it has existed, so semi should be fine for the editor whitelist as well. I guess we should call this a blacklist too to be consistent, maybe at User:EranBot/Copyright/Editor_blacklist

DannyH triaged this task as Medium priority. (Aug 9 2016, 5:33 PM)
DannyH set the point value for this task to 5.
DannyH moved this task from Needs Discussion to Up Next on the Community-Tech board.
DannyH renamed this task from "Filter out edits by privileged users" to "CopyPatrol: Whitelist for privileged users". (Aug 9 2016, 8:51 PM)

On the project talk page, @Doc_James asked if we could provide a link to the site whitelist:

https://en.wikipedia.org/wiki/User:EranBot/Copyright/Blacklist

We should check in about this -- how will people get to both the site and the user whitelist? Maybe links in the footer?

Pull request at: https://github.com/Niharika29/PlagiabotWeb/pull/25 and deployed to plagiabot. Caching is disabled on staging, so you can update the user whitelist at User:EranBot/Copyright/User_whitelist and the changes should be reflected immediately. On production it will cache for 2 hours, which perhaps is not aggressive enough... not sure.

I decided to make the whitelist wiki-specific so that when we support more wikis, they can control who's whitelisted locally.
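To make the caching behaviour concrete, here is a rough Python sketch of a per-wiki whitelist cache with a time-to-live. It uses an in-memory dict as a stand-in for Redis, and the class name `WhitelistCache`, the `fetch` callback and the wiki keys such as `"enwiki"` are all hypothetical; it illustrates the idea and is not the code in the pull request. A TTL of 0 mimics staging, where caching is disabled.

```python
import time
from typing import Callable, Iterable


class WhitelistCache:
    """Cache each wiki's user whitelist separately, expiring after a TTL."""

    def __init__(
        self,
        fetch: Callable[[str], Iterable[str]],
        ttl_seconds: float = 2 * 60 * 60,  # 2 hours, as on production
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # loads the whitelist page for one wiki
        self._ttl = ttl_seconds
        self._clock = clock
        # wiki -> (expiry time, whitelisted user names)
        self._entries: dict[str, tuple[float, frozenset[str]]] = {}

    def get(self, wiki: str) -> frozenset[str]:
        now = self._clock()
        entry = self._entries.get(wiki)
        if self._ttl > 0 and entry is not None and now < entry[0]:
            return entry[1]  # still fresh, skip re-reading the page
        users = frozenset(self._fetch(wiki))
        self._entries[wiki] = (now + self._ttl, users)
        return users

    def is_whitelisted(self, wiki: str, username: str) -> bool:
        return username in self.get(wiki)
```

Keying the cache by wiki keeps each wiki's list independent, so a change on one wiki's whitelist page never affects another. With Redis, the same effect would presumably come from a per-wiki key with an expiry set on it.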

Code-wise, I'm certain I'm doing something sub-par. Should we have a Dao just for Redis?

Left a note in GitHub. Would be good to have @Niharika review as well.