
Create the Commons deletion bot which will post messages on article talk pages
Closed, Resolved | Public | 8 Story Points

Description

Acceptance criteria
  • When an image on Commons is marked for deletion and it is embedded in a different wiki's article page, a bot should post an announcement.
  • The bot should post a message on the talk page of the relevant article
    • The message copy is provided in this ticket, below
  • The bot should run continuously, but operate on a short delay so that accidental nominations or vandalism do not trigger notifications.
    • There should be a 15 minute delay for speedy deletions
    • There should be a 60 minute delay for regular deletion discussions
  • Messages should be translatable via TWN — not templates
  • If a file is used on more than 10 pages, the bot should only post on 10 talk pages maximum
  • Should only identify files used on NS:0 (mainspace)
  • Should use User:Community Tech bot
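
The timing, namespace, and cap rules above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the names DELAYS, is_due, and pages_to_notify, and the (namespace, title) tuple shape, are assumptions made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Delays taken from the acceptance criteria; the names are illustrative.
DELAYS = {
    "speedy": timedelta(minutes=15),
    "regular": timedelta(minutes=60),
}
MAX_TALK_PAGES = 10   # cap from the acceptance criteria
MAIN_NAMESPACE = 0    # NS:0


def is_due(nominated_at, kind, now):
    """Return True once the nomination has outlasted its waiting period."""
    return now - nominated_at >= DELAYS[kind]


def pages_to_notify(usages):
    """Keep mainspace usages only and cap the result at MAX_TALK_PAGES.

    `usages` is a list of (namespace, title) tuples (hypothetical shape).
    """
    mainspace = [title for ns, title in usages if ns == MAIN_NAMESPACE]
    return mainspace[:MAX_TALK_PAGES]
```

A real run would presumably also re-check that the file is still nominated once the delay has passed, since the point of the delay is to skip nominations that were reverted in the meantime.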

Messages to post

For regular deletion:

== A Commons file used on this article has been nominated for deletion == 

The file [[commons:File:Foobar.jpg|Foobar.jpg]] on Wikimedia Commons has been nominated for deletion. View and participate in the deletion discussion at the [[commons:Commons:Deletion requests/File:Foobar.jpg|nomination page]]. — ~~~~

For speedy deletion:

== A Commons file used on this article has been nominated for speedy deletion == 

The file [[commons:File:Foobar.jpg|Foobar.jpg]] on Wikimedia Commons has been nominated for speedy deletion. View the deletion reason at the [[commons:File:Foobar.jpg|Commons file page]]. — ~~~~

Parameters that would need to change:

  • Filename
  • Date of nomination
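
As a rough sketch of how the filename parameter could be substituted into the message copy above: the templates here are hard-coded in English purely for illustration (the ticket asks for messages to be translatable via TWN), and render_notice is a hypothetical helper name, not part of the bot.

```python
# English copy from this ticket, hard-coded for illustration only.
# "\u2014" is the dash that precedes the signature in the ticket's copy.
REGULAR = (
    "== A Commons file used on this article has been nominated for deletion ==\n\n"
    "The file [[commons:File:{name}|{name}]] on Wikimedia Commons has been "
    "nominated for deletion. View and participate in the deletion discussion "
    "at the [[commons:Commons:Deletion requests/File:{name}|nomination page]]. "
    "\u2014 ~~~~"
)
SPEEDY = (
    "== A Commons file used on this article has been nominated for speedy deletion ==\n\n"
    "The file [[commons:File:{name}|{name}]] on Wikimedia Commons has been "
    "nominated for speedy deletion. View the deletion reason at the "
    "[[commons:File:{name}|Commons file page]]. "
    "\u2014 ~~~~"
)


def render_notice(filename, speedy=False):
    """Fill the filename (without the "File:" prefix) into the notice."""
    template = SPEEDY if speedy else REGULAR
    return template.format(name=filename)
```

For example, render_notice("Foobar.jpg") yields the regular-deletion text shown above, and render_notice("Foobar.jpg", speedy=True) yields the speedy-deletion text.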

Event Timeline

TBolliger updated the task description. Feb 23 2018, 10:43 PM
MaxSem changed the task status from Stalled to Open. Feb 26 2018, 11:45 PM
MaxSem claimed this task.
MaxSem edited projects, added Community-Tech-Sprint; removed Community-Tech.

Moving back to the other board to be estimated.

TBolliger renamed this task from "Create the Commons deletion bot" to "Create the Commons deletion bot which will post messages on article talk pages". Feb 28 2018, 12:11 AM
TBolliger updated the task description. Feb 28 2018, 12:22 AM
TBolliger edited projects, added Community-Tech-Sprint; removed Community-Tech.
TBolliger set the point value for this task to 8.

These questions are still open, so here are my answers:

The bot's username

We'll have to see what's available, but right now User:CommonsDeletionBot is available. I like this because the signature and name are specific and in the future it could spin off from being a CommTech project if a volunteer wants to take stewardship. Or we could use User:Community Tech bot if needed.

The bot should run on all wikis? opt-in? opt-out?

Let's deploy to one or two wikis to begin, as determined by who voted on the Wishlist. Once we've ironed out the kinks we can deploy to more wikis.

The bot should be marked as a global bot to avoid rate limiting?

This is probably not a real question — I don't think there is a concept of global bots on Wikimedia. Rather, we should consider this as part of the announcement/communication of this bot when we release it on wikis.

Namespaces?
What should the bot do when an image is used on a highly used template? Should only the template talk page be notified, or should the notification be placed on the talk pages of all articles where the template is used? Or on the pages of the WikiProjects (if any) that use the template?

Let's start small for now — for this first version the bot should only post for files used on content namespace pages. We should also look into how frequently files used on non-content namespaces are actually marked for deletion, and to what extent.

TBolliger updated the task description. Mar 20 2018, 11:26 PM

We discussed this a bit more in person and made some changes in the ticket description above. Please leave more feedback if you have it!

TBolliger updated the task description. Mar 20 2018, 11:35 PM

According to https://meta.wikimedia.org/wiki/Bot_policy#Global_bots, Community Tech bot would not be eligible for global bot status.

Quoting the policy: "the bot must only maintain interlanguage links or fix double-redirects;"

That is a very restrictive policy. :/

So it seems like the first steps will be to make this opt-in and prove that the bot is useful to those wikis before approaching the subject of T190234: Make User:Community Tech bot a global bot to avoid rate limiting for Commons Deletion Bot with... Meta users? Stewards?

@TBolliger, we'd need to get an approval for this action of the bot (posting about commons image deletions) on every wiki we plan to run the bot on. So we can start with a medium-sized wiki to experiment with and then expand it to bigger wikis when we've refined the bot. My understanding is that bots aren't rate-limited at all, so we don't need to worry about T190234.

So we should mark T190234 as declined?

Yes. Invalid, rather. :)

Nick added a subscriber: Nick. Mar 23 2018, 9:32 AM

What are the criteria for choosing which 10 (talk) pages will receive a notification?

I would suggest page views, or failing that, most watched articles/talk pages.

kaldari added a comment. Edited Mar 23 2018, 10:43 PM

It seems like "most page watchers" would make more sense. After all, we're interested in alerting editors, not readers. (Keep in mind that the watcher count is empty for articles that have fewer than $wgUnwatchedPageSecret active watchers, for example Earth vs. Zygoballus.)

TBolliger updated the task description. Apr 3 2018, 4:17 PM
TBolliger updated the task description.
TBolliger updated the task description. Apr 11 2018, 8:01 PM

I've updated the task to reflect a conversation among Kaldari, Danny, Niharika, and me. If we're picking 10 articles to post on when an image is frequently embedded, we should try to pick the 10 most likely to draw attention. We can use the pages that are on the most watchlists.

@TBolliger: Unfortunately, it looks like retrieving most watchlisted isn't as easy as I was hoping. The number isn't recorded in the database, instead it has to be calculated on the fly for every article (by counting the number of entries in the watchlist table), which could be expensive if there are thousands of articles (since the watchlist table is rather huge). @MaxSem: Any idea how expensive this would actually be? Any other ideas for choosing which 10? Personally, I would be OK with just using the first 10 in the list, although that wouldn't be ideal.

First of all, this information is not available to the bot at all: watchlists are considered private information. Had it been available, I would likely have figured out something that performs tolerably.

JJMC89 added a subscriber: JJMC89. Apr 12 2018, 10:24 PM

Watchers is available through the API when above the minimum watchers threshold.

Request
{
	"action": "query",
	"format": "json",
	"prop": "info",
	"titles": "Main Page|United States|Germany",
	"inprop": "watchers"
}
Response
{
    "batchcomplete": "",
    "query": {
        "pages": {
            "11867": {
                "pageid": 11867,
                "ns": 0,
                "title": "Germany",
                "contentmodel": "wikitext",
                "pagelanguage": "en",
                "pagelanguagehtmlcode": "en",
                "pagelanguagedir": "ltr",
                "touched": "2018-04-11T20:55:14Z",
                "lastrevid": 835364525,
                "length": 253056,
                "watchers": 1691
            },
            "15580374": {
                "pageid": 15580374,
                "ns": 0,
                "title": "Main Page",
                "contentmodel": "wikitext",
                "pagelanguage": "en",
                "pagelanguagehtmlcode": "en",
                "pagelanguagedir": "ltr",
                "touched": "2018-04-12T19:15:04Z",
                "lastrevid": 807996266,
                "length": 6029,
                "watchers": 111552
            },
            "3434750": {
                "pageid": 3434750,
                "ns": 0,
                "title": "United States",
                "contentmodel": "wikitext",
                "pagelanguage": "en",
                "pagelanguagehtmlcode": "en",
                "pagelanguagedir": "ltr",
                "touched": "2018-04-12T20:12:22Z",
                "lastrevid": 836110089,
                "length": 405559,
                "watchers": 3532
            }
        }
    }
}

*scratches head*

Didn't realise counts were available via the API, thanks. There's also T59617: Make watchlist table available as curated foo_p.watchlist_count on labsdb to expose the counts via replicas.

Well look at that! Does anybody know what the minimum watchers threshold is? I'm fine saying to pick the top 10 based on watchers, if available, otherwise pick the 10 based on whatever logic is simplest. (Most recently edited would also be likely to have active watchers.)
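
A minimal sketch of that selection rule, assuming the response shape shown in the API example above: pages whose "watchers" field is missing (below the threshold) sort after every page with a count and otherwise keep the order in which the API returned them. The function name top_pages_by_watchers is made up for this example.

```python
def top_pages_by_watchers(api_response, limit=10):
    """Return up to `limit` page titles, most-watched first.

    `api_response` is the decoded JSON of an action=query,
    prop=info, inprop=watchers request.
    """
    pages = api_response["query"]["pages"].values()
    # Pages without a "watchers" field get -1 so they rank last.
    # sorted() is stable, so ties keep the API's original order.
    ranked = sorted(pages, key=lambda p: p.get("watchers", -1), reverse=True)
    return [p["title"] for p in ranked[:limit]]
```

With the three pages from the example response, this would return Main Page, United States, Germany, in that order.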

JJMC89 added a comment. Edited Apr 13 2018, 12:12 AM

Well look at that! Does anybody know what the minimum watchers threshold is?

>30 based on the below.

$wgUnwatchedPageThreshold allows users without the unwatchedpages user right to view the number of page watchers for a specified page via the info action if the number of watchers is above the specified threshold.

InitialiseSettings.php
'wgUnwatchedPageThreshold' => [
	'default' => 30, // Default value of https://toolserver.org/~mzmcbride/watcher/
],

@MaxSem — some users are ready to test the bot on their wikis! Do you have an estimated date when the bot will be ready?

I think we wildly underestimated this ticket. It's been in development for over a month at this point. Given GlobalPreferences being a distraction, I would expect this ticket to last longer but not by *that* much.

Soon!

This isn't a helpful answer. Let's try to set a target completion date in sprint planning tomorrow.

MaxSem added a comment. May 7 2018, 6:06 PM

It's going to be able to do a run on an English-language wiki this week. We need to decide upon edit summaries. Currently, the bot doesn't use the section-adding functionality because I wanted to add multiple messages in one edit, which means we can't use section autosummaries. We need to either decide upon summaries or consider a page having multiple files on it nominated for deletion in the same edit period to be an occurrence rare enough to not worry about.

For the edit summary, what about something along the lines of "An image used on this article has been marked for deletion"?

Let's keep the functionality to post multiple notices at once. I see value in that.

TBolliger updated the task description. May 8 2018, 9:34 PM

Removing date from post as it is already in the signature.

Cirdan added a subscriber: Cirdan. May 9 2018, 5:16 AM
MaxSem closed this task as Resolved. May 24 2018, 11:30 PM
MaxSem moved this task from In Development to Q1 2018-19 on the Community-Tech-Sprint board.

The bot is up and running, we can continue in different tasks.

Yay!

I wonder if there's any way we can avoid situations like: https://en.wikipedia.org/wiki/Talk:St._Mary%27s_Jacobite_Syrian_Church,_Marady (caused by https://commons.wikimedia.org/wiki/Commons:Deletion_requests/Files_uploaded_by_Alwinks137). Maybe the bot should look for an existing header and post under that if possible.

Oof, that's no good. (Regardless of the problems that encyclopedia article has...)

@MaxSem — would you like me to create a new ticket to capture ideas on how to make this better? I expect this situation to happen somewhat frequently, given how images are patrolled and marked for deletion in batch.

Yes, this is something we would want to tackle. We could bundle them like this:

The following files used on this page are up for deletion:

  • [[File:Foo]]
  • [[File:Bar]]

Blah blah. ~~~~

One thing to remember, though, is that we need to communicate the nomination pages clearly in case they're different.
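
One possible way to bundle the notices while keeping each nomination page visible is sketched below. The wording, the render_bundle name, and the (filename, nomination page) input shape are all assumptions for illustration; the final copy is still undecided in this thread.

```python
def render_bundle(nominations):
    """Build one notice body for several files.

    `nominations` is a list of (filename, nomination_page) tuples, where
    nomination_page is a Commons page title (hypothetical input shape).
    Each bullet links its own nomination page, so files nominated on
    different pages stay distinguishable.
    """
    lines = ["The following files used on this page are up for deletion:", ""]
    for filename, page in nominations:
        lines.append(
            "* [[commons:File:{0}|{0}]] ([[commons:{1}|nomination page]])".format(
                filename, page
            )
        )
    lines += ["", "~~~~"]
    return "\n".join(lines)
```

Linking the nomination page per file, rather than once per notice, costs a little repetition when every file shares a single batch request, but it avoids having to special-case mixed batches.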