
Investigation: Commons deletion notifications bot
Closed, Resolved · Public · 5 Story Points

Description

An investigation task for wish #10, Commons deletion notification bot

Notes on the project page:
https://meta.wikimedia.org/wiki/Community_Tech/Commons_deletion_notification_bot

Related Phabricator tasks: T167614

Investigation tasks:

  • Determine an implementation plan, including localization

Investigation result:

Detecting files nominated for deletion

  • '''Speedy deletion''': pages are nominated for deletion by adding one of many templates, which places them in [[commons:Category:Candidates for speedy deletion|Category:Candidates for speedy deletion]] or one of its subcategories. We could refresh the list of these subcategories automatically on a schedule. To differentiate between deletion categories and categories that are themselves nominated for deletion, recursion should be restricted to categories containing <code>[[commons:Template:DeletionCategory|<nowiki>{{DeletionCategory}}</nowiki>]]</code>.
  • '''Normal deletion''': discussion-based deletion requests are mostly made with <code>[[commons:Template:Delete|<nowiki>{{delete}}</nowiki>]]</code>, which places files into subcategories of [[commons:Category:Deletion requests|Category:Deletion requests]]. To differentiate between suitable and unsuitable subcategories, the search should be restricted to <code>Category:Deletion requests <month> <year></code> without recursion.
    • The <code>reason</code> and <code>subpage</code> parameters (or first and second unnamed) could be used to get more information on deletion to report on other wikis.
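
A minimal sketch of the normal-deletion lookup described above, assuming the bot queries the MediaWiki API with `list=categorymembers`. The function names are hypothetical, and the month list is hard-coded so the category name does not depend on the system locale.

```python
from datetime import date

# English month names, hard-coded to avoid locale-dependent formatting.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def deletion_request_category(day: date) -> str:
    """Return the monthly deletion-request category for the given date."""
    return f"Category:Deletion requests {MONTHS[day.month - 1]} {day.year}"


def category_members_query(category: str, limit: int = 500) -> dict:
    """Build API parameters listing files directly in a category.

    No recursion is performed: only direct members are returned, and
    cmtype/cmnamespace restrict the result to the File namespace (6).
    """
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": 6,
        "cmtype": "file",
        "cmlimit": limit,
        "format": "json",
    }


params = category_members_query(deletion_request_category(date(2018, 1, 18)))
```

The resulting dictionary could be sent to `https://commons.wikimedia.org/w/api.php` with any HTTP client; continuation handling is left out of this sketch.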

Proposed localization

Two principal possibilities: leaving templated messages or free text.

Messages: can be localized on translatewiki.net (TWN); the bot then leaves the rendered wikitext on the page.

Pros:

  • One localization can be shared between all wikis in the same language.
  • Easier to make opt-out based deployments: a wiki doesn't need to do anything to start receiving notifications, at most we could require that certain messages should be localized. Once these are done, you're immediately ready to receive notifications.

Templates: the bot leaves parameterized templates as messages.

Pros:

  • Users love templates.
  • Templates can be updated with more information, e.g. when deletion discussions are over.
  • Can retroactively modify and improve old messages.

Cons:

  • The templates need to be set up on each wiki in order for the bot to work.
  • Updating the templates can break old messages.

Delayed notifications

Sometimes, pages are vandalized with frivolous deletion nominations. To reduce noise and workload, it has been proposed that notifications be delayed for a certain amount of time. I propose starting with a 15-minute delay for speedy deletion nominations and a 1-hour delay for deletion discussions, and adjusting as we go.
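
The delay rule could be sketched roughly as follows. The names `DELAYS` and `is_due` are hypothetical, and the sketch assumes the bot records when it first saw a nomination and re-checks that the file is still nominated before posting.

```python
from datetime import datetime, timedelta

# Proposed starting delays from the text above; expected to be tuned later.
DELAYS = {
    "speedy": timedelta(minutes=15),
    "discussion": timedelta(hours=1),
}


def is_due(kind: str, nominated_at: datetime, now: datetime,
           still_nominated: bool) -> bool:
    """Return True once the delay has passed and the nomination still stands.

    A nomination that was reverted during the delay window (for example,
    a frivolous tag removed by a patroller) never becomes due.
    """
    return still_nominated and now - nominated_at >= DELAYS[kind]
```

With this shape the bot can poll as often as it likes; the polling interval and the notification delay stay independent of each other.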

Estimated number of pages to edit

The number appears to be on the order of 10k per month for deletion discussions. The counts below are smaller because some discussions have already concluded:

MariaDB [commonswiki_p]> select count(*) from globalimagelinks where gil_to in (select page_title from page, categorylinks where page_id=cl_from and page_namespace=6 and cl_to='Deletion_requests_December_2017');

+----------+
| count(*) |
+----------+
|     3641 |
+----------+
1 row in set (0.09 sec)

MariaDB [commonswiki_p]> select count(*) from globalimagelinks where gil_to in (select page_title from page, categorylinks where page_id=cl_from and page_namespace=6 and cl_to='Deletion_requests_January_2018');

+----------+
| count(*) |
+----------+
|     7200 |
+----------+
1 row in set (2.30 sec)

Event Timeline

TBolliger triaged this task as Normal priority. · Jan 5 2018, 10:53 PM
TBolliger created this task.
Restricted Application added a subscriber: Aklapper. · Jan 5 2018, 10:53 PM

@MarcoAurelio: Sort of, but I think the idea with this tool will be to post notices on the affected article talk pages (and possibly on WikiProject talk pages) rather than on a centralized noticeboard for the whole wiki.

kaldari updated the task description. · Jan 5 2018, 11:56 PM

T91192 is one possible way to implement this, although it may be too narrow in scope and carry technical risks.

kaldari set the point value for this task to 5. · Jan 10 2018, 12:02 AM
MaxSem claimed this task. · Jan 17 2018, 8:51 PM
MaxSem moved this task from Ready to In Development on the Community-Tech-Sprint board.

Looks good, Max. A few questions for you: Will this be performed by User:Community_Tech_bot or a different bot? Do we need to discuss what programming language the bot will be written in?


A few questions we could raise to users who supported the wishlist proposal:

  1. MediaWiki messages vs. Templates
  2. Should the bot run daily or continuously (on a delay to account for vandalism)?
  3. What contents should the message contain? Should the bot post the template parameters in its message?
  4. Should this trigger for both Speedy Deletion and regular Deletion?
Niharika updated the task description. · Jan 18 2018, 9:18 PM
Restricted Application added a subscriber: jeblad. · Jan 18 2018, 9:18 PM

@MaxSem I put the investigation result in the ticket to keep a record. We keep investigations in Phabricator because they're used to create further tasks about the project.

Niharika added a comment. · Edited · Jan 18 2018, 10:27 PM
  1. MediaWiki messages vs. Templates

I don't think Max meant MediaWiki messages. Also, I feel like we should make this decision since we're the ones who will have to do the work. :)
I personally feel that using TWN to translate messages (which live in the bot repo) will be the easiest to set up and maintain long-term. We wouldn't depend on project folks creating templates and doing translations. If there's no translation, they'd see English.
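
A rough sketch of the English-fallback behaviour described above. The message key, the catalogue contents, and the `get_message` helper are all hypothetical; in practice the strings would be loaded from translation files in the bot repository.

```python
# Hypothetical message catalogue; real strings would come from translation
# files in the bot repository, not from a literal in the code.
MESSAGES = {
    "en": {
        "file-nominated": "The file $1 has been nominated for deletion on Commons.",
    },
    "fr": {
        "file-nominated": "Le fichier $1 est proposé à la suppression sur Commons.",
    },
}


def get_message(key: str, lang: str, *args: str) -> str:
    """Look up a message in the wiki's language, falling back to English."""
    template = MESSAGES.get(lang, {}).get(key) or MESSAGES["en"][key]
    # Substitute MediaWiki-style positional placeholders ($1, $2, ...).
    for index, value in enumerate(args, start=1):
        template = template.replace(f"${index}", value)
    return template
```

A wiki whose language has no translation yet simply receives the English text, so no per-wiki setup is needed before notifications start.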

  2. Should the bot run daily or continuously (on a delay to account for vandalism).

The bot should run pretty much constantly (every 1 minute or so) and post about the deletion once the pre-decided delay time has elapsed. The bot itself won't run on a delay. Running it once a day offers no advantages over running it continuously, so we should do continuous.

  3. What contents should the message contain? Should the bot post the template parameters in its message?

I think that's also up for us to decide. The message strings will have the page and the deletion tag as parameters. Did I understand your question correctly?

  4. Should this trigger for both Speedy Deletion and regular Deletion?

I'd say both. But speedy deletion is going to be a rarer occurrence since that normally happens for files which have been uploaded recently. Such files are not likely to be present on Wikipedia pages.

I have an additional question for the investigation (ping @MaxSem) - It seems like Daniel Kinzler maintained CommonsTicker which used to do this. Is it possible to revive that bot or reuse some of its code or are we better off writing a new one?

Edit - it also seems like there's another bot which does exactly this but only for French Wikipedia. See https://fr.wikipedia.org/wiki/Spécial:Contributions/NaggoBot and https://fr.wikipedia.org/w/index.php?title=Discussion:Solomon_Rossine&diff=prev&oldid=138104455 - can we find the source code?

Will this be performed by User:Community_Tech_bot or a different bot?

I seriously recommend that we use a separate account for that, due to the complexity of the task and the chance that it will be blocked if it malfunctions.

Do we need to discuss what programming language the bot will be written in?

If we want to inform people of speedy deletion nominations, we might end up in an area where we want concurrency; otherwise, any of the usual garden-variety languages would do.

A few questions we could raise to users who supported the wishlist proposal

These seem more like technical/product questions to me. I don't think these things are at the point where we need to discuss them with communities.

MediaWiki messages vs. Templates

Up to us, depending on precise requirements. I would personally prefer to not use templates, and I think pros/cons in my writeup say the same.

Should the bot run daily or continuously

Let's say hourly for discussions / 5 minutes for speedies.

Should the bot post the template parameters in its message?

What do you mean?

Should this trigger for both Speedy Deletion and regular Deletion?

The question of speedies is mostly about whether we will be able to provide useful notifications. If we're able to provide a service people will find useful, let's do that.

Fair points. We should write up the product requirements, share those on wiki, and get feedback on whether we're building what they expect.

Regarding template parameters — Template:Delete on Commons uses a reason= parameter — should we display the contents of that parameter on the talk page notice made by this on Wikipedia? (Something along the lines of "An image used on this article, [[commons:File:Foobar.jpg]], has been marked for deletion on Commons Wiki with the reason foobar. You can view the deletion discussion at [[link]].")
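
One way this could look, as a hedged sketch: pull `reason` and `subpage` (named, or first and second unnamed) out of the `{{delete}}` call and build the notice text. Both helper names are hypothetical. The parser is deliberately naive: it ignores nested templates and would mis-split a reason containing a piped wikilink. The `Commons:Deletion requests/<subpage>` link target is an assumption about where the discussion page lives.

```python
import re


def parse_delete_template(wikitext: str) -> dict:
    """Extract parameters from the first {{delete|...}} call in the text."""
    match = re.search(r"\{\{\s*delete\s*\|([^{}]*)\}\}", wikitext, re.IGNORECASE)
    if not match:
        return {}
    params = {}
    positional = ["reason", "subpage"]  # meaning of 1st and 2nd unnamed params
    index = 0
    for part in match.group(1).split("|"):
        name, sep, value = part.partition("=")
        if sep and name.strip().isalnum():
            params[name.strip()] = value.strip()
        else:
            if index < len(positional):
                params[positional[index]] = part.strip()
            index += 1
    return params


def build_notice(file_title: str, params: dict) -> str:
    """Build the talk page notice, including the reason when one is given."""
    text = (f"An image used on this article, [[commons:{file_title}]], "
            "has been marked for deletion on Commons")
    if params.get("reason"):
        text += f" with the reason: {params['reason']}"
    subpage = params.get("subpage") or file_title
    text += (". You can view the deletion discussion at "
             f"[[commons:Commons:Deletion requests/{subpage}]].")
    return text


tag = "{{delete|reason=foobar|subpage=File:Foobar.jpg|year=2018|month=January|day=18}}"
notice = build_notice("File:Foobar.jpg", parse_delete_template(tag))
```

A production bot would more likely use a proper wikitext parser than a regular expression; the regex only keeps the sketch self-contained.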

The bot should run pretty much constantly (every 1 minute or so) and post about the deletion once the pre-decided delay time has elapsed. The bot itself won't run on a delay. Running it once a day offers no advantages over running it continuously, so we should do continuous.

Why would this be needed? This is hardly realtime data; once per hour for discussions and every 5 minutes for speedy deletions would be more than enough.

I have an additional question for the investigation (ping @MaxSem) - It seems like Daniel Kinzler maintained CommonsTicker which used to do this. Is it possible to revive that bot or reuse some of its code or are we better off writing a new one?

This was a decade ago and the bot did a different thing. I personally don't think it's going to help us.

Edit - it also seems like there's another bot which does exactly this but only for French Wikipedia. See https://fr.wikipedia.org/wiki/Spécial:Contributions/NaggoBot and https://fr.wikipedia.org/w/index.php?title=Discussion:Solomon_Rossine&diff=prev&oldid=138104455 - can we find the source code?

https://fr.wikipedia.org/wiki/Utilisateur:NaggoBot/comdr.py

kaldari added a comment. · Edited · Jan 18 2018, 11:44 PM

NaggoBot looks similar to CommonsTicker, i.e. it looks like it creates a report/table rather than separate notices for each image, so I don't think it will be useful for this task. It does, however, have a useful list of deletion templates and aliases. It basically divides deletions up by reason with 4 main types: no source, no license, no permission, and everything else.

Should this trigger for both Speedy Deletion and regular Deletion?

For the first version, I think we should only handle regular deletion nominations (not speedy deletions), and then see if the community wants us to add speedy deletions. Notifications about speedy deletions are going to be slightly less useful, as there's less of a chance that you can do anything about them, and they will be more complicated to handle since there are a lot more templates to deal with. We should, however, keep that use case in mind when building the architecture of the bot.

kaldari closed this task as Resolved. · Jan 18 2018, 11:58 PM
kaldari moved this task from Needs Review/Feedback to Q1 2018-19 on the Community-Tech-Sprint board.

NaggoBot looks similar to CommonsTicker, i.e. it looks like it creates a report/table rather than separate notices for each image

That does not appear to be correct. While the bot does maintain a central report table at Discussion_utilisateur:NaggoBot/CommonsDR, it does post separate notices for each image: see e.g. Diff/144627533 (and most of the edits at Special:Contributions/NaggoBot).

All pages with such notices are listed at Catégorie:Page contenant un fichier proposé à la suppression sur Commons

That does not appear to be correct. While the bot does maintain a central report table at Discussion_utilisateur:NaggoBot/CommonsDR, it does post separate notices for each image: see e.g. Diff/144627533 (and most of the edits at Special:Contributions/NaggoBot).

You're right, I totally missed that part. It happens in the (poorly named) articles() function. https://fr.wikipedia.org/wiki/Utilisateur:NaggoBot/comdr.py.

TBolliger reopened this task as Open. · Edited · Jan 31 2018, 12:13 AM
TBolliger edited projects, added Community-Tech-Sprint; removed Community-Tech.
TBolliger moved this task from Q1 2018-19 to In Development on the Community-Tech-Sprint board.

Re-opening: there are still some questions about using https://fr.wikipedia.org/wiki/Utilisateur:NaggoBot/comdr.py.

Should we build on top of this? Does this bot adequately deal with accidental spam?

We can, but we shouldn't necessarily, because it would need virtually a complete rewrite to suit our goals.

No, it doesn't have any delay mechanism.

We can, but we shouldn't necessarily, because it would need virtually a complete rewrite to suit our goals.

Can you say why?

Why the existing bot can't just be extended:

  • CC-BY-SA license
  • Written just for French Wikipedia
  • Using it would amount to a rewrite anyway.
  • We can still look at it for inspiration.

I will reach out to the bot's author Utilisateur:El_pitareio to see if they would be interested in working with us on this project.

TBolliger closed this task as Resolved. · Feb 21 2018, 12:33 AM
TBolliger moved this task from Needs Review/Feedback to Q1 2018-19 on the Community-Tech-Sprint board.

I've emailed El pitareio.

While there might be 10,000 images put up for deletion a month, how many of these images are actually used on Wikipedia? If they are not used on any Wikipedia, then no notification will be needed.

TBolliger moved this task from Estimated to Archive on the Community-Tech board. · Feb 28 2018, 1:47 AM

@Doc_James, the 10k is the estimated number of affected pages, not images nominated for deletion.

@MaxSem Following up from the discussion in the meeting yesterday, I'd be interested in seeing a description of how you're planning to do the technical implementation for the bot, if you don't mind.

238482n375 set Security to Software security bug. · Jun 15 2018, 8:04 AM
238482n375 changed the visibility from "Public (No Login Required)" to "Custom Policy".
This comment was removed by Reedy.
Restricted Application added a project: Security. · Jun 15 2018, 2:05 PM
Aklapper assigned this task to MaxSem. · Jun 15 2018, 2:08 PM
Aklapper changed the visibility from "Custom Policy" to "Public (No Login Required)".
Reedy added a subscriber: Reedy. · Jun 15 2018, 2:25 PM