
Figure out which bots mention users in summaries, how often, and decide how 'Edit summary pings' should handle bots
Closed, Resolved, Public

Event Timeline

Niharika triaged this task as Medium priority. Mar 5 2018, 11:34 PM
Niharika created this task.
TBolliger renamed this task from "Figure out which bots mention users in summaries and how often" to "Figure out which bots mention users in summaries, how often, and decide how 'Edit summary pings' should handle bots". Mar 5 2018, 11:46 PM

Some preliminary data from enwiki -
(Note that the data is from recentchanges and not revision because querying that would take longer than the age of the universe)

wikiadmin@db1080(enwiki)>select count(*) from recentchanges where rc_comment LIKE '%[[User:%';
+----------+
| count(*) |
+----------+
|   346856 |
+----------+
1 row in set (3.51 sec)

wikiadmin@db1080(enwiki)>select count(*) from recentchanges where rc_comment LIKE '%[[User:%' and rc_bot = 1;
+----------+
| count(*) |
+----------+
|   280132 |
+----------+
1 row in set (4.75 sec)
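
For a sense of proportion, the two counts above can be combined into a single share. This is just arithmetic on the numbers already shown, written as a small Python snippet; the variable names are mine.

```python
# Counts copied from the two enwiki queries above.
total_with_user_link = 346856  # recentchanges rows whose summary contains '[[User:'
by_flagged_bots = 280132       # the same rows restricted to rc_bot = 1

# Fraction of user-link summaries that come from bot-flagged edits.
bot_share = by_flagged_bots / total_with_user_link
print(f"{bot_share:.1%}")  # prints 80.8%
```

So roughly four out of five summaries containing a user link in this sample come from flagged bots.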

Bots -

wikiadmin@db1080(enwiki)>select rc_user_text, count(*) from recentchanges where rc_comment LIKE '%[[User:%' and rc_bot = 1 group by rc_user_text;
+------------------------+----------+
| rc_user_text           | count(*) |
+------------------------+----------+
| Amalthea (bot)         |     2076 |
| AnomieBOT              |    11021 |
| AnomieBOT III          |      510 |
| AvicBot                |       66 |
| Bot1058                |      646 |
| ClueBot III            |        6 |
| Cyberbot I             |     1502 |
| DPL bot                |     2747 |
| DYKUpdateBot           |      289 |
| DatBot                 |     3851 |
| DeltaQuadBot           |     9070 |
| DumbBOT                |      306 |
| EarwigBot              |      756 |
| Fluxbot                |        5 |
| FrescoBot              |     8050 |
| GreenC bot             |     9518 |
| HBC AIV helperbot5     |     5624 |
| HasteurBot             |     4466 |
| InternetArchiveBot     |    13478 |
| IznoBot                |    18505 |
| JCW-CleanerBot         |     1159 |
| Jasper Deng            |        2 |
| KolbertBot             |    64929 |
| Lowercase sigmabot III |       68 |
| Mathbot                |      216 |
| MusikBot               |       87 |
| NationalRegisterBot    |       63 |
| PositionStatements Bot |      152 |
| PrimeBOT               |     1584 |
| QuickStatementsBot     |   110921 |
| RonBot                 |     3262 |
| RussBot                |      198 |
| SineBot                |     4324 |
| SuccuBot               |      169 |
| Theo's Little Bot      |      441 |
| Xqbot                  |       23 |
+------------------------+----------+
36 rows in set (3.24 sec)

If it means anything, many of these bots are in most cases only pinging themselves, including Amalthea (bot), AnomieBOT, AnomieBOT III, Bot1058, DatBot, DumbBOT, and MusikBot.

Based on this data, I recommend that we suppress pings when the edit is made by a flagged bot (i.e. rc_bot = 1). We cannot ask every operator to fix every bot, and tomorrow someone could write a new bot that mistakenly starts spamming people. Plus, this data is only for enwiki; there could be hundreds more bots doing this across our projects.

Note that we will not catch all bots. Sometimes bots operate without the bot flag and those will not be stopped by this change.

Perhaps stop by default, but allow an override via a new api parameter - this would have to be purposefully asserted and would not currently exist.
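
A minimal sketch of that rule, in Python rather than MediaWiki's PHP, in case it helps the discussion: summary mentions are skipped when the edit carries the bot flag, unless the caller opts in explicitly. The `Edit` type and the `allow_bot_pings` parameter are hypothetical names for illustration only; no such API parameter exists today.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical stand-in for one recentchanges row."""
    user: str
    summary: str
    is_bot: bool  # corresponds to rc_bot = 1


def should_send_summary_pings(edit: Edit, allow_bot_pings: bool = False) -> bool:
    """Return True if user links in the edit summary should notify anyone.

    Bot-flagged edits are suppressed by default. The hypothetical
    allow_bot_pings override has to be passed on purpose.
    """
    if edit.is_bot and not allow_bot_pings:
        return False
    # Same substring test as the LIKE '%[[User:%' queries in this task.
    return "[[User:" in edit.summary
```

Because the override defaults to off, existing bots would stop pinging without any change on their side, and only a bot that deliberately sets it would notify users.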

Note that some of these are not editing enwiki at all. For example, QuickStatementsBot only edits Wikidata:

https://en.wikipedia.org/w/index.php?title=Special%3ACentralAuth&target=QuickStatementsBot

> Note that we will not catch all bots. Sometimes bots operate without the bot flag and those will not be stopped by this change.

I found some of these -

wikiadmin@db1080(enwiki)>SELECT DISTINCT rc_user_text, COUNT(*) FROM recentchanges WHERE rc_comment LIKE '%[[User:%' AND rc_bot != 1 AND (rc_user_text LIKE '%Bot%' OR rc_user_text LIKE '%bot%') GROUP BY rc_user_text ;
+------------------+----------+
| rc_user_text     | COUNT(*) |
+------------------+----------+
| AAlertBot        |        2 |
| MediationBot     |       12 |
| MusikBot         |        2 |
| NinjaRobotPirate |      306 | <---- false positive
| RonBot           |     3564 |
| Saboteurest      |        2 | <---- false positive
| SportsStatsBot   |       12 |
+------------------+----------+
7 rows in set (25.18 sec)

We can talk to RonBot owner and ask them to edit with the bot flag.

Not sure about casing on that table; you may want to also do a:

lower(rc_comment) LIKE '%[[user:%'

or the like
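
Whether LIKE is case-sensitive here depends on the column's type and collation, which this thread leaves open. Outside SQL, the same idea can be sketched as a case-insensitive match in Python. The function name is mine and this is only an illustration of the check, not how the feature parses summaries.

```python
import re

# Start of a user link in any casing, e.g. "[[User:" or "[[user:".
USER_LINK_START = re.compile(r"\[\[user:", re.IGNORECASE)


def mentions_user_link(summary: str) -> bool:
    """Return True if the summary contains a user link, ignoring case."""
    return USER_LINK_START.search(summary) is not None
```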

> We can talk to RonBot owner and ask them to edit with the bot flag.

What is rc_type for those? I'm guessing they are log entries (https://en.wikipedia.org/wiki/Special:Log/RonBot), which can't be marked as bot actions, IIRC. But is the plan to search log entries for pings? I thought it was just edits.

> Perhaps stop by default, but allow an override via a new api parameter - this would have to be purposefully asserted and would not currently exist.

It's a good idea. Currently, when I archive a discussion block, my bot needs a second edit to notify a user. With that, I could then do it in one edit.

I think bot pings fall outside the initial scope of this project so I support suppressing mentions from accounts with the bot usergroup. I'll also file a ticket for @Xaosflux's idea above for potential future prioritization.

Excluding self-links and Wikidata changes, neither of which will trigger notifications anyway:

mysql:research@analytics-store.eqiad.wmnet [enwiki]> select rc_user_text, count(*) num from recentchanges where rc_comment LIKE '%[[User:%' and rc_comment not like concat('%[[User:', rc_user_text, '%') and rc_bot = 1 and rc_source <> 'wb' group by rc_user_text order by num desc;
+------------------------+-------+
| rc_user_text           | num   |
+------------------------+-------+
| InternetArchiveBot     | 13351 |
| GreenC bot             |  8751 |
| HBC AIV helperbot5     |  5711 |
| SineBot                |  4346 |
| DYKUpdateBot           |   289 |
| Lowercase sigmabot III |    68 |
| AvicBot                |    66 |
| NationalRegisterBot    |    63 |
| Xqbot                  |    23 |
| AnomieBOT              |     8 |
| ClueBot III            |     6 |
| Fluxbot                |     2 |
| AnomieBOT III          |     1 |
+------------------------+-------+
13 rows in set (3.34 sec)
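
The self-link filter in that query can be approximated in Python as follows. This is a sketch under simplifying assumptions: the function name and regex are mine, user names are compared exactly (no normalization of underscores or first-letter case), and a link to the editor's own subpage counts as a self-link. Unlike the SQL, which drops the whole row when any self-link is present, it only removes the editor from the set of mentioned users.

```python
import re
from typing import Set

# Captures the user name in [[User:Name]], [[User:Name|label]] or [[User:Name/Subpage]].
USER_LINK = re.compile(r"\[\[User:([^\]|/#]+)", re.IGNORECASE)


def mentioned_users(summary: str, editor: str) -> Set[str]:
    """Return the users linked in the summary, excluding the editor."""
    names = {name.strip() for name in USER_LINK.findall(summary)}
    names.discard(editor)  # a bot linking to its own page is not a ping
    return names
```

For a summary that links only to the editor's own page or subpage, this returns an empty set, so nobody would be notified.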

>> We can talk to RonBot owner and ask them to edit with the bot flag.

> What is rc_type for those? I'm guessing it's log entries (https://en.wikipedia.org/wiki/Special:Log/RonBot) which can't be marked as via a bot IIRC. But is the plan to search log entries for pings, I thought it was just edits?

Good point about rc_type, I missed that completely. The rc_type on those is 0 but it's a false positive because...

wikiadmin@db1080(enwiki)>SELECT DISTINCT rc_comment FROM recentchanges where rc_comment LIKE '%[[User:%' AND rc_type = 0 and rc_user_text = 'RonBot' LIMIT 10\G;
*************************** 1. row ***************************
rc_comment: Orphaned non-free file(s) deleted per [[WP:F5|F5]] ([[User:RonBot/Run|disable]])
*************************** 2. row ***************************
rc_comment: Requesting manual review ([[User:RonBot/Run|disable]])
2 rows in set (0.04 sec)

And yes, it's just for edits. Not log entries.

Just a (late) idea: Maybe we could just suppress summary notifications on minor edits? This would be similar to how nominornewtalk works.

> Just a (late) idea: Maybe we could just suppress summary notifications on minor edits? This would be similar to how nominornewtalk works.

Is it the case that people would not like to ping other people for minor edits they've made? This seems like two mutually exclusive things and I'd not favor adding a dependency unless we're absolutely sure.

>> Just a (late) idea: Maybe we could just suppress summary notifications on minor edits? This would be similar to how nominornewtalk works.

> Is it the case that people would not like to ping other people for minor edits they've made? This seems like two mutually exclusive things and I'd not favor adding a dependency unless we're absolutely sure.

This would just apply to users with the nominornewtalk permission, of course, which is granted only to bots by default.
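
Roughly, the check I have in mind would look like the sketch below, written in Python for brevity. The function name and the way rights are passed in are assumptions for illustration; only the nominornewtalk right itself is an existing name.

```python
from typing import Set


def suppress_ping_for_minor_edit(is_minor: bool, user_rights: Set[str]) -> bool:
    """Return True if summary pings should be skipped for this edit.

    Only minor edits by users holding nominornewtalk are affected, so a
    regular editor's minor edit would still ping.
    """
    return is_minor and "nominornewtalk" in user_rights
```

A regular editor marking an edit as minor would therefore still be able to ping someone, which should address the concern about tying pings to minor edits in general.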