Page MenuHomePhabricator

Reduce DB space used by Echo notifications
Open, MediumPublic

Description

Echo notifications can take up significant DB space. On Wikidata, a particularly bot-heavy wiki with lots of edits, discarding notification sent to bots would probably free up considerable DB space (tens of gigabytes).

A similar thing was done for watchlist entries of Wikidata bot accounts in the past: T258098: Purge unused watchlist rows

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

@Ladsgroup is there particular urgency around this? When would something need to be done? (Also should this have some form of DBA tag on the task?)

@Ladsgroup is there particular urgency around this? When would something need to be done? (Also should this have some form of DBA tag on the task?)

x1 is in a rather healthy state so I no rush but better to be done before turning on T307902 in enwiki. That being said, it's a fun fix with an extremely heavy low-hanging fruit. Just don't run large write queries on master ;)

(Add the tag for consultation).

This problem has two sides:

  • Avoid creating new echo notification for bots. It would be better someone with good knowledge of echo do it.
  • Clean up the existing ones. I probably can take care of that.

We also have another issue with echo notifications db. It's orphan echo_event rows (just a mere 300M of them). Is it fine if I delete them in wikidatawiki?

367M of them:

root@db2115.codfw.wmnet[wikidatawiki]> select count(*) from echo_event left join echo_notification on event_id = notification_event where notification_user is null limit 50;
+-----------+
| count(*)  |
+-----------+
| 367801858 |
+-----------+
1 row in set (8 min 56.654 sec)

These can be cleaned up by running maintenance/removeOrphanedEvents.php. This was done in 2016 (T136425) and 2019 (T217073) but probably hasn't been done since.

I also filed some tasks for mitigating this by cleaning up these rows proactively. T221262 and T221261 have been implemented, but T221258 has not been. It's possible that the latter is the source of all these orphaned rows, or that the first two aren't working correctly, or that something changed (either in Echo or in code that uses Echo, or just increased traffic/activity) to make the circumstances that create these rows occur more frequently. 367M seems like a very high number to accumulate in only three years, we previously saw more like 60-90M on enwiki over that amount of time. But Wikidata is different in many ways, so maybe it just does more things that result in unnecessary event rows being created.

Mentioned in SAL (#wikimedia-operations) [2022-05-13T07:18:27Z] <Amir1> start of mwscript extensions/Echo/maintenance/removeOrphanedEvents.php --wiki=wikidatawiki --force (T308084)

It is basically impossible to run this on wikidata now :/ It is so slow that I'm worried it might bring down everything. The query itself doesn't seem too bad. Let me investigate.

sigh. So the original query is this:

SELECT  event_id  FROM `echo_event` LEFT JOIN `echo_notification` ON ((notification_event=event_id)) LEFT JOIN `echo_email_batch` ON ((eeb_event_id=event_id))   WHERE notification_user IS NULL AND eeb_user_id IS NULL  ORDER BY event_id ASC LIMIT 500;

And it's extremely slow even when you force PRIMARY index on echo_event. The only way to make it work is to add an upper limit condition on event_id. Up to event_id being 360M there is not much left, but after that, it'll get really slow, so I had to manually edit the script file, add event_id below < 362M, let it run and delete 800K rows, then change the condition to event_id below 363M and again. I need to do this for around 400 times (It' be less if I make it jump even two or three million event_id) but yeah, fun stuff. I haven't even looked at other wikis.

I think we should stop adding notification from bots soon, this is growing really rapidly to the point we can't easily clean it up.

Started doing another round of clean up. It'll take a couple of days.

So far cleaned 50M rows:

mysql:research@x1-analytics-replica.eqiad.wmnet [wikidatawiki]> select count(*) from echo_event left join echo_notification on event_id = notification_event where notification_user is null limit 50;
+-----------+
| count(*)  |
+-----------+
| 323307873 |
+-----------+
1 row in set (19 min 20.872 sec)

I agree that it's a problem that bots get notifications that are never checked, but I don't think the answer is to throw them away. Bot operators ought to be reviewing their notifications. If a bot is continually reverted, the operator should know about that as a matter of urgency. Editors rely on the fact that reverting another user's change will alert them to a problem, and bots should not be an exception to this.

What we should do instead is make it easier for bot operators to check on bot notifications.

I understand. I agree that some notification should probably stay but I don't think 100M notifications a year is something any bot operator could keep track of it and it probably even drowns the important ones. I guess it's mostly created pages getting connected that's flooding notifications.

So far cleaned 50M rows:

mysql:research@x1-analytics-replica.eqiad.wmnet [wikidatawiki]> select count(*) from echo_event left join echo_notification on event_id = notification_event where notification_user is null limit 50;
+-----------+
| count(*)  |
+-----------+
| 323307873 |
+-----------+
1 row in set (19 min 20.872 sec)

I have been cleaning those up since creation of this ticket. Now "only" 100M rows left:

mysql:research@x1-analytics-replica.eqiad.wmnet [wikidatawiki]> select count(*) from echo_event left join echo_notification on event_id = notification_event where notification_user is null limit 50;
+-----------+
| count(*)  |
+-----------+
| 101137022 |
+-----------+
1 row in set (5 min 42.517 sec)

It seems there is no objection on stopping the notification of bots for case article link (article-linked notification type aka the notification you get when an article you created gets a page link).

Also T221258: Avoid inserting echo_event rows when not needed needs implementation as well.

It seems there is no objection on stopping the notification of bots for case article link (article-linked notification type aka the notification you get when an article you created gets a page link).

Split out into T318523: Don't send article-linked notifications for bots

@Tgr moving this to Triaged, but if there's more you'd like to discuss, please move it back to the Needs Discussion column.

@DMburugu I think this task is an epic; specific work should be filed as subtasks. As such we should either move this to "Epics in progress" of current sprint or remove from current sprint and mark as triaged.

@kostajh Since it's being done in this Months Maintenance I'll move it to Epics in Progress under current sprint.

KStoller-WMF subscribed.

I'm trying to limit the number of open epics in Growth's current sprint, so I'm updating this task accordingly. But this work can continue as needed and is represented in our current sprint as part of T340453: [Epic] FY 2023-24 Growth Maintenance Work.

Mentioned in SAL (#wikimedia-operations) [2024-06-20T10:41:01Z] <Amir1> running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis (T308084)