Clean up orphaned echo_event rows again
Closed, Resolved · Public

Description

It's been almost 3 years since we last did this (T136425: Remove orphaned echo_event rows), so they've built up again. When I looked on Friday, there were roughly 60M orphaned rows on enwiki out of ~90M rows total.

@jcrespo said to run this later this week, after Wednesday's backups.

Event Timeline

^ Something to keep an eye on. I asked for this to be documented on the Deployments wiki page and suggested Wednesday, the day after the Monday-to-Wednesday backup process.

Catrope moved this task from In progress to Triage on the DBA board.

I've set a reminder to kick this off at 17:00 UTC on Wednesday, February 27.

Thanks for the heads up, that works for me!

I forgot to log the task number when I logged this:

2019-02-27 17:05 RoanKattouw: Running foreachwikiindblist dblists/echo.dblist extensions/Echo/maintenance/removeOrphanedEvents.php on mwmaint1002

This finished last night after I went to sleep.

Please tell us which set of servers and which tables you deleted rows from, as we agreed on IRC, so we can potentially defragment them. That action doesn't have to happen on this ticket, but please communicate it before closing the ticket again (otherwise deleting rows has no meaningful impact on freeing resources).

Ah yes, sorry for forgetting!

The script that I ran deleted rows from the echo_event tables in every database on x1.
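
For readers unfamiliar with what such a purge amounts to, here is a minimal sketch in Python against an in-memory SQLite database. It is not the actual removeOrphanedEvents.php script. The two-table schema, the column names (event_id, notification_event, notification_user), the event type string, and the batch size are all simplifying assumptions for illustration. It treats an event as orphaned when no notification row refers to it, and deletes in small batches so that no single transaction touches millions of rows.

```python
import sqlite3


def make_db():
    # Simplified, assumed schema: real tables have more columns.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE echo_event (
            event_id INTEGER PRIMARY KEY,
            event_type TEXT
        );
        CREATE TABLE echo_notification (
            notification_event INTEGER,
            notification_user INTEGER
        );
        """
    )
    return conn


def remove_orphaned_events(conn, batch_size=500):
    """Delete events that no notification refers to, one batch at a time.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        rows = conn.execute(
            """
            SELECT event_id FROM echo_event
            WHERE NOT EXISTS (
                SELECT 1 FROM echo_notification
                WHERE echo_notification.notification_event = echo_event.event_id
            )
            LIMIT ?
            """,
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        # Each row is a 1-tuple (event_id,), which matches the single placeholder.
        conn.executemany("DELETE FROM echo_event WHERE event_id = ?", rows)
        conn.commit()  # commit per batch to keep transactions short
        total += len(rows)


# Hypothetical data: 10 events, only events 1 and 2 still have notifications.
conn = make_db()
conn.executemany(
    "INSERT INTO echo_event VALUES (?, ?)",
    [(i, "example-type") for i in range(1, 11)],
)
conn.executemany(
    "INSERT INTO echo_notification VALUES (?, ?)",
    [(1, 100), (2, 100), (2, 101)],
)
removed = remove_orphaned_events(conn, batch_size=3)
remaining = [r[0] for r in conn.execute(
    "SELECT event_id FROM echo_event ORDER BY event_id"
)]
print(removed, remaining)
```

Note that deleting rows this way shrinks the logical row count only; reclaiming the disk space is a separate step, which is why the DBAs ask which tables were touched.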

Thanks!
I will create a ticket to get them defragmented.

@Catrope One last thing: I'm not sure if you are in charge of this, and it's obviously not a huge priority, but maybe there should be some conversation about changing the defaults for notifications, watchlists, etc. on wikis like Wikidata, where bot users with high editing activity may generate lots of emails or notifications that are never seen. Apologies if this has been brought up in the past or has already been handled. Apologies also if it has nothing to do with this purge and I am mixing things up.

It's funny that you should mention Wikidata, because that wiki actually had a very high cleanup rate (something like 274M/276M or ~99.5%, whereas 60-70% was typical). You're right that there's lots of activity on Wikidata causing more notifications etc., but because the number of notifications is limited to 2000 per user, they just turn over more quickly instead. What determines the number of surviving notifications is not the amount of bot activity, but the number of humans asking to be informed of that bot activity. And Wikidata is populated by a small number of hyper-informed human users, whereas wikis like English Wikipedia have a lot more human users who are at least somewhat active. At least that's my theory for why, after this purge, enwiki had ~34M event rows left and wikidatawiki ~1.5M.

Longer term, I want to change Echo so that it cleans up orphaned echo_event rows automatically (at the same time that the last corresponding echo_notification row is deleted). Then we wouldn't have to periodically run these large deletions.
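
To make the idea concrete, here is a rough sketch of cleanup at deletion time, again in Python with an in-memory SQLite database. This is not Echo's code: the function name delete_notification, the simplified schema, and the column names are assumptions made for illustration. The point is only that the event row is removed in the same transaction that deletes its last remaining notification, so orphans never accumulate.

```python
import sqlite3


def make_demo_db():
    # Simplified, assumed schema for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE echo_event (event_id INTEGER PRIMARY KEY);
        CREATE TABLE echo_notification (
            notification_event INTEGER,
            notification_user INTEGER
        );
        """
    )
    return conn


def delete_notification(conn, user_id, event_id):
    """Delete one user's notification for an event.

    If no other notification still refers to the event, delete the event
    row too. Returns True when the event row was removed.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "DELETE FROM echo_notification "
            "WHERE notification_user = ? AND notification_event = ?",
            (user_id, event_id),
        )
        still_referenced = conn.execute(
            "SELECT 1 FROM echo_notification "
            "WHERE notification_event = ? LIMIT 1",
            (event_id,),
        ).fetchone()
        if still_referenced is None:
            conn.execute(
                "DELETE FROM echo_event WHERE event_id = ?", (event_id,)
            )
            return True
    return False


def count_events(conn):
    return conn.execute("SELECT COUNT(*) FROM echo_event").fetchone()[0]


# Hypothetical data: one event that two users were notified about.
demo = make_demo_db()
demo.execute("INSERT INTO echo_event VALUES (1)")
demo.executemany(
    "INSERT INTO echo_notification VALUES (?, ?)", [(1, 100), (1, 101)]
)
demo.commit()
first = delete_notification(demo, 100, 1)   # another user still references it
second = delete_notification(demo, 101, 1)  # last reference, event goes too
```

A real implementation would also have to consider any other tables that refer to events, and concurrent writers; this sketch ignores both.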

Also, if you want to talk more about this, feel free to ping me on IRC any time. It's a somewhat complicated topic and we probably both have an incomplete understanding of it, and I feel like I might not have understood exactly what you were getting at.