Page MenuHomePhabricator

Temporary index for Echo backfillReadBundles.php?
Closed, DeclinedPublic

Description

Now that T135699: Schema changes for Echo moderation is done, we need to run a migration script on some of the data in the echo_notification table. This script will probably take a long time to run, my back-of-the-envelope estimate for enwiki is about 100 hours to update 2.2M rows (based on inspecting 3.6M other rows). With a temporary index on notification_bundle_display_hash, we could probably make this a lot faster. Is adding a temporary index for this kind of thing recommended / a thing we do?

For context and more discussion about approaches, see https://gerrit.wikimedia.org/r/#/c/292551/1/maintenance/backfillReadBundles.php

Event Timeline

Is adding a temporary index for this kind of thing recommended / a thing we do?

Is something we can do indeed, but please combine it with the other cleanup things I mentioned before. Running a script that takes 100 hours is not a huge issue. Doing that 5 times consumes a lot of DBA time (which sadly there is not much), which has to control things work as intended.

So, please try to minimize the times you do maintenance by combining several operations in one. In particular, I think I mentioned some echo tables being over-indexed- I do not need you to do absolutely everything at once, but please draft a plan of needed maintenance first to minimize overhead.

I also want to have a look at your maintenance script, usually some practical issues regarding infrastructure are not taken into account.

(this is not yet a proper DBA request -it is in planning phase-, please create a proposal of several actions to do and re-add the tag when ready)

So, please try to minimize the times you do maintenance by combining several operations in one. In particular, I think I mentioned some echo tables being over-indexed- I do not need you to do absolutely everything at once, but please draft a plan of needed maintenance first to minimize overhead.

I'll line up all the things we talked about in the previous task, but unfortunately we can't fix the over-indexing until after this migration is complete. What I will do is identify which indexes can be removed. We could then combine adding the temporary index with the other maintenance, and combine removing the temporary index with the removal of the obsolete indexes.

I also want to have a look at your maintenance script, usually some practical issues regarding infrastructure are not taken into account.

The current version of the script is at https://phabricator.wikimedia.org/diffusion/ECHO/browse/master/maintenance/backfillReadBundles.php . There's also https://gerrit.wikimedia.org/r/#/c/292551/1/maintenance/backfillReadBundles.php where we're discussing possible changes to the script. If/when we have a temporary index, I think the current version would work, if not we'll need some changes so that the slowness is concentrated in SELECT queries and the UPDATE queries are fast.

I'll line up all the things we talked about in the previous task, but unfortunately we can't fix the over-indexing until after this migration is complete. What I will do is identify which indexes can be removed. We could then combine adding the temporary index with the other maintenance, and combine removing the temporary index with the removal of the obsolete indexes.

+1. At least let's have a clear roadmap and minimize the steps.

The current version of the script

I will give it a look, then get back to here with suggestions if needed.

The current version of the script

I will give it a look, then get back to here with suggestions if needed.

FYI, we've updated the script, see https://phabricator.wikimedia.org/diffusion/ECHO/browse/master/maintenance/backfillReadBundles.php for the current version and https://gerrit.wikimedia.org/r/293100 for the changes. With these changes we think we no longer need a temporary index.