Page MenuHomePhabricator

Improve Echo seentime code for multi-DC
Open, Needs TriagePublic

Description

Because of T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed., we want to migrate the storage backend for Echo's seentime store from Redis to something else. To help facilitate that, we should consider doing the following things:

  • Make the storage backend configurable (right now it's hard-coded to use mainstash)
  • Store the "alert" seentime and "messages" seentimes as pairs in one key, as opposed to separate values in two keys
  • Set TTLs to allow seentimes to expire (right now they are stored forever)

Event Timeline

Catrope created this task.May 8 2019, 11:25 PM
Restricted Application added a project: Growth-Team. · View Herald TranscriptMay 8 2019, 11:25 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Migrating from one storage backend to another, and from individual keys to pairs, would be a bit complicated. @mobrovac suggested that it would be easier if we could do both of these migrations at the same time. One way to do that would be to create a config setting that takes an array of storage backends, and have the code try these in order when reading. That way we can configure it to read first from the new backend, then the old backend (and then the user preference, which we still have fallback code for today), but only write new values to the first one, and we could make the pair vs separate keys thing a property of each storage backend as well.

Another issue @mobrovac pointed out is that there is no set date when we could stop reading from the old backend, because the keys in it never expire. Instead, I think we would have to write a migration script that fetches the seentime for every user through the fallback chain, and writes it to the new backend (if we write values from the fallback storage to the primary storage on get, this could just loop over all users and call the getter). After this script has run, we could then remove the fallback backends from the config (and finally remove the hidden prefence!)..

As for TTLs, I propose that we set a long but finite TTL (say, 90 days). When there's no seentime for the user because it's expired from the store, we'd fall back to the UNIX epoch, meaning we'd always think they had unseen notifications. For sanity, we would have to make sure that a user who has no seentime but also has no unread notifications is not considered to have unseen notifications.

@Catrope MultiWriteBagOStuff does what you're talking about, so it's probably not necessary to replicate it in the Echo code. We can do it with a configuration.

I think we'd still need a migration script, though.

@Catrope also, I talked to @kaldari by email yesterday, and he thought that all the data was going into the main stash. Is that true? I might need to review the Echo code again.

@Catrope also, I talked to @kaldari by email yesterday, and he thought that all the data was going into the main stash. Is that true? I might need to review the Echo code again.

No, that's not true. He repeated that in our team meeting this morning (he thought all data was in Redis), but that's not true and hasn't been for a long time. The rest of the data is all stored in the database. (And the number of unread notifications for each user is cached in WANCache.)

  • Set TTLs to allow seentimes to expire (right now they are stored forever)

I wonder whether setting TTLs is needed or desirable, really. The reason why this is a potential issue now is that Redis might evict some keys if it doesn't have room to store new ones. However, since we are going to migrate necessarily to a multi-DC-enabled solution, we can afford to store all needed keys indefinitely. In other words, you have been treating the main stash as permanent storage so far, and unless there are explicit application-level reasons to start TTL'ing the data, I'd suggest to keep it that way.

As for TTLs, I propose that we set a long but finite TTL (say, 90 days). When there's no seentime for the user because it's expired from the store, we'd fall back to the UNIX epoch, meaning we'd always think they had unseen notifications. For sanity, we would have to make sure that a user who has no seentime but also has no unread notifications is not considered to have unseen notifications.

If I'm not wrong, using the epoch time would mean larger tables scans in the DB, which is undesirable. I'd like to understand whether TTL'ing is necessary first, before we try finding solutions to the data has expired problem.

OK, if we can get away with not TTLing the data at all, that would certainly be simpler!

Yes, I think no TTL is the way to go. We don't want to lose this info.

kostajh moved this task from Inbox to Q2 2019-20 on the Growth-Team board.Thu, Jul 18, 12:51 AM
kostajh added a subscriber: kostajh.

Setting for Q2 as Q1 is getting full. If someone submits a patch we could review it earlier.