
Replicate Echo tables to analytics-store
Closed, Resolved, Public

Description

I'm trying to gather some stats on the use of Echo notifications across wikis (T113626, T113664), and I'd like to join the echo_events table with the user table for a given wiki.

I can get echo_events on x1-analytics-slave, but not on analytics-store; I can get user on analytics-store, but not on x1-analytics-slave. This makes it impossible to join the two.

Can we get the Echo databases replicated to analytics-store? Thanks!
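As a rough illustration of the join described above, the sketch below merges rows fetched separately from the two hosts in client memory. It is only a sketch under stated assumptions: the column names (`event_agent_id`, `user_id`, `user_name`) and the sample rows are hypothetical and do not come from this task, and an in-memory join like this is only practical for small result sets.

```python
from collections import Counter


def join_events_with_users(event_rows, user_rows):
    """Inner-join event rows to user rows on the local user ID, in memory."""
    # Build a lookup from local user ID to username (column names are assumed).
    users_by_id = {u["user_id"]: u["user_name"] for u in user_rows}
    joined = []
    for ev in event_rows:
        name = users_by_id.get(ev["event_agent_id"])
        if name is not None:  # drop events whose agent is not in the user rows
            joined.append({**ev, "user_name": name})
    return joined


# Hypothetical rows, standing in for one query result from each host.
event_rows = [
    {"event_id": 1, "event_type": "mention", "event_agent_id": 10},
    {"event_id": 2, "event_type": "thank", "event_agent_id": 11},
    {"event_id": 3, "event_type": "mention", "event_agent_id": 99},
]
user_rows = [
    {"user_id": 10, "user_name": "Alice"},
    {"user_id": 11, "user_name": "Bob"},
]

joined = join_events_with_users(event_rows, user_rows)
events_per_user = Counter(row["user_name"] for row in joined)
print(events_per_user)
```

Both tables have to be pulled to the client for this to work, which is why a server-side join on a single host is the preferable option for anything but small samples.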

Event Timeline

nshahquinn-wmf raised the priority of this task from to Needs Triage.
nshahquinn-wmf updated the task description.

This is similar to T75047: Replicate flowdb from X1 to analytics-store. Last word on that was in August—Ops said it should wait on some rearchitecting they're doing.

nshahquinn-wmf renamed this task from Replicate Echo databases to analytics-store to Replicate Echo tables to analytics-store. Oct 13 2015, 6:52 PM
nshahquinn-wmf set Security to None.
In T115275#1722695, @Neil_P._Quinn_WMF wrote:

This is similar to T75047: Replicate flowdb from X1 to analytics-store. Last word on that was in August—Ops said it should wait on some rearchitecting they're doing.

The rearchitecture is still needed, plus monitoring so that there is not another outage. However, I do not have the resources, neither hardware nor human, to do it any time soon.

I will put this in the short-term backlog, although this is slightly more complex than just flowdb.

Let's allow some time to check how that replication works, and then I will set this up.

jcrespo triaged this task as Medium priority.

So here is the thing:

Replicating just flowdb takes ~2 QPS and very few MBs. This has worked well for T75047.

Replicating the echo tables requires >120 GB and double the write QPS of enwiki. I will test replication on a codfw slave and see how it works.

I do not think this is possible right now. I would like you to request a different approach: if you want echo on analytics-store, something else has to go (like a core production shard or eventlogging). Otherwise, the hardware will not support it (replication will not be able to keep up and will lag forever).

We are already having issues with eventlogging lagging forever on dbstore2002, even with no user activity.

As an alternative, it would be easier to provide you a subset of the tables, or CONNECT tables that are virtual tables (so joins would be slow).
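The "lag forever" concern above can be pictured with a toy model: when writes arrive faster than a replica can apply them, the backlog grows without bound. This is only a sketch, and the rates used are hypothetical illustration values, not measurements from analytics-store or dbstore2002.

```python
def replication_backlog(incoming_qps, apply_qps, seconds):
    """Return the number of unapplied writes after `seconds` of steady load."""
    backlog = 0
    for _ in range(seconds):
        # Each second, new writes arrive and the replica applies what it can;
        # the backlog can never go below zero.
        backlog = max(0, backlog + incoming_qps - apply_qps)
    return backlog


# Hypothetical rates: a replica that applies 100 writes/s.
print(replication_backlog(incoming_qps=2, apply_qps=100, seconds=60))    # keeps up
print(replication_backlog(incoming_qps=120, apply_qps=100, seconds=60))  # falls behind
```

In this model the backlog only shrinks when the apply rate exceeds the incoming rate, which is why adding a heavy write stream to an already saturated host means removing something else.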

@jcrespo, it looks like I could accomplish most of the same things if the centralauth database was replicated to x1-analytics-slave. Maybe we should just do that?

That is probably more reasonable. Give me a couple of days to research sizes and load (and whether there is any security issue).

May I ask you to list the minimum columns that you need (just IDs and usernames)? The less data is sent, the more likely it is that I can provide it to you.

@jcrespo, good question, because now that I've looked at the contents of the database, I realize it won't actually do what I want. The echo databases use only local user IDs, and the central auth database uses only usernames.

So let me go back to thinking about other possibilities, and how important this request actually is.

jcrespo changed the task status from Open to Stalled. Feb 3 2016, 9:44 AM
nshahquinn-wmf raised the priority of this task from Medium to Needs Triage. Mar 30 2018, 10:31 AM
nshahquinn-wmf moved this task from Backlog to Radar on the Contributors-Analysis board.