
Replicate Echo tables to analytics-store
Closed, Resolved · Public

Description

I'm trying to gather some stats on the use of Echo notifications across wikis (T113626, T113664), and I'd like to join the echo_events table with the user table for a given wiki.

I can get echo_events on x1-analytics-slave, but not on analytics-store; I can get user on analytics-store, but not on x1-analytics-slave. This makes it impossible to join the two.

Can we get the Echo databases replicated to analytics-store? Thanks!
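
For illustration, the join described above can be done on the client when the two tables live on different hosts: fetch each side separately and match the rows in memory. The following is a minimal sketch only. The two in-memory SQLite databases are stand-ins for x1-analytics-slave and analytics-store, the column names (`event_type`, `event_agent_id`, `user_id`, `user_name`) are assumptions, and the helper names are hypothetical. It would not scale to the table sizes discussed later in this task without filtering each query first.

```python
import sqlite3


def fetch_all(conn, sql):
    """Run one query on one connection and return all rows."""
    return conn.execute(sql).fetchall()


def join_events_with_users(event_rows, user_rows):
    """Join (event_type, agent_id) rows with (user_id, user_name) rows in memory."""
    names = dict(user_rows)  # user_id -> user_name
    return [
        (event_type, names[agent_id])
        for event_type, agent_id in event_rows
        if agent_id in names  # drop events whose agent has no matching user row
    ]


# Stand-ins for the two hosts: two unrelated in-memory databases.
x1 = sqlite3.connect(":memory:")     # plays the role of x1-analytics-slave
store = sqlite3.connect(":memory:")  # plays the role of analytics-store

# Assumed, simplified schemas with made-up sample rows.
x1.execute("CREATE TABLE echo_events (event_type TEXT, event_agent_id INTEGER)")
x1.executemany(
    "INSERT INTO echo_events VALUES (?, ?)",
    [("mention", 1), ("thank-you", 2), ("mention", 99)],
)
store.execute('CREATE TABLE "user" (user_id INTEGER, user_name TEXT)')
store.executemany('INSERT INTO "user" VALUES (?, ?)', [(1, "Alice"), (2, "Bob")])

# One query per host, then the join happens locally.
events = fetch_all(x1, "SELECT event_type, event_agent_id FROM echo_events")
users = fetch_all(store, 'SELECT user_id, user_name FROM "user"')
joined = join_events_with_users(events, users)
```

The event with agent ID 99 is dropped because no user row matches it, which mirrors an inner join.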

Event Timeline

Neil_P._Quinn_WMF raised the priority of this task from to Needs Triage.
Neil_P._Quinn_WMF updated the task description.
Restricted Application added a subscriber: Aklapper. Oct 12 2015, 6:24 PM
Krenair added a subscriber: Krenair.

This is similar to T75047: Replicate flowdb from X1 to analytics-store. Last word on that was in August—Ops said it should wait on some rearchitecting they're doing.

Neil_P._Quinn_WMF renamed this task from Replicate Echo databases to analytics-store to Replicate Echo tables to analytics-store. Oct 13 2015, 6:52 PM
Neil_P._Quinn_WMF set Security to None.
jcrespo moved this task from Triage to Backlog on the DBA board. Oct 29 2015, 3:20 PM

The rearchitecture is still needed, plus monitoring so that there is not another outage. However, I do not have the resources, neither hardware nor human, to do it any time soon.

I will put this in the short-term backlog, although this is slightly more complex than just flowdb.

Let's give it some time to see how that replication works, and then I will set this up.

jcrespo claimed this task. Oct 29 2015, 3:23 PM
jcrespo triaged this task as Normal priority.

Thanks for the update, @jcrespo!

So here is the thing:

Replicating just flowdb takes ~2 QPS and very few MBs. This has worked well for T75047.

Replicating the echo tables requires >120 GB and double the write QPS of enwiki. I will test replication on a codfw slave and see how it works.

I do not think this is possible right now. I would like you to request a different approach: if you want echo on analytics-store, something else has to go (like a core production shard or eventlogging). Otherwise, the hardware will not support it (the replica will be unable to keep up and will lag forever).

We are already having issues with eventlogging lagging forever on dbstore2002, even with no user activity.
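
To make the "lag forever" point concrete, here is a rough back-of-the-envelope sketch. It assumes constant rates, which real workloads do not have. The function name and the numbers in the usage note are hypothetical and are not measurements from these hosts.

```python
def replication_lag(write_qps, apply_qps, seconds):
    """Estimate replica lag, in seconds, after `seconds` of steady load.

    write_qps: writes per second arriving from the master(s)
    apply_qps: writes per second the replica is able to apply
    Assumes both rates are constant and the replica starts fully caught up.
    """
    if write_qps <= 0:
        return 0.0
    # Writes that arrived but could not be applied yet.
    backlog = max(0.0, (write_qps - apply_qps) * seconds)
    # Express the backlog as seconds' worth of incoming writes.
    return backlog / write_qps


lag_after_one_hour = replication_lag(300, 200, 3600)
lag_after_two_hours = replication_lag(300, 200, 7200)
```

With these made-up rates the replica is 1200 seconds behind after one hour and 2400 seconds behind after two. Whenever the incoming write rate exceeds what the replica can apply, the lag grows linearly and never recovers, which is why adding a write-heavy source to already loaded hardware is a problem.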

As an alternative, it would be easier to provide you with a subset of the tables, or with CONNECT tables, which are virtual tables (so joins would be slow).

@jcrespo, it looks like I could accomplish most of the same things if the centralauth database was replicated to x1-analytics-slave. Maybe we should just do that?

That is probably more reasonable. Give me a couple of days to research sizes and load (and whether there is any security issue).

May I ask you to list the minimum columns that you need (just ids and usernames)? The less data is sent, the more likely it is that I can provide it to you.

@jcrespo, good question, because now that I've looked at the contents of the database, I realize it won't actually do what I want. The echo databases use only local user IDs, and the central auth database uses only usernames.

So let me go back to thinking about other possibilities, and how important this request actually is.

jcrespo changed the task status from Open to Stalled. Feb 3 2016, 9:44 AM

@Catrope, this is a reminder to nudge Jaime on this ticket.

jcrespo removed jcrespo as the assignee of this task. Apr 22 2016, 4:13 PM
jcrespo claimed this task. Oct 20 2017, 10:33 AM

This was resolved on T175970.

jcrespo closed this task as Resolved. Oct 20 2017, 10:33 AM
Neil_P._Quinn_WMF raised the priority of this task from Normal to Needs Triage. Mar 30 2018, 10:31 AM
Neil_P._Quinn_WMF moved this task from Backlog to Radar on the Contributors-Analysis board.