
Define persistence backends for Push Notification Service
Closed, ResolvedPublic

Description

We're provisionally going to use MySQL for persistence, both for the MediaWiki extension and for the push service. There will probably be two separate database schemas, one for MediaWiki and one for the service on a misc cluster, but this still needs to be determined.
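Purely to illustrate the kind of data involved, a subscription table might look roughly like the sketch below, assuming Node with the mysql2 driver; the table and column names here are invented for illustration, not a decided schema:

```
// Hypothetical sketch only: table and column names are illustrative,
// not a decided schema.
import { createConnection } from 'mysql2/promise';

async function createSubscriptionTable(): Promise<void> {
  const conn = await createConnection({
    host: 'localhost',        // assumption: local dev instance
    user: 'push_service',
    database: 'push',
  });
  // One row per (topic, provider, token): the provider token is what the
  // push service would hand to APNs/FCM when a notification fires.
  await conn.query(`
    CREATE TABLE IF NOT EXISTS push_subscription (
      ps_id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      ps_topic    VARBINARY(255) NOT NULL,
      ps_provider ENUM('apns', 'fcm') NOT NULL,
      ps_token    VARBINARY(512) NOT NULL,
      UNIQUE KEY ps_topic_provider_token (ps_topic, ps_provider, ps_token)
    )
  `);
  await conn.end();
}
```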

Here's the old description of this task for posterity's sake:

Before developing the Push Service, a type of DB and its location need to be chosen. Currently, MySQL looks most likely, but before committing, some more research should be performed on using Cassandra or another DB (like HBase) that may be a better fit (and have no single point of failure), as laid out here:
https://www.mediawiki.org/wiki/RESTBase/Table_storage_backend_options

If MySQL is chosen, after discussing internally on the #reading-infrastructure-team, primarily with @Tgr, we have the following likely possibilities:

  1. Use a new DB cluster on the standard DB servers.
  2. Use one of the existing non-mediawiki DB clusters (m1-m5, which are used for all kinds of stuff from Phabricator to the scholarship review app)
  3. Use a cross-wiki MediaWiki DB cluster (x1, which is currently mostly used by Flow), or just stick a DB into one of the clusters normally used for per-wiki tables (e.g. the CentralAuth database is hosted on s7).

We also have two other possibilities, but think they are less desirable/likely:

  4. Use a separate database server.
  5. Use wiki-specific tables (s1-s7, one table per wiki).

So far, hosting this on s7 (option 3) seems like the best option. Why?

  • It gives us the option to link devices to user accounts if we ever decide to do that.
  • It is non-wiki-specific (and devices and browsers are also non-wiki-specific).
  • We don't have to set up a new DB (not terrible, but more work).

So my questions:

  • Did we miss any options?
  • Are there any downsides/upsides we missed to any of the options?
  • Does this logic seem reasonable?

I haven't done so yet, but will be updating the technical plan with this info shortly:

https://docs.google.com/document/d/1nVW3XC7PO3XDKRz72fCFDoZ-CA_lrhzNekUldPn5wFA/edit?pli=1#heading=h.xanehm3k39vh

Note:
This assumes MySQL, which was chosen over Cassandra for the reasons listed in the document above.

Event Timeline

Not putting it on the standard DB servers is the way to go, IMHO. The service is very WMF-specific (for now) and has a specific purpose that doesn't necessarily depend on the individual projects, so its DB shouldn't follow the project semantics. Whether to locate it on s7 or a new DB cluster is a question that should be discussed with Jaime and Manuel, as it depends on s7's current setup and the projected bandwidth/traffic the service will have. Additionally, because you will need to replicate some data between the main DBs and yours, a mechanism will need to be put in place for that.

@mobrovac, to be clear, are you saying "option 4"?

Also, on the replication side, can you clarify a bit? I am not a DBA… I was figuring the service could connect directly to the DB, but you seem to be suggesting that we will need to replicate the DB locally. I am guessing the latency between the service and the DB server is too great?

> @mobrovac, to be clear, are you saying "option 4"?

I'm saying that it is not clear to me at this point which of the two options (3 or 4) would be better and that needs to be discussed with our DBAs.

> Also, on the replication side, can you clarify a bit? I am not a DBA… I was figuring the service could connect directly to the DB, but you seem to be suggesting that we will need to replicate the DB locally. I am guessing the latency between the service and the DB server is too great?

For various reasons (performance, security, ...) you are not allowed to directly query the main DB from outside MW (and even there you need to use wrappers). That means you will need a way to replicate users' information between the main DB and yours. Here again the input of the DBAs will be invaluable. The s7 DB is already being kept in sync with the main DB, so there might be a solution readily available that could be reused. If not, you will likely need a maintenance/cron script to keep your copy up-to-date.
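To make the cron-script option concrete, here is a minimal sketch of such a periodic copy job, assuming Node and the mysql2 driver; the host names and the table/column names are invented for illustration, and whether a script like this or native replication is the right mechanism is exactly the question for the DBAs:

```
// Hypothetical sketch only: hosts and table/column names are illustrative.
import { createConnection } from 'mysql2/promise';

async function syncOnce(lastRun: Date): Promise<void> {
  // Read from a replica, never from the main DB directly.
  const replica = await createConnection({ host: 'replica.example.org', database: 'source_db' });
  const local = await createConnection({ host: 'localhost', database: 'push' });

  const [rows] = await replica.query(
    'SELECT user_id, item, updated_at FROM source_table WHERE updated_at > ?',
    [lastRun]
  );

  for (const row of rows as any[]) {
    // Upsert so that re-running after a failure is safe (idempotent).
    await local.query(
      `INSERT INTO local_copy (user_id, item, updated_at)
       VALUES (?, ?, ?)
       ON DUPLICATE KEY UPDATE updated_at = VALUES(updated_at)`,
      [row.user_id, row.item, row.updated_at]
    );
  }

  await replica.end();
  await local.end();
}
```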

@mobrovac thanks for the input… makes sense… going to DBAs/Ops next, just wanted to sanity check with you guys first

Using s7 would be convenient because it has global user/watchlist data, so it would be possible to do a join between watchlists and push subscription lists, which seems like the most performant way to handle watchlist push notifications. (The same thing could be achieved by setting up another replication stream for the watchlist, but why do it twice?)
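To make the join concrete, here is a hedged sketch assuming Node and mysql2; the push_subscription table and its ps_* columns are invented for the example, while watchlist and its wl_* columns are MediaWiki's real schema. The point is that co-locating both tables on one cluster lets a single query find every watcher's device tokens for an edited page:

```
// Illustration only: push_subscription and its columns are hypothetical.
import { Connection } from 'mysql2/promise';

async function tokensForWatchedPage(conn: Connection, ns: number, title: string) {
  // One round trip: watchers of the page joined to their device tokens.
  const [rows] = await conn.query(
    `SELECT ps.ps_provider, ps.ps_token
       FROM watchlist wl
       JOIN push_subscription ps ON ps.ps_user = wl.wl_user
      WHERE wl.wl_namespace = ? AND wl.wl_title = ?`,
    [ns, title]
  );
  return rows;
}
```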

Security-wise, there doesn't seem to be too much difference between a service that can access s7 with a user that can only read global watchlists and its own table, and a service using a less sensitive cluster with global watchlists getting replicated there. (Access to the global user table would be scary, since that stores passwords and cookie tokens and whatnot, but I don't think that table is needed for anything.)

With all the discussion of persistence for the Reading List Service on T164805: Investigate requirements for MySQL access in RESTBase, I started looking into persistence again here.

After some initial discussions we essentially made the case that Cassandra was not a good fit for the needs of the Push service. After a little more poking around, I wonder if there is a potential design for the data that would make Cassandra a better fit, avoiding the need to solve how to use MySQL from Node applications.

Some things that got me thinking:

  1. We assume that we need to perform large range queries and that they will be inefficient. But is this correct? Could we structure or index the tables differently to make them more efficient? At what size do they become inefficient? (See the Cassandra sketch after this list.)
  2. We do need to know which devices are subscribed to a particular topic, but we probably don't ever need to know all topics to which a device is subscribed.
  3. Push is by nature "best effort": even the push providers do not guarantee that all notifications reach users, or make any timeliness commitments. So eventual consistency is not a problem.
  4. All clients still need to support non-push polling in case a user does not enable push.
  5. Urban Airship, one of the largest push BaaS providers in the world, uses Cassandra as the backend: https://en.wikipedia.org/wiki/Apache_Cassandra#Prominent_users
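As referenced in point 1, here is a sketch of a Cassandra data model that would sidestep the range-query concern, assuming the Node cassandra-driver; the keyspace, table, and column names are invented. Partitioning by topic makes "which devices are subscribed to topic X" a single-partition read rather than a large range scan, and since point 2 says we never need the reverse lookup, no per-device index is required:

```
// Hypothetical CQL data model, not a decided schema:
//
// CREATE TABLE subscription_by_topic (
//   topic     text,
//   device_id text,
//   token     text,
//   PRIMARY KEY ((topic), device_id)   -- topic is the partition key
// );
import { Client } from 'cassandra-driver';

const client = new Client({
  contactPoints: ['127.0.0.1'],   // assumption: local dev cluster
  localDataCenter: 'datacenter1',
  keyspace: 'push',
});

async function devicesForTopic(topic: string) {
  // This query hits exactly one partition; no cluster-wide scan.
  const result = await client.execute(
    'SELECT device_id, token FROM subscription_by_topic WHERE topic = ?',
    [topic],
    { prepare: true }
  );
  return result.rows;
}
```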

MySQL may still be a better fit, but we should probably dig a little deeper.

It may be good to investigate Urban Airship's use case specifically. Their architecture, which they call the Core Delivery Pipeline (CDP), is of particular interest. Of note, they are also using Kafka as a message queue for propagating notifications to the push provider services. However, they use both Cassandra and HBase depending on the use case:
https://www.urbanairship.com/blog/how-we-scaled-to-2.5-billion-mobile-notifications-during-us-election
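For illustration only (this is not Urban Airship's code), the Kafka leg of such a pipeline might look like the following with kafkajs; the topic name and message shape are assumptions:

```
// Illustrative sketch of the queueing pattern described above;
// topic name and payload shape are invented for the example.
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'push-service', brokers: ['localhost:9092'] });
const producer = kafka.producer();

async function enqueueNotification(deviceToken: string, payload: object): Promise<void> {
  await producer.connect();
  // Keying by device token keeps per-device ordering within a partition,
  // so each device's notifications reach the provider in order.
  await producer.send({
    topic: 'push-notifications',
    messages: [{ key: deviceToken, value: JSON.stringify(payload) }],
  });
}
```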

[Attached image: urban-airship-core-delivery-high-level-architecture.png — Urban Airship's Core Delivery Pipeline high-level architecture]

Fjalapeno renamed this task from "Decide on persistance location for Push Notification Service" to "Decide on persistence backend and location for the Push Notification Service". May 15 2017, 3:37 PM
Fjalapeno updated the task description.
GWicke triaged this task as Medium priority. Aug 8 2017, 9:15 PM

doesn't seem to be needed anymore.

@Pchelolo I may have missed it but has a standardized persistence mechanism been defined?

Push Notifications is on the Product Infrastructure roadmap, with hands-on implementation planned for approximately Q3.

Adding @Jhernandez for follow-on.

We are going to have to re-plan how to approach that goal, so we will have discussions in other venues, I think.

I'd prefer to keep this open for further discussion, or at least until any superseding tasks are created. As @dr0ptp4kt mentioned, this will be a major focus for Product Infrastructure this FY.

If it's no longer relevant to CPT, feel free to un-tag.

dr0ptp4kt renamed this task from "Decide on persistence backend and location for the Push Notification Service" to "Define persistence backends for Push Notification Service". Apr 29 2020, 5:20 PM
dr0ptp4kt updated the task description.
Mholloway claimed this task.