
Decide on persistence backend and location for the Push Notification Service
Open, NormalPublic

Description

Before developing the Push Service, a type of DB and its location need to be chosen. Currently, MySQL looks most likely, but before committing, some more research should be performed on using Cassandra or another DB (like HBase) that may be a better fit (with no single point of failure), as laid out here:
https://www.mediawiki.org/wiki/RESTBase/Table_storage_backend_options

If MySQL is chosen, after discussing internally on the Product-Infrastructure-Team-Backlog, primarily with @Tgr , we have the following likely possibilities:

  1. Use a new DB cluster on the standard DB servers.
  2. Use one of the existing non-MediaWiki DB clusters (m1-m5, which are used for all kinds of things, from Phabricator to the scholarship review app).
  3. Use a cross-wiki MediaWiki DB cluster (x1, currently used mostly by Flow), or stick a DB into one of the clusters normally used for per-wiki tables (e.g. the CentralAuth database is hosted on s7).

We also have 2 other possibilities but think they are less desirable/likely:

  1. Use a separate database server.
  2. Use wiki-specific tables (s1-s7, one table per wiki).

So far, hosting this on s7 (option 3) seems like the best option. Why?

  • It gives us the option to link devices to user accounts if we ever decide to do that.
  • It is non-wiki-specific (and devices and browsers are also non-wiki-specific).
  • We don't have to set up a new DB (not terrible, but more work).
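To make the "link devices to user accounts" option concrete, here is a minimal sketch of what a subscription store might look like. All table and column names are hypothetical (no schema has been committed to), and SQLite stands in for MySQL purely for illustration.

```python
import sqlite3

# Hypothetical push subscription schema; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE push_subscription (
    ps_id        INTEGER PRIMARY KEY,
    ps_user      INTEGER,          -- optional link to a global user account
    ps_token     TEXT NOT NULL,    -- provider-issued device/browser token
    ps_provider  TEXT NOT NULL,    -- e.g. 'fcm', 'apns', 'web-push'
    ps_topic     TEXT NOT NULL     -- what the device subscribed to
);
CREATE INDEX ps_topic_idx ON push_subscription (ps_topic);
""")

conn.execute(
    "INSERT INTO push_subscription (ps_user, ps_token, ps_provider, ps_topic) "
    "VALUES (?, ?, ?, ?)",
    (42, "token-abc", "fcm", "new-message"),
)

# The core read path: find all device tokens subscribed to a topic.
rows = conn.execute(
    "SELECT ps_token FROM push_subscription WHERE ps_topic = ?",
    ("new-message",),
).fetchall()
print(rows)  # [('token-abc',)]
```

Because the user column is nullable, the same table works whether or not account linking ever happens.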

So my questions:

  • Did we miss any options?
  • Are there any downsides/upsides we missed to any of the options?
  • Does this logic seem reasonable?

I haven't done so yet, but will update the technical plan with this info shortly:

https://docs.google.com/document/d/1nVW3XC7PO3XDKRz72fCFDoZ-CA_lrhzNekUldPn5wFA/edit?pli=1#heading=h.xanehm3k39vh

Note:
This assumes MySQL, which was chosen over Cassandra for the reasons listed in the document above.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 17 2017, 3:41 PM
Fjalapeno updated the task description. · Apr 17 2017, 3:43 PM

Not putting it on the standard DB servers is the way to go, IMHO. The service is very WMF-specific (for now) and has a specific purpose that doesn't necessarily depend on the individual projects, so its DB shouldn't follow the project semantics. Whether to locate it on s7 or a new DB cluster is a question that should be discussed with Jaime and Manuel, as it depends on s7's current setup and the projected bandwidth / traffic the service will have. Additionally, because you will need to replicate some data between the main DBs and yours, a mechanism will need to be put in place for that.

@mobrovac to be clear are you saying "option 4"?

Also on the replication side, can you clarify a bit. I am not a DBA… I was figuring the service could connect directly to the DB, but you seem to be suggesting that we will need to replicate the DB locally. I am guessing the latency between the service and the DB server is too great?

@mobrovac to be clear are you saying "option 4"?

I'm saying that it is not clear to me at this point which of the two options (3 or 4) would be better and that needs to be discussed with our DBAs.

Also on the replication side, can you clarify a bit. I am not a DBA… I was figuring the service could connect directly to the DB, but you seem to be suggesting that we will need to replicate the DB locally. I am guessing the latency between the service and the DB server is too great?

For various reasons (performance, security, ...) you are not allowed to directly query the main DB from outside MW (and even there you need to use wrappers). That means you will need a way to replicate users' information between the main DB and yours. Here again the input of the DBAs will be invaluable. The s7 DB is already being kept in sync with the main DB, so there might be a solution readily available that could be reused. If not, you will likely need a maintenance / cron script to keep your copy up-to-date.
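The maintenance/cron approach mentioned above could be as simple as a watermark-based copy job: each run pulls rows touched since the last run from the main DB into the service's copy. This is only a sketch of the idea; the table and column names are invented, and SQLite stands in for both databases.

```python
import sqlite3

# Two in-memory DBs stand in for the main DB and the service's local copy.
main = sqlite3.connect(":memory:")
local = sqlite3.connect(":memory:")
for db in (main, local):
    db.execute(
        "CREATE TABLE user_copy "
        "(user_id INTEGER PRIMARY KEY, name TEXT, touched INTEGER)"
    )

main.executemany("INSERT INTO user_copy VALUES (?, ?, ?)",
                 [(1, "Alice", 100), (2, "Bob", 200)])

def sync(since: int) -> int:
    """Copy rows with touched > since from main to local; return new watermark."""
    rows = main.execute(
        "SELECT user_id, name, touched FROM user_copy WHERE touched > ?",
        (since,),
    ).fetchall()
    local.executemany("INSERT OR REPLACE INTO user_copy VALUES (?, ?, ?)", rows)
    return max((r[2] for r in rows), default=since)

watermark = sync(0)  # first run copies everything
print(watermark)     # 200
print(local.execute("SELECT COUNT(*) FROM user_copy").fetchone()[0])  # 2
```

Subsequent runs would call `sync(watermark)` and copy only the delta; a real job would also need to handle deletions, which a simple watermark misses.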

@mobrovac thanks for the input… makes sense… going to DBAs/Ops next, just wanted to sanity check with you guys first

Tgr added a comment. · Apr 17 2017, 9:19 PM

Using s7 would be convenient because it has global user/watchlist data so it would be possible to do a join between watchlists and push subscription lists, which seems like the most performant way to handle watchlist push notifications. (The same thing could be achieved by setting up another replication stream for watchlist, but why do it twice?)
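The join described above would look roughly like the following if subscriptions lived on the same cluster as watchlist data. The schema names are made up for the example (SQLite stands in for MySQL): given an edited page, one query fans out to the push tokens of watchers who have subscribed.

```python
import sqlite3

# Toy tables standing in for global watchlist data and a hypothetical
# push subscription table co-located on the same cluster.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE watchlist (wl_user INTEGER, wl_title TEXT);
CREATE TABLE push_subscription (ps_user INTEGER, ps_token TEXT);
""")
db.executemany("INSERT INTO watchlist VALUES (?, ?)",
               [(1, "Main_Page"), (2, "Main_Page"), (3, "Other_Page")])
db.executemany("INSERT INTO push_subscription VALUES (?, ?)",
               [(1, "tok-1"), (3, "tok-3")])

# Devices to notify when Main_Page changes: watchers who also hold a
# push subscription. User 2 watches but has no subscription; user 3 is
# subscribed but doesn't watch this page.
tokens = db.execute("""
    SELECT ps.ps_token
    FROM watchlist wl
    JOIN push_subscription ps ON ps.ps_user = wl.wl_user
    WHERE wl.wl_title = ?
""", ("Main_Page",)).fetchall()
print(tokens)  # [('tok-1',)]
```

With the tables on separate clusters, the same result would require either cross-cluster replication or an application-level two-step lookup, which is the cost being weighed here.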

Security-wise, there doesn't seem to be too much difference between a service that can access s7 with a user that can only read global watchlists and its own table, and a service using a less sensitive cluster with global watchlists getting replicated there. (Access to the global user table would be scary, since that stores passwords and cookie tokens and whatnot, but I don't think that table is needed for anything.)

With all the discussion of persistence for the Reading List Service on T164805: Investigate requirements for MySQL access in RESTBase, I started looking into persistence again here.

After some initial discussions, we essentially made the case that Cassandra was not a good fit for the needs of the Push service. After a little more poking around, I wonder if there is a potential design for the data that would make Cassandra a better fit, avoiding the need to solve how to use MySQL from Node applications.

Some things that got me thinking:

  1. We assume that we need to perform large range queries and that they will be inefficient. But is this correct? Could we structure or index the tables differently to make them more efficient? At what size do they become inefficient?
  2. We do need to know which devices are subscribed to a particular topic, but we probably never need to know all the topics to which a given device is subscribed.
  3. Push is by nature "best effort": even the push providers do not guarantee that all notifications reach users, nor make any timing commitments. So eventual consistency is not a problem.
  4. All clients still need to support non-push polling for users who haven't enabled push.
  5. Urban Airship, one of the largest push backend-as-a-service (BaaS) providers in the world, uses Cassandra as its backend: https://en.wikipedia.org/wiki/Apache_Cassandra#Prominent_users
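Point 2 is what would make Cassandra plausible: if the only hot query is "all devices subscribed to topic X", the data can be partitioned by topic (topic as partition key, device as clustering column), so every read is a single-partition lookup. Here is a toy in-memory model of that access pattern; the names are illustrative, not a proposed design.

```python
from collections import defaultdict

# Model of a topic-partitioned subscription store: every read touches
# exactly one "partition" (one dict entry), mirroring a single-partition
# read in Cassandra terms. Names are hypothetical.
subscriptions: dict[str, set[str]] = defaultdict(set)

def subscribe(topic: str, device_token: str) -> None:
    subscriptions[topic].add(device_token)

def devices_for(topic: str) -> set[str]:
    # The hot read path; note we never need the reverse index
    # (all topics for a device), per point 2 above.
    return subscriptions[topic]

subscribe("page-edits", "tok-a")
subscribe("page-edits", "tok-b")
print(sorted(devices_for("page-edits")))  # ['tok-a', 'tok-b']
```

The trade-off is that anything outside this pattern (e.g. listing a device's subscriptions to clean up a revoked token) would need a second table or index, which is where the earlier "large range queries" concern came from.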

MySQL may still be a better fit, but we should probably dig a little deeper.

It may be good to investigate Urban Airship's use case specifically. Their architecture, which they call the Core Delivery Pipeline (CDP), is of particular interest. Of note, they are also using Kafka as a message queue for propagating notifications to the push provider services. However, they use both Cassandra and HBase, depending on the use case:
https://www.urbanairship.com/blog/how-we-scaled-to-2.5-billion-mobile-notifications-during-us-election

Fjalapeno renamed this task from "Decide on persistance location for Push Notification Service" to "Decide on persistence backend and location for the Push Notification Service". · May 15 2017, 3:37 PM
Fjalapeno updated the task description.
GWicke triaged this task as Normal priority. · Aug 8 2017, 9:15 PM
Pchelolo closed this task as Declined. · Jul 31 2019, 9:26 PM

Doesn't seem to be needed anymore.

@Pchelolo I may have missed it but has a standardized persistence mechanism been defined?

Push Notifications is on the Product Infrastructure roadmap for hands-on implementation in approximately Q3.

Adding @Jhernandez for follow-on.

We are going to have to re-plan how to approach that goal, so will have discussions in other venues I think.

Mholloway reopened this task as Open. · Tue, Sep 10, 8:30 PM

I'd prefer to keep this open for further discussion, or at least until any superseding tasks are created. As @dr0ptp4kt mentioned, this will be a major focus for Product Infrastructure this FY.

If it's no longer relevant to CPT, feel free to un-tag.