Investigate requirements for MySQL access in RESTBase
Closed, ResolvedPublic

Description

MediaWiki has a fairly intricate DB layer (ChronologyProtector, code for keeping slave lag low, DBPerformance warnings etc). If one wanted to interact with MariaDB from a RESTBase service (not necessarily with the same DBs as the wikis, but primary data with similar robustness requirements), to what extent would those features need to be reimplemented?

(Context: we are making plans for the new Reading List Service and it seems some parts of it would have to live in MediaWiki and some in RESTBase. We are trying to find out which one should do the DB handling.)

Tgr created this task.May 8 2017, 10:35 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 8 2017, 10:35 PM

To be more concrete, the scope of this ticket is the Reading Lists functionality. As such, we are not asking whether a Node.JS service can directly read/write from/to the main DB/tables, but whether it can be done to a separate DB.

GWicke added a subscriber: GWicke.EditedMay 9 2017, 8:59 AM

MediaWiki does have a lot of features for working with MySQL at scale. However, not all of those features seem to be needed for the concrete reading list use case:

  • Data set size and query volume / complexity are projected to be very modest (< ~10/s). Based on this, sharding and read scaling using slaves should not be necessary.
  • All reads & writes are tied to a user account / session, which means that they will currently hit the primary DC only. Cross-DC replication consistency is not a concern for now.
  • Logging slow queries at the app level is easy on any platform that supports some form of concurrent timers.

That said, I agree that it is very much worth considering those questions for larger use cases, and for future active-active operation. If we use MySQL more widely as a backing store for services, then we should ensure that common DB management functionality is consistently available to those services. This could either be a port of needed features to a library, or the use of an off-the-shelf SQL proxy along the lines of proxysql or haproxy managing sharding, slave lag, and logging concerns.

For a "new" service with modest number of writes and no current concerns for cross-dc consistency, we can implement clustering on MySQL- that is not currently possible on mediawiki due to its dependency on async replication and other legacy code limitations. That will provide both load balancing and HA with configurable consistency (sync or asnync reads).

I have setup already some test clusters within our infrastructure and many production ones in the past, and is probably the future for mediawiki masters HA (once mediawiki and legacy limitations are fixed)- so it will certainly be supported in the future. Load balancing can be provided by the connector or transparently by redundant proxies (HAProxy or ProxySQL).

The above is storage-wise. Architecture wise (and I have no power here), using storage as a shared mechanism seems like an anti-pattern to me and encapsulation violation. Shouldn't a piece of code be in between so only one codepath controls what is going on, exposing and api, and consumers use that api? Shouldn't probably the platform team weight in here, because even if this is for an extension, it is potentially a model shift?

@jcrespo: I think we are all on the same page that a single service should exclusively own and access its storage. The concrete question is whether creating a stand-alone service managing reading lists & storing its data in MySQL is feasible, and how the costs & benefits of doing so compare to an implementation as a MediaWiki extension.

That is the part that I would ask to the platform team 0:-)

Tgr added a comment.May 15 2017, 1:46 PM

Come to think of it, MySQL access in RESTBase will be necessary anyway for push notifications, right?

Come to think of it, MySQL access in RESTBase will be necessary anyway for push notifications, right?

Not in RESTBase, but in the Push Notification service, which will be a Node.JS service. So yes, we will need a way to connect to MySQL from Node anyway.

@mobrovac @Tgr not to muddy the waters, but I have been doing some research in regards to the persistence for Push: T163116#3263679

MySQL may be the best fit here, but I am not sure yet that we should commit to the Push service also using MySQL until we do some more investigation.

Tgr removed Tgr as the assignee of this task.

We decided to handle DB access on the MediaWiki side for reading lists (a long time ago, but I forgot to update this).
Feel free to close this task or repurpose it into a generic RESTBase issue.

Pchelolo closed this task as Resolved.Dec 6 2017, 3:00 AM
Pchelolo edited projects, added Services (done); removed Services (watching).
Pchelolo claimed this task.
Pchelolo added a subscriber: Pchelolo.

I don't think we need it in the mid-term, I'll decline the ticket, we can always reopen it if needed