
Prepare for DBA meeting
Closed, ResolvedPublic

Event Timeline

  • Should we use some shared MariaDB within WMF or should we host our own container?
  • Should every Pronlex image own its own MariaDB within the same container, or should we have a shared MariaDB container?

As for scaling within the k8s cluster, is there some infrastructure around to let the cluster know when running instances are idle or really busy, in order to shut down or start more instances? MaryTTS is without a doubt the service that will consume the most resources. We have an internal HAProxy to queue requests. Could that be replaced with configuration internals of the k8s cluster?

An introduction we could send:

We, WMSE, are in the process of moving our project Wikispeech (https://www.mediawiki.org/wiki/Extension:Wikispeech) towards beta deployment in the WMF infrastructure.

The Wikispeech extension is powered by a set of services that do the actual text-to-speech. One of these services, Pronlex, our phonetic lexicon, depends on a MySQL/MariaDB database. Currently the data contained within is static, but soon enough it'll be populated with community improvements. We're wondering what database deployment you prefer we use. Here are the choices we can think of:

  • An already existing shared high-availability MariaDB operated by WMF. This is the preferred solution for us.
  • We deploy our own MariaDB instance (or high-availability cluster) in the k8s cluster.
  • We install a MariaDB in the Pronlex Docker image. If we want to be able to horizontally scale the backend on demand in the k8s cluster, then new data needs to be propagated to these instances in some way. This would probably cripple our application and force us to develop additional services to handle that problem.
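To illustrate what the first (preferred) option could look like from the Pronlex side, here is a hedged sketch of a Kubernetes ExternalName Service that gives pods a stable in-cluster DNS name for a shared MariaDB operated outside the cluster. The service name `pronlex-db` and the host `shared-mariadb.example.wmnet` are placeholders, not actual WMF names; the real host would come from the DBAs.

```yaml
# Hypothetical sketch, not actual WMF configuration.
# Maps an in-cluster name to an externally operated shared MariaDB,
# so the application config stays the same whichever option is chosen.
apiVersion: v1
kind: Service
metadata:
  name: pronlex-db
spec:
  type: ExternalName
  externalName: shared-mariadb.example.wmnet  # placeholder host
```

Pronlex would then connect to `pronlex-db` on port 3306 (an ExternalName Service only provides a DNS alias, so the port lives in the client configuration). Swapping to a self-hosted MariaDB later would only mean replacing this Service with a regular one, not changing the application.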

Since we're touching the subject of k8s, we have a couple of questions about that too.

  • Some of our services are rather heavy on the CPU in bursts. If we hit heavy load, we'll have to scale up. Will k8s do automatic horizontal scaling of the pods, or will we need to help it out in some way?
  • One of our services, Mary-TTS, makes use of all the threads it has access to. To avoid slowing the service down by spawning more threads and thus degrading the parallel computation, we've installed an HAProxy front in our Docker image to queue all requests. Is this something we could configure k8s to do for us instead?
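For reference, the standard Kubernetes answer to the burst-scaling question is a HorizontalPodAutoscaler. The sketch below is an assumption-laden illustration, not something known to work in the WMF cluster: it presumes a metrics server is installed and that a Deployment named `marytts` exists (both made up here).

```yaml
# Hypothetical sketch: CPU-based horizontal autoscaling for Mary-TTS.
# Assumes a metrics server and a Deployment named "marytts" exist;
# the replica and utilization numbers are illustrative only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: marytts
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: marytts
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Note that an HPA only adjusts the number of pods; it does not queue requests per pod the way the in-image HAProxy does, so per-pod concurrency limiting would still need the proxy or an equivalent setting at the ingress layer.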
  • Should we use some shared MariaDB within WMF or should we host our own container?
  • Should every Pronlex image own its own MariaDB within the same container, or should we have a shared MariaDB container?

If these DBs need to be shared between services, then I expect the DBAs will want this to use one of the provided, shared DB solutions: https://wikitech.wikimedia.org/wiki/MariaDB#Sections_and_shards
Another question: is the DB per wiki, or shared across all deployed wikis?

As for scaling within the k8s cluster, is there some infrastructure around to let the cluster know when running instances are idle or really busy, in order to shut down or start more instances? MaryTTS is without a doubt the service that will consume the most resources. We have an internal HAProxy to queue requests. Could that be replaced with configuration internals of the k8s cluster?

As far as I am aware, the current k8s cluster does not have autoscaling. The capacity expectations and requirements will, I guess, be figured out as part of T264752: Contact service ops regarding deployment of Speechoid.

As for this HAProxy thing, that sounds like an interesting topic to discuss, but it perhaps doesn't belong in this ticket.