
Remove servicerunner dependency for cxserver
Open, Needs Triage, Public

Description

service-runner, the base library for most of WMF's Node.js-based services, is unmaintained. service-runner and its unmaintained dependencies are causing many issues:

Since cxserver's deployment is based on Kubernetes rather than the bare-metal deployments used many years ago, a standard Node.js-based service is sufficient.

It should have standard logging (ECS-based) and statsd analytics reporting as well.

A Node.js cluster management system such as pm2, nest.js, or even just Node's native cluster module is needed to start and kill worker processes as required.
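As a rough illustration of the "native node cluster" option, the sketch below forks a fixed number of workers and replaces any that crash. It is not taken from cxserver or from the patch: `resolveWorkerCount`, `runClustered`, and the `NUM_WORKERS` environment variable are hypothetical names chosen for the example, and it assumes Node.js 16 or later for `cluster.isPrimary`.

```javascript
'use strict';
const cluster = require('node:cluster');
const os = require('node:os');

// Hypothetical helper: decide how many workers to fork.
// NUM_WORKERS is an assumed variable name, not an existing cxserver setting.
function resolveWorkerCount(env = process.env) {
  const requested = Number.parseInt(env.NUM_WORKERS ?? '', 10);
  if (Number.isInteger(requested) && requested > 0) {
    return requested;
  }
  // Fall back to one worker per reported CPU, and at least one.
  return os.cpus().length || 1;
}

// In the primary process: fork `count` workers and replace crashed ones.
// In a worker process: run the supplied `startWorker` callback.
function runClustered(startWorker, count = resolveWorkerCount()) {
  if (!cluster.isPrimary) {
    startWorker();
    return;
  }
  for (let i = 0; i < count; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker, code, signal) => {
    // Restart only after unexpected exits, not after a deliberate
    // disconnect() or kill() issued by the primary.
    if (!worker.exitedAfterDisconnect) {
      console.error(`worker ${worker.process.pid} exited (${signal || code}); restarting`);
      cluster.fork();
    }
  });
}

// Example use (commented out so that loading this file forks nothing):
// const http = require('node:http');
// runClustered(() => {
//   http.createServer((req, res) => res.end('ok')).listen(8080);
// });
```

Whether several workers per container are needed at all under Kubernetes, where replicas can be scaled at the pod level, is a separate design question that this sketch does not settle.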

Event Timeline

Change 1003609 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] WIP: Migrate from service-runner to node cluster

https://gerrit.wikimedia.org/r/1003609

The above patch is a quick exploration to identify the effort required to migrate away from servicerunner. It is not intended for merge. My proposal is to modernize various parts of cxserver while still using servicerunner as the process manager. Do these migrations in iterations and, at a later stage, when cxserver no longer depends on servicerunner for anything other than process management, replace it. Doing everything in one go is too risky, as cxserver is the backbone of our heavily used translation system.

Plan

  • Replace the preq library with native HTTP requests (T350773: Remove preq and use node fetch).
  • Simplify the routers. The current approach reads every file in the routes directory and loads the routes from them. Instead, use Express's recommended way of handling routes: import the route definitions and register them with app.use('/', rootRoutes); app.use('/v1', v1Routes); etc.
  • Replace the complex testing orchestration with standard express framework testing.
  • Use an ECS logger: https://github.com/elastic/ecs-logging-nodejs. Replace the custom error logging and request logging with the ECS Express middleware.
  • Remove the various methods of stats reporting (statsd and Prometheus) and use one and only one way of reporting stats, via a statsd client.
  • Simplify the configuration. The current configuration assumes there are multiple services in the repo (service-runner was designed for this kind of setup).
  • Finally, replace servicerunner with node cluster.
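
For the first step of the plan (replacing preq with native fetch), a minimal sketch of a JSON GET helper might look like the following. It is an illustration only, not code from cxserver or T350773: `getJson` and its `timeoutMs` option are hypothetical names, and it assumes Node.js 18 or later, where `fetch` and `AbortSignal.timeout` are available globally.

```javascript
'use strict';

// Hypothetical helper (not from cxserver): GET a URL and parse its JSON body
// using the fetch API built into Node.js 18+.
// `fetchImpl` is injectable so the helper can be exercised without a network.
async function getJson(url, { timeoutMs = 5000, headers = {} } = {}, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    headers: { accept: 'application/json', ...headers },
    // Abort the request if it takes longer than timeoutMs milliseconds.
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!response.ok) {
    // fetch resolves even for HTTP error statuses, so check explicitly.
    throw new Error(`GET ${url} failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Callers that currently rely on a rejected promise for HTTP error statuses would need the explicit `response.ok` check shown here, since `fetch` itself only rejects on network-level failures or aborts.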
> Remove various methods of stats reporting (statsd and Prometheus) and use one and only one way of reporting stats using a statsd client

My understanding was that statsd was deprecated and we are supposed to use Prometheus instead: https://wikitech.wikimedia.org/wiki/Statsd "Statsd is supported (on the production realm), but new metrics producers are encouraged to use Prometheus instead."