Remove servicerunner dependency for cxserver
Open, Needs Triage, Public

Assigned To

None

Authored By

	santhosh
	Feb 20 2024, 5:12 AM

Description

Servicerunner, the base library for most of WMF's Node.js-based services, is unmaintained. Servicerunner and its unmaintained dependencies are causing several issues:

Security issues
Performance issues: for example, the preq-based HTTP requests are no longer required, as Node natively has these features
Coding standards: for example, Promises are native to Node these days
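
As a rough illustration of the last two points, here is a minimal sketch of a JSON request using only the native fetch and Promises, assuming Node 18 or later. The helper name getJson, its timeoutMs default, and the injectable fetchImpl parameter are hypothetical and not taken from cxserver's code.

```javascript
'use strict';

// Hypothetical helper, not cxserver's actual code: fetch a URL and parse the
// JSON body using Node's built-in fetch (global since Node 18).
async function getJson(url, { timeoutMs = 5000, fetchImpl = globalThis.fetch } = {}) {
  const response = await fetchImpl(url, {
    headers: { accept: 'application/json' },
    // Abort the request if it takes longer than timeoutMs.
    signal: AbortSignal.timeout(timeoutMs),
  });
  // fetch only rejects on network errors, so HTTP errors are checked by hand.
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with HTTP ${response.status}`);
  }
  return response.json();
}

module.exports = { getJson };
```

The fetchImpl parameter exists only so the helper can be exercised without network access. Callers would normally write `const body = await getJson(someUrl);`.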

Since cxserver's deployment is based on Kubernetes, and not on bare-metal deployments as it was many years ago, a standard Node.js-based service is sufficient.

It should also have standard (ECS-based) logging and statsd analytics reporting.

A node cluster management system such as pm2, nest.js, or even just Node's native cluster module is required to start and kill worker processes as required.
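
A minimal sketch of the native option, assuming Node 16 or later (for cluster.isPrimary). The names resolveWorkerCount, startPrimary and startWorker, the NUM_WORKERS and PORT environment variables, and the port 8080 are all hypothetical. The plain HTTP server stands in for the real express app.

```javascript
'use strict';
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// Hypothetical helper: use the configured worker count if it is a positive
// integer, otherwise fall back to one worker per CPU.
function resolveWorkerCount(configured, cpuCount = os.cpus().length) {
  const parsed = Number.parseInt(configured, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : Math.max(1, cpuCount);
}

// Primary process: fork the workers and replace any that die unexpectedly.
function startPrimary(workerCount) {
  for (let i = 0; i < workerCount; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker, code, signal) => {
    // exitedAfterDisconnect is true for deliberate shutdowns; skip those.
    if (!worker.exitedAfterDisconnect) {
      console.error(`worker ${worker.process.pid} exited (${signal || code}), restarting`);
      cluster.fork();
    }
  });
}

// Worker process: a stand-in HTTP server where the real app would be started.
function startWorker(port) {
  return http
    .createServer((req, res) => {
      res.writeHead(200, { 'content-type': 'application/json' });
      res.end(JSON.stringify({ pid: process.pid }));
    })
    .listen(port);
}

// Entry point: the same file runs in the primary and in every worker.
function main() {
  if (cluster.isPrimary) {
    startPrimary(resolveWorkerCount(process.env.NUM_WORKERS));
  } else {
    startWorker(Number(process.env.PORT) || 8080);
  }
}

module.exports = { resolveWorkerCount, startPrimary, startWorker, main };
```

Calling main() from the service entry point would start the primary and its workers. On Kubernetes, a small fixed worker count per pod may be enough, since scaling can also happen at the pod level.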

Details

	Subject	Repo	Branch	Lines +/-
	WIP: Migrate from service-runner to node cluster	mediawiki/services/cxserver	master	+1 K -1 K


Related Objects

Mentioned In: T309772: npm audit reports several security issues with Service runner
Mentioned Here: T350773: Remove preq and use node fetch

Event Timeline

santhosh created this task. Feb 20 2024, 5:12 AM

Restricted Application added a subscriber: Aklapper. Feb 20 2024, 5:12 AM

KartikMistry added a subscriber: akosiaris. Feb 20 2024, 5:31 AM

Change 1003609 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] WIP: Migrate from service-runner to node cluster

https://gerrit.wikimedia.org/r/1003609

gerritbot added a project: Patch-For-Review. Feb 20 2024, 11:40 AM

akosiaris mentioned this in T309772: npm audit reports several security issues with Service runner. Feb 20 2024, 11:59 AM

The above patch is a quick attempt to gauge the effort required to migrate away from servicerunner. It is not meant to be merged. My proposal is to modernize various parts of cxserver while still using servicerunner as the process manager. These migrations should be done in iterations, and at a later stage, when cxserver depends on servicerunner only as a process manager, it can be replaced. Doing everything in one go is too risky, as cxserver is the backbone of our heavily used translation system.

Plan

Replace the preq library with native HTTP requests (T350773: Remove preq and use node fetch)
Simplify the routers. The current approach reads all files in the routes directory and loads the routes. Instead, use express's recommended way of handling routes: import the route definitions and register them using app.use('/', rootRoutes); app.use('/v1', v1Routes); etc.
Replace the complex testing orchestration with standard express framework testing.
Use an ECS logger: https://github.com/elastic/ecs-logging-nodejs. Replace the custom error logging and request logging with the ECS express middleware.
Remove the various methods of stats reporting (statsd and Prometheus) and use one and only one way of reporting stats, using a statsd client.
Simplify the configuration. The current configuration assumes there are multiple services in the repo (service runner was designed for this kind of setup).
And finally, replace servicerunner with node cluster.
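
For the stats step in the plan, the StatsD line protocol is simple enough to sketch with Node's built-in dgram module. This is only an illustration of what a single reporting path could look like, not a replacement for a maintained client library. The names formatMetric and createStatsClient, the default host and port, and the metric names in the usage note are assumptions.

```javascript
'use strict';
const dgram = require('node:dgram');

// Format one metric in the plain-text StatsD line protocol:
//   <name>:<value>|<type>[|@<sampleRate>]
function formatMetric(name, value, type, sampleRate) {
  const base = `${name}:${value}|${type}`;
  return sampleRate !== undefined && sampleRate < 1 ? `${base}|@${sampleRate}` : base;
}

// Hypothetical minimal client: fire-and-forget UDP datagrams, one per metric.
function createStatsClient({ host = '127.0.0.1', port = 8125, prefix = '' } = {}) {
  const socket = dgram.createSocket('udp4');
  // Do not let the metrics socket keep the process alive on shutdown.
  socket.unref();
  const send = (line) => {
    socket.send(Buffer.from(prefix + line), port, host, () => {
      // Metrics are best effort, so send errors are deliberately ignored.
    });
  };
  return {
    increment: (name, by = 1) => send(formatMetric(name, by, 'c')),
    timing: (name, ms) => send(formatMetric(name, ms, 'ms')),
    gauge: (name, value) => send(formatMetric(name, value, 'g')),
    close: () => socket.close(),
  };
}

module.exports = { formatMetric, createStatsClient };
```

A caller might create one client at startup, for example `const stats = createStatsClient({ prefix: 'cxserver.' });`, and then call `stats.increment('translate.requests')` from the request handlers.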

Remove various methods of stats reporting(statsd and Prometheus) and use one and only one way of reporting stats using a statsd client

My understanding is that statsd is deprecated and that we are supposed to use Prometheus instead. From https://wikitech.wikimedia.org/wiki/Statsd: "Statsd is supported (on the production realm), but new metrics producers are encouraged to use Prometheus instead."
