Background
This project evolves the platform our production services run on. Our services are currently based on the outdated service-runner / service-template-node platform, which is a maintenance, development, and performance concern for us. We want to replace it, initially with a simpler, more performant platform that still runs on Node, and ultimately with one written in a more suitable language such as Rust or Go.
Approach
- We identify all the features of service-runner we are using
- We replace service-runner with service-utils or similar in our production services (still running Node)
- We create a wish-list of features we would like in the new service runner
- We decide whether Rust or Go is the appropriate programming language to move to, in consultation with colleagues across the Foundation (per Selena); the work product is a document recording the rationale for the choice
- We prototype the initial build-out of the replacement service platform in the evaluator (still in Node)
- We begin porting the evaluator code, starting with the WASM executor interaction; the deliverable here is a PoC demonstrating how we will run the executors in the new programming language
- We migrate the subset of tests that are not related to service-level concerns and make them pass in the port's repository
Acceptance Criteria/Success Metrics
- We are confident we understand the necessary parts of the service platform
- Our production routing, logs, and metrics continue to work as expected
- The load-time overhead of our Node services is reduced by 5% (tentative target, to be confirmed)
- We have decided whether to adopt Rust or Go as the new language for our backend services
- We have discussed with SRE all changes needed to container-level protections and sandboxing for the evaluator service, and have redesigned our WASI management accordingly
- We have prototyped the WASI/executor interaction and are testing some Python/JS function calls in CI
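The load-time criterion above could be checked with a small script along these lines. This is only a sketch under stated assumptions: the function names are hypothetical, the choice of the median as the summary statistic is an assumption (this plan does not say how load time is measured), and the 5% default mirrors the tentative target in the criteria.

```python
from statistics import median


def load_time_reduction_percent(baseline_ms, candidate_ms):
    """Percentage reduction in median load time, old platform vs. new.

    Both arguments are lists of load-time samples in milliseconds.
    Using the median is an assumption; it damps the effect of outliers.
    """
    if not baseline_ms or not candidate_ms:
        raise ValueError("both sample lists must be non-empty")
    base = median(baseline_ms)
    cand = median(candidate_ms)
    if base <= 0:
        raise ValueError("baseline median must be positive")
    return (base - cand) * 100.0 / base


def meets_target(baseline_ms, candidate_ms, target_percent=5.0):
    """True if the reduction reaches the (tentative) target percentage."""
    return load_time_reduction_percent(baseline_ms, candidate_ms) >= target_percent
```

Samples for both lists would need to be collected the same way (same host, same service, same number of cold starts) for the comparison to be meaningful.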
Stretch Goal
- At least a stub service built on the new platform is exposed in production