Problem
Over the last year, the Toolforge ecosystem has grown beyond 'jobs' and 'webservice' to include the new build system, a system for environment variables and secrets, a potential deploy subcommand, and more. Now is a good time to step back and reassess the architecture that supports these features. As we keep adding features, we need to make sure the architecture is scalable, maintainable, and aligned with both our long-term vision and our immediate operational needs.
References:
T342077: Toolforge beyond build service
Toolforge work group meeting notes
Decision Record
Risks & Constraints
...if the architecture _Doesn't Evolve_:
- Increased Complexity: The architecture could become increasingly complex, making it harder to manage, maintain, and onboard new contributors.
- Resource Inefficiency: The current architecture might require more maintenance effort than a new, more efficient architecture, wasting engineering time.
- Community Disengagement: The existing complexity may deter new contributions.
...if the architecture _Does Evolve_:
- Implementation Challenges: Transitioning to a new architecture could be resource-intensive (engineering-time wise) and meet resistance.
- Backward Compatibility: Changes must consider the impact on existing services, with a risk of breaking functionality or affecting the user experience.
- Operational Overhead: Unexpected complexities in deployment and monitoring may arise, requiring more operational effort.
Options
Some pros and cons of each option have been listed. For further context and in-depth discussion of the finer points of the different options, see the references linked above.
Option 1
Backend(API gateway + several per-service APIs) + Client(Single codebase/monolith)
Pros:
- Decoupling between frontend and backend through API gateway.
- Simpler to improve the client user experience with a single package
- Client easier to do big contributions (all code together, shipped as one)
- Backend easier to do small contributions (easier to understand/test/deploy just one small part)
- Increased flexibility and scalability
- Backend easy to move to a monolithic system
- Backend easier for others to reuse outside toolforge
Cons:
- Client hard to move to a decoupled system
- Client harder to do small contributions (must test all flows for any change, must understand the whole system)
- Backend harder to do big contributions (split repos, split deployments)
- Potential for backend operational complexity due to multiple service APIs.
- API gateway introduces an additional system to maintain
- Backend split repos could end up in a high degree of code repetition and boilerplate.
- Client harder for others to reuse outside toolforge
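To make the shape of Option 1 concrete, here is a minimal sketch, assuming a gateway that routes by path prefix to per-service handlers and a single client that sends every subcommand through it. All names (`Gateway`, `Client`, `jobs_api`, `envvars_api`, the `/jobs` and `/envvars` prefixes) are hypothetical and only illustrate the layout; they are not the actual Toolforge APIs.

```python
from typing import Callable, Dict

# Hypothetical per-service backends. In Option 1 each of these would be
# a separate API with its own repository and deployment.
def jobs_api(path: str) -> str:
    return f"jobs-api handled {path}"

def envvars_api(path: str) -> str:
    return f"envvars-api handled {path}"

class Gateway:
    """Illustrative API gateway: picks a backend based on the path prefix."""

    def __init__(self, routes: Dict[str, Callable[[str], str]]) -> None:
        self.routes = routes

    def dispatch(self, path: str) -> str:
        for prefix, handler in self.routes.items():
            if path == prefix or path.startswith(prefix + "/"):
                # Strip the prefix so the backend only sees its own paths.
                return handler(path[len(prefix):] or "/")
        raise LookupError(f"no backend registered for {path}")

class Client:
    """Single client codebase: every subcommand goes through the same gateway."""

    def __init__(self, gateway: Gateway) -> None:
        self.gateway = gateway

    def run(self, subcommand: str, action: str) -> str:
        return self.gateway.dispatch(f"/{subcommand}/{action}")

gateway = Gateway({"/jobs": jobs_api, "/envvars": envvars_api})
client = Client(gateway)
print(client.run("jobs", "list"))
```

In this sketch, adding a backend service means registering one more route, while any client change lands in the one shared `Client` codebase, which mirrors the small-versus-big contribution trade-off listed above.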
Option 2
Backend(Single API service with all services in it) + Client(Single codebase/monolith)
Pros:
- Highly integrated and simplified operation/deployment
- Simpler to improve the client user experience with a single package
- All easier to do big contributions (all code together, shipped as one)
- Removes the API gateway system (its functionality must be rewritten in the API, though)
Cons:
- All hard to move to a decoupled system
- All harder to do small contributions (must test all flows for any change, must understand the whole system)
- Reduced flexibility and scalability
- Tight coupling could make future changes more challenging
- All harder for others to reuse outside toolforge
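As a rough illustration of Option 2, the sketch below assumes a single API service in which each feature is an in-process handler and the routing a gateway would otherwise do happens inside the service itself. The class and method names (`ToolforgeAPI`, `handle`, `_jobs`, `_envvars`) are hypothetical and do not describe existing code.

```python
class ToolforgeAPI:
    """Hypothetical single API service: all features live in one process."""

    def __init__(self) -> None:
        # Routing that a gateway would otherwise do is reimplemented here.
        self._handlers = {
            "jobs": self._jobs,
            "envvars": self._envvars,
        }

    def handle(self, path: str) -> str:
        # The first path segment selects the feature; the rest is passed on.
        service, _, rest = path.strip("/").partition("/")
        handler = self._handlers.get(service)
        if handler is None:
            raise LookupError(f"unknown service in path {path}")
        return handler("/" + rest)

    def _jobs(self, path: str) -> str:
        return f"jobs handled {path}"

    def _envvars(self, path: str) -> str:
        return f"envvars handled {path}"

api = ToolforgeAPI()
print(api.handle("/jobs/list"))
```

Here there is one thing to deploy and no separate gateway, but every feature shares the same codebase and release cycle, which is the coupling the cons above describe.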
Option 3
Backend(API gateway + per-service APIs) + Client(Per-service codebases) - Status Quo
Pros:
- Decoupling between frontend and backend through API gateway.
- Existing familiarity and no immediate changes required
- High degree of decoupling allows for services to be managed independently from development to deployment
- All easier to do small contributions (easier to understand/test/deploy just one small service)
- All easy to move to a monolithic system
- All easier for others to reuse outside toolforge
Cons:
- All harder to do big contributions (split repos, split deployments)
- More complex to improve client experience with a single package
- Potential for backend operational complexity due to multiple service APIs.
- Harder to reason about the system as a whole
- All split repos could end up in a high degree of code repetition and boilerplate.
Option N
- Add your option here!
Note for Future Decisions
Specifics such as the "slim/smart" nature of the API gateway, inter-service communication, and whether to move the CLIs to Go will be left for a second round of decisions. This lets us focus on the immediate architectural choices first.
Questions for Consideration
- How does each of these options align with our pre-defined goals and criteria?
- What impact will the chosen architecture have on team communication and development practices?
Criteria for Evaluation
Non-exhaustive list, in no particular order. Feel free to add your own.
- Reducing Complexity: How well does the option simplify the overall architecture and make it easier to manage and reason about?
- Scalability: How well can the architecture handle increasing user demand and feature expansion?
- Maintainability: What level of effort is required to maintain the system, including bug fixes, updates, and adding new features?
- Team Alignment: Does the architecture align with the team's structure and practices, allowing for effective collaboration and contribution?
- Ease of Installation: How straightforward is it to install the system, both for developers and end-users?
- Ease of Contribution: How accessible is the system for new contributors?
- User Experience: Does the architecture facilitate a seamless and efficient experience for the end-users?
- Operational Complexity: What is the impact of the architecture on deployment, monitoring, and logging?
- Iterative Development: How well does the architecture support iterative development and adjustments over time?