In T307142 GitLab was migrated to new physical machines. Currently we have a total of four machines. One is used for production GitLab and two for GitLab replicas in both DCs. We don't need two replicas for the current setup; having them is mostly due to historic reasons (T296713). So we have two spare machines which could be used to improve the availability and reliability of GitLab. This task should define the future architecture and the usage of the two GitLab machines gitlab1003 (old replica) and gitlab2003 (insetup).
GitLab Omnibus for up to 1000 users
GitLab has multiple reference architectures for different use cases. Currently we use the reference architecture for up to 1000 users. This is a single-host setup, called the omnibus setup. All needed services run on a single machine and are managed by the GitLab Debian package, so maintenance and backup are quite easy. The downside is that GitLab needs to be restarted for every update and is not available during that time. Furthermore, switching between instances in case of an incident (failover) is a manual process and takes roughly 1 to 2 hours.
Side note: we are using much bigger machines than the suggested 8 vCPU and 7.2 GB memory. We are running nodes with 20 physical cores and 128GB of RAM.
See also resource utilization of GitLab nodes: https://grafana.wikimedia.org/d/R_1IvBZnz/gitlab-omnibus-overview?orgId=1&refresh=1m&var-node=gitlab1004
The only resource we are utilizing a lot is disk space. Sometimes memory is saturated while doing backups too.
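To put the side note into numbers, here is a minimal sketch that compares the current node size with the reference sizing quoted above. The names `REFERENCE`, `CURRENT` and `headroom` are made up for illustration, and physical cores are treated as if they were equivalent to vCPUs, which is only a rough approximation.

```python
# Reference sizing for up to 1000 users, as quoted in the side note.
REFERENCE = {"cpu_cores": 8, "memory_gb": 7.2}

# Size of the current GitLab nodes, as quoted in the side note.
# Assumption: one physical core is counted like one vCPU.
CURRENT = {"cpu_cores": 20, "memory_gb": 128}


def headroom(current, reference):
    """Return how many times larger each current resource is than the reference."""
    return {name: current[name] / reference[name] for name in reference}


if __name__ == "__main__":
    for resource, factor in headroom(CURRENT, REFERENCE).items():
        print(f"{resource}: {factor:.1f}x the reference size")
```

This only covers CPU and memory. Disk space, which is the resource we actually use heavily, is not part of the comparison.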
GitLab HA for 1000+ users
The next bigger architecture is designed for 2000 users, consists of around 7 different nodes and doesn't offer HA. The 3000 user reference architecture offers HA but also needs dozens of nodes/machines. For smaller installations some node counts can be scaled down, but that still leaves more than 10 nodes.
Using remaining hosts for different purposes
At the moment, maintenance downtimes for the current setup are quite short and happen roughly once a month. Furthermore, we scaled the existing omnibus setup vertically quite a lot and I'd estimate it can serve more than 1000 users. It also seems that GitLab HA requires a lot of different nodes which also need maintenance and configuration. So it should be discussed whether there is a need for high availability now and in the future.
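To support that discussion, here is a rough sketch of how downtime translates into monthly availability. The function name `availability` and the 15-minute maintenance window are assumptions for illustration only; this task does not state the actual maintenance duration. The 60 to 120 minute values correspond to the manual failover time of 1 to 2 hours mentioned above.

```python
MINUTES_PER_DAY = 24 * 60


def availability(downtime_minutes, days=30):
    """Return the fraction of a period of `days` days during which the service was up."""
    total_minutes = days * MINUTES_PER_DAY
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must be between 0 and the length of the period")
    return 1 - downtime_minutes / total_minutes


if __name__ == "__main__":
    # Hypothetical scenarios; only the failover durations come from this task.
    scenarios = {
        "one maintenance window (assumed 15 min)": 15,
        "maintenance plus one 1h failover": 15 + 60,
        "maintenance plus one 2h failover": 15 + 120,
    }
    for label, minutes in scenarios.items():
        print(f"{label}: {availability(minutes) * 100:.3f}% available")
```

Plugging in measured downtimes instead of the assumed values would give a baseline to compare against whatever availability target we agree on.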
If we come to the conclusion that a single-host setup (omnibus) and a replica are enough for now, we can also think about using the remaining two hosts for purposes other than HA in the GitLab realm. For example:
- as dedicated backup nodes as discussed in T274463#8118962
- as a GitLab mirror T291322
- as additional Trusted Runners