T365259 discussed moving Gerrit behind the CDN/load balancer for better anti-abuse handling. That discussion suggested that GitLab may be a better first candidate: the setup is quite similar (one web service and one SSH service), but GitLab is less production-critical and has more test instances available.
Most of the discussion already happened in T365259 and also applies to GitLab, so this task is mainly for discussing and tracking the actual technical implementation of anti-abuse handling for GitLab.
GitLab consists of multiple machines and services:
- GitLab production
  - web service at https://gitlab.wikimedia.org
  - ssh service at gitlab.wikimedia.org:22
- GitLab replica
  - web service at https://gitlab-replica.wikimedia.org
  - ssh service at gitlab-replica.wikimedia.org:22
- GitLab replica old (2nd replica)
  - web service at https://gitlab-replica-old.wikimedia.org
  - ssh service at gitlab-replica-old.wikimedia.org:22
These services are not related or distributed in any way. The replicas are standby machines that can be used for emergency switchovers and testing. Each runs an actual GitLab instance with old (12h) data, but these instances are not used for production GitLab.
Technical exploration
A first exploration looked at LVS: we considered placing GitLab behind the load balancing infrastructure. However, this approach would involve significant refactoring, implementation work and overhead, which is hard to justify when our primary interest is in throttling capabilities. Consequently, this idea was set aside.
We then explored alternative methods of throttling and traffic shaping. Local tools, such as a separate HAProxy or a firewall, seemed the most promising. We are currently testing and implementing throttling using firewall rules. The newer tool nftables has built-in features that look particularly useful here, such as rate limiting and dynamic IP sets. Our first step is therefore to migrate to nftables and then verify potential configurations with it:
- Upgrade GitLab hosts to nftables
- Verify that throttling and dynamic IP sets for a denylist are possible in nftables
- Puppetize a basic set of rules to throttle external HTTP traffic
- Adjust thresholds
- Enable rules on all instances
  - test-instance (no nftables so far because cloud cumin uses ferm)
  - replica-b
  - replica-a
  - production
- (Repeat for SSH traffic)?
- Update docs and publish a Tech News article
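
As a starting point for the "verify" and "puppetize" steps above, here is a minimal sketch of what per-source throttling with a dynamic denylist could look like in nftables. It is not the deployed configuration. The table, set and chain names, the HTTPS port, the rate and burst thresholds and the timeouts are all placeholder assumptions, and only IPv4 is covered.

```
# Hypothetical ruleset sketch: names, thresholds and timeouts are assumptions.
table inet gitlab_throttle {
    # Dynamic denylist: addresses are added from the packet path
    # and expire automatically after the set's default timeout.
    set denylist_v4 {
        type ipv4_addr
        flags dynamic, timeout
        timeout 10m
    }

    # Per-source rate-limit state for new HTTPS connections.
    set http_meter_v4 {
        type ipv4_addr
        size 65535
        flags dynamic, timeout
        timeout 1m
    }

    chain input {
        type filter hook input priority 0; policy accept;

        # Drop everything from addresses currently on the denylist.
        ip saddr @denylist_v4 drop

        # A source opening new connections to port 443 faster than the
        # (placeholder) limit is added to the denylist and dropped.
        tcp dport 443 ct state new add @http_meter_v4 { ip saddr limit rate over 20/second burst 40 packets } add @denylist_v4 { ip saddr } drop
    }
}
```

A file like this can be syntax-checked without applying it using `nft -c -f <file>`. An IPv6 variant would need matching `ipv6_addr` sets and `ip6 saddr` rules, and the same pattern could be repeated for port 22 if SSH traffic is throttled as well.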