Jenkins lives on a single production machine in eqiad (contint1001.eqiad.wmnet). That machine is a single point of failure, and it does not let us switch to codfw if there is an issue with the eqiad datacenter. We also have no good way to test a Jenkins upgrade, and rolling back an upgrade would be challenging.
+ hotspare or active/active
+ production
+ one less SPOF
+ let us test Jenkins upgrades (eg T144106)
Setting up a secondary production machine hosting a Jenkins in codfw will address these points. We can have it as a hot spare and use it for testing upgrades, then later on head toward an active/active setup.
Rough notes from @hashar and @thcipriani meeting:
New production machine in Dallas
Get a new production machine in Dallas (cont2001.codfw.wmnet ?). We will want to request hardware allocation. Note: we need a public IP address.
- File a procurement ticket against hardware-requests (see doc there). --> T150865
- server is contint2001.wikimedia.org
Puppet work has to be done:
- Review classes and find out whether they are fully configurable via hiera
- Identify potentially hardcoded IPs / hostnames
- Review firewall rules
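
As a starting point for the hardcoded IP / hostname review, here is a minimal sketch of a scanner. It is only an illustration: the regular expressions, the file extensions and the function names (`find_hardcoded`, `scan_tree`) are assumptions, not something that exists in the puppet repository, and the patterns would need tuning to avoid false positives.

```python
import re
from pathlib import Path

# Assumed patterns: IPv4 literals and contint-style host names.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HOST_RE = re.compile(r"\bcontint\d+\.[a-z0-9.]+\b")

# Assumed set of file types worth scanning in a puppet tree.
EXTENSIONS = {".pp", ".erb", ".yaml"}


def find_hardcoded(text):
    """Return (line_number, literal) pairs for IPs and host names in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for regex in (IPV4_RE, HOST_RE):
            for match in regex.finditer(line):
                hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root):
    """Yield (path, line_number, literal) for every hit under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in EXTENSIONS:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, literal in find_hardcoded(text):
                yield path, lineno, literal
```

Running `scan_tree` against a checkout of the puppet modules used by contint would give a list of candidates to move to hiera; each hit still needs a manual look.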
Culprits
- Jenkins job would have to be run on both instances to have jobs in sync
- Timed jobs (browsertests, beta cluster jobs, doc publishing) would run on both instances of Jenkins! A solution has to be figured out for that use case.
- Not sure how Zuul will craft a report URL pointing to the proper Jenkins master
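
One possible direction for the timed jobs problem, purely as a sketch and not a decided solution: have each timed job check whether it is running on the master currently designated as active, and exit early otherwise. The function name `is_active_master` and the idea of passing the active master's name in (for example from a hiera-managed setting) are assumptions made for illustration.

```python
import socket


def is_active_master(active_master, local_fqdn=None):
    """Return True if this host is the master allowed to run timed jobs.

    active_master: FQDN of the designated master (hypothetical setting).
    local_fqdn: override for the local host name, mainly for testing.
    """
    local = local_fqdn if local_fqdn is not None else socket.getfqdn()
    return local.strip().lower() == active_master.strip().lower()
```

A timed job would call this first and skip its work when it returns False, so only one of the two Jenkins instances actually publishes docs or runs browser tests.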
Future
Both Nodepool and Zuul can be connected to multiple Jenkins masters if we aim for an active/active setup.
Later on we might want to add a spare Zuul to the new machine.