
codfw: 1 hardware access request for continuous integration
Closed, Resolved, Public

Description

Site/Location: CODFW
Number of systems: 1
Service: continuous integration
Networking Requirements: 1 external IP, public VLAN
Processor Requirements: 4 CPU+
Memory: 16GB+
Disks: 1 TB
NIC(s): ...
Partitioning Scheme: same as contint1001: 50 GB on /, the rest on /srv/, RAID1
Other Requirements:

Jenkins is a SPOF: it only runs on contint1001, and we have no way to validate an upgrade for production.

We would like to set up a clone of contint1001 (eqiad) in the codfw datacenter. At first it will just host Jenkins as a spare machine. It can then be used to validate an upgrade of Jenkins and serve in an active/passive setup.

We might add that new Jenkins instance as a co-master to have a master/master setup. That would make CI more resilient.

The machine will need a public IP and be in the public VLAN (the rationale is on the ticket for contint1001, T140257).

Event Timeline

contint1001 is a rather large machine and I am not aware of what is available in codfw. If we can match the spec, that might simplify things later on; otherwise I am sure we can make do with a less powerful server for codfw.

Dzahn awarded a token. Nov 16 2016, 5:17 PM
Dzahn added a comment. Nov 16 2016, 5:42 PM

looking at DNS, contint1001 is WMF4746

in racktables that gets us

https://racktables.wikimedia.org/index.php?page=object&tab=default&object_id=2963

HW type: Dell PowerEdge R430

and it links us to the procurement task (nice! i had not noticed that in racktables before)

T130738

so it has a Xeon E5-2640 and 64GB RAM and looking at a google doc with spares in codfw i see that for example:

WMF6404

is listed as spare and is also a R430 with E5-2640 and 64GB RAM

does this make sense @RobH? @Papaul?

Dzahn triaged this task as Medium priority. Nov 16 2016, 6:23 PM
RobH assigned this task to mark. Nov 16 2016, 8:36 PM
RobH added a subscriber: mark.

contint1001 has Dual Intel® Xeon® Processor E5-2640 v3 (2.6GHz/8c), dual 1TB SFF SATA.

So WMF6404 has Dual Intel® Xeon® Processor E5-2640 (2.60GHz/8 cores) and 64GB of memory, far more than this request requires. It also has dual 1TB SFF SATA. It was purchased on T130743 in March of 2016, so it has sat in spare allocation for a while.

We'll need @mark to approve this allocation, and assign it back to me so I can update our spares tracking sheet and start the implementation.
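The comparison above can be written out as a quick check. This is only an illustrative sketch: the `Spec` class and the function names are hypothetical, "4 CPU+" is assumed to mean at least four cores, and the figures are the ones quoted in this task (dual 8-core CPUs, 64GB RAM, dual 1TB disks in RAID1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical summary of a server's hardware."""
    cores: int       # total physical cores across all sockets
    memory_gb: int   # installed RAM, in GB
    disks: int       # number of identical disks
    disk_tb: float   # capacity of each disk, in TB


def raid1_usable_tb(spec: Spec) -> float:
    """RAID1 mirrors the data, so usable capacity is that of one disk."""
    if spec.disks < 2:
        raise ValueError("RAID1 needs at least two disks")
    return spec.disk_tb


def meets_request(spec: Spec, min_cores: int, min_memory_gb: int,
                  min_disk_tb: float) -> bool:
    """Check a spare against the minimums listed in the request."""
    return (
        spec.cores >= min_cores
        and spec.memory_gb >= min_memory_gb
        and raid1_usable_tb(spec) >= min_disk_tb
    )


# Figures as quoted in this task for the spare WMF6404.
wmf6404 = Spec(cores=16, memory_gb=64, disks=2, disk_tb=1.0)

# Request: 4 CPU+ (assumed to mean cores), 16GB+ memory, 1 TB disk.
print(meets_request(wmf6404, min_cores=4, min_memory_gb=16, min_disk_tb=1.0))
```

Note that with RAID1 the two 1TB disks provide 1TB usable, which matches the requested disk size exactly rather than exceeding it.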

@RobH pointed out that contint1001 does not use SSDs and that this might become an I/O bottleneck later on. gallium had suffered from slow I/O since it was upgraded to Precise; the RAID rebuild mostly solved the issue. contint1001 apparently has even faster disks and ample memory, which helps with disk caching. It seems we do not need SSDs for Jenkins.

The other use case is the zuul-merger daemon. It does git operations which are naturally disk bound (clone, reset, checkout, merge attempt). Right now there is a single instance on scandium, which takes advantage of an SSD. However, we can increase the number of instances and have a couple on each of the contint servers, colocated with Jenkins. We will lose a bit of I/O but gain parallelism, with 2 zuul-mergers on contint1001 and 2 others on this new server.

In short, there is no need for SSD.

mark added a comment. Dec 15 2016, 5:50 PM

Approved.

RobH closed this task as Resolved. Dec 15 2016, 7:13 PM

With the creation/processing of the setup task T153350, this hardware request is fulfilled.

Thank you!