
eqiad: 2 300GB SSD disks for scandium.eqiad.wmnet
Closed, Declined · Public

Description

Project: Continuous-Integration-Infrastructure
Site/Location: EQIAD
Number of systems: 1
Service: Continuous integration (target for Jenkins Zuul migration)

scandium.eqiad.wmnet currently has two 160 GB SSDs: Intel SSDSA2M160 (SATA, 2.5 inches, multi-level cell, 160 GB).

We are going to migrate the CI systems (Jenkins/Zuul) from gallium to scandium and need more disk space to host the new software and, most importantly, the Jenkins build history and artifacts.

The partitions are using LVM with:

Partitions   Raid           Mount point   Size
sda1 sdb1    raid 1 - md0   /             10 GiB
sda2 sdb2    raid 1 - md1   [SWAP]        1 GiB
sda3 sdb3    raid 1 - md2   /srv/ssd      149 GiB

300 GB disks would be ideal and give us enough room. To accommodate Jenkins/Zuul we would need to slightly adjust the partitioning scheme:

Partitions   Raid           Mount point               Size      Note
sda1 sdb1    raid 1 - md0   /                         30 GiB    Grow from 10 to 30 GiB
sda2 sdb2    raid 1 - md1   [SWAP]                    1 GiB
sda3 sdb3    raid 1 - md2   /srv/ssd                  19 GiB    Shrink from 149 to 19 GiB
sda4 sdb4    raid 1 - md3   /var/lib/jenkins/builds   200 GiB   NEW, Jenkins artifacts

That leaves 50 GiB unallocated, as room to grow a partition (if at all possible) and eventually to add a second zuul merger instance (the purpose of /srv/ssd right now).
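
The space budget above can be checked with a few lines of arithmetic. This is only a sketch: it uses the target sizes from the Note column (30 GiB for / and 19 GiB for /srv/ssd), and the names `proposed_gib` and `headroom` are made up for illustration. It also assumes that "300 GB" might be a vendor-style decimal figure, in which case the real headroom would be smaller than 50 GiB.

```python
# Proposed layout from the task, sizes in GiB (target sizes, per the Note column).
proposed_gib = {
    "/": 30,
    "[SWAP]": 1,
    "/srv/ssd": 19,
    "/var/lib/jenkins/builds": 200,
}


def headroom(disk_gib, layout):
    """Return the unallocated space in GiB for a disk of the given size."""
    return disk_gib - sum(layout.values())


allocated = sum(proposed_gib.values())

# The task treats the new disks as 300 GiB.
nominal_headroom = headroom(300, proposed_gib)

# Assumption: if "300 GB" means 300 * 10**9 bytes, the usable size in GiB is lower.
decimal_disk_gib = 300 * 10**9 / 2**30
decimal_headroom = headroom(decimal_disk_gib, proposed_gib)

print(f"allocated: {allocated} GiB")
print(f"headroom on a 300 GiB disk: {nominal_headroom} GiB")
print(f"headroom on a 300 GB (decimal) disk: {decimal_headroom:.1f} GiB")
```

Under the task's own reading the numbers add up to the stated 50 GiB; under the decimal reading the spare room is closer to 29 GiB, which would still fit the layout but leave less margin for a second zuul merger.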

Additional disk space is a blocker for phasing out the Precise host gallium and moving Jenkins/Zuul to scandium.

Event Timeline

hashar created this task. Jul 11 2016, 2:14 PM
Restricted Application added subscribers: Zppix, Aklapper. Jul 11 2016, 2:14 PM

Following up on a check-in last week with @thcipriani and @chasemp:

scandium, as it turns out, is a very old server, and I'm not sure it makes sense to spend time migrating to it. I have not yet had a chance to outline a plan with all the historical context regarding contint1001 that I have been reading through, but my current thought is to enlist that server if possible. It is much newer, has horsepower for growth, storage for content, and is more or less already allocated. But this is my position without having read through all of the history at this point. Either way, moving from gallium to scandium (both single points of failure), with scandium being (to my understanding) 5 years old, seems odd.

contint1001 was set up in the production network, whereas it would need to be in the labs support host network. My understanding is that we would have to physically move it in the datacenter :(

It is also largely overpowered for the task at hand (private doc: gallium replacement targets): 64 GB of RAM and 32 CPUs, which is more than four times what we need.

hashar closed this task as Declined. Jul 13 2016, 3:31 PM

Change of .plan. We are heading toward using contint1001 to replace both gallium and scandium.