eqiad: (1) allocate server to migrate Zuul server to
Closed, Resolved · Public

Description

During the weekly check-in for Continuous-Integration-Scaling, @chasemp proposed migrating the Zuul server from gallium.wikimedia.org to a more recent production host. The reasons are:

  • it potentially has unpuppetized configuration; migrating will force us to pay down that potential technical debt
  • it is still hosted on Precise, which forces us to build a Debian package for that distribution
  • the hardware is getting old

The Zuul server is a threaded application which is fairly lightweight. Memory usage from top on gallium:

PID USER      VIRT  RES  SHR S %CPU %MEM   TIME WCHAN     COMMAND
8682 zuul     2041m 161m 3936 S  0.0  2.0  16:04 pause     /usr/bin/python /usr/local/bin/zuul-server -c /etc/zuul/zuul-server.conf
8687 zuul      242m  75m 2024 S  0.0  1.0  69:31 pipe_wait /usr/bin/python /usr/local/bin/zuul-server -c /etc/zuul/zuul-server.conf
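
To put a number on "fairly lightweight", the RES column of the two rows above can be summed. This is only a rough sketch: the helper names `to_mib` and `resident_mib` are made up for illustration, and it assumes the usual top convention that a bare number is KiB while an `m` or `g` suffix means MiB or GiB. Both processes may share pages, so the sum is an upper bound rather than an exact footprint.

```python
# Rows copied from the top output above (PID USER VIRT RES SHR ...).
TOP_ROWS = """\
8682 zuul     2041m 161m 3936 S  0.0  2.0  16:04 pause     /usr/bin/python /usr/local/bin/zuul-server -c /etc/zuul/zuul-server.conf
8687 zuul      242m  75m 2024 S  0.0  1.0  69:31 pipe_wait /usr/bin/python /usr/local/bin/zuul-server -c /etc/zuul/zuul-server.conf
"""


def to_mib(field):
    """Convert a top memory field to MiB (assumed: bare number = KiB, 'm' = MiB, 'g' = GiB)."""
    units = {"m": 1.0, "g": 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field) / 1024.0


def resident_mib(top_output):
    """Sum the RES column (fourth field) over every row that starts with a numeric PID."""
    total = 0.0
    for line in top_output.splitlines():
        cols = line.split()
        if len(cols) < 4 or not cols[0].isdigit():
            continue  # skip headers and blank lines
        total += to_mib(cols[3])
    return total
```

Calling `resident_mib(TOP_ROWS)` adds 161 MiB and 75 MiB, which is small next to the roughly 8 GB of RAM gallium reports further down in this task.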

It embeds:

  • a web server serving its internal state as a status.json file. Requests from public internet users land on misc-varnish, then on an Apache on gallium, which proxies the request to the Zuul web server over localhost.
  • a Gearman server that receives connections from Jenkins. With the isolation projects there will be client connections from the labs hosts subnet as well.
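
For anyone wanting to check the web server part directly on the host, bypassing misc-varnish and Apache, something like the following could work. It is a sketch only: the task does not state which port the Zuul web server listens on, so `STATUS_URL` (including port 8001) is an assumption to adjust, and the function names are hypothetical. It makes no assumption about the layout of status.json beyond it being a JSON object.

```python
import json
from urllib.request import urlopen

# Assumption: host and port are placeholders; the task only says the
# request is proxied to the Zuul web server over localhost.
STATUS_URL = "http://localhost:8001/status.json"


def fetch_status(url=STATUS_URL, timeout=5.0):
    """Fetch the raw status.json bytes straight from the Zuul web server."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def top_level_keys(payload):
    """Decode a status.json payload and return its top-level keys, sorted."""
    doc = json.loads(payload.decode("utf-8"))
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(doc)
```

Running `top_level_keys(fetch_status())` on the Zuul host should list whatever sections the server exposes. Comparing that with the same request made through the public URL would show whether the proxy chain passes the file through unchanged.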

@hashar needs root access on the machine to maintain / debug Zuul in production, which might prevent collocating the service with others. So a lightweight server will do.

Since it will be a backend of misc-varnish, we can consider moving the integration website to it as well.

Event Timeline

hashar created this task. Apr 10 2015, 8:39 PM
hashar raised the priority of this task from to Needs Triage.
hashar updated the task description.
hashar added subscribers: hashar, Aklapper, chasemp.
Restricted Application added a project: acl*sre-team. Apr 10 2015, 8:39 PM
Andrew triaged this task as High priority. Apr 11 2015, 9:26 PM
Andrew set Security to None.
RobH added a subscriber: RobH. Apr 13 2015, 8:37 PM

'hardware is getting old' is not valid reasoning.

That this cannot be easily upgraded in place, and that swapping the hardware makes for a more seamless transition, is. The replacement will likely be hardware that is just as old, as the actual hardware requirements are quite low.

So the gist is that there are components here running very unpuppetized and on Precise. Some things require packaging. In order to move towards sanity I suggested we spin up a box in parity (for which we can use Jessie), so that at the end of things we know we have built out the new nodepool infrastructure on puppetized and solid ground. The age consideration is mostly this: the unpuppetized box is getting old and when it fails it will suck a lot.

RobH added a comment. Apr 13 2015, 8:44 PM

Gallium is the following:

Single CPU: Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
Dual 500GB SATA plus an SSD.

*-disk:0                
     description: ATA Disk
     product: SAMSUNG HE502HJ
     physical id: 0
     bus info: scsi@0:0.0.0
     logical name: /dev/sda
     version: 1AJ3
     serial: S2B6J90ZC10733
     size: 465GiB (500GB)
     capabilities: partitioned partitioned:dos
     configuration: ansiversion=5 signature=000262c2
*-disk:1
     description: ATA Disk
     product: INTEL SSDSA2M160
     physical id: 0.1.0
     bus info: scsi@0:0.1.0
     logical name: /dev/sdb
     version: 2CV1
     serial: CVPO102204ND160AGN
     size: 149GiB (160GB)
     capabilities: partitioned partitioned:dos
     configuration: ansiversion=5 signature=4f65a25a
*-disk:2
     description: ATA Disk
     product: SAMSUNG HE502HJ
     physical id: 1
     bus info: scsi@1:0.0.0
     logical name: /dev/sdc
     version: 1AJ3
     serial: S2B6J90ZC12882
     size: 465GiB (500GB)
     capabilities: partitioned partitioned:dos
     configuration: ansiversion=5 signature=0005d32f

robh@gallium:~$ cat /proc/meminfo
MemTotal: 8165776 kB

RobH assigned this task to Cmjohnson. Apr 13 2015, 8:58 PM
RobH added a subscriber: Cmjohnson.

I'm thinking about allocating system cobalt for this, but I need to assign this task to Chris to check a few things.

@Cmjohnson: Please advise if cobalt can accommodate (and if you have) a single Intel SSD. One of the older models (not the S3500) is preferred, as that is what gallium is using.

Please advise, and assign task back to me.

RobH lowered the priority of this task from High to Normal. Apr 13 2015, 8:58 PM
RobH added a subscriber: Andrew.

@Andrew: You set this to high priority, but it seems to be generally not any higher than a normal request. As such, I've set it back to normal. Please advise if this is not correct.

RobH claimed this task. Apr 13 2015, 9:00 PM

Gallium has an SSD, but the processes that make use of it are moving to other machines. Hence the new server (cobalt?) doesn't need any SSD; for now it will just host the Zuul server, which isn't doing much I/O besides log writing.

RobH closed this task as Resolved. Apr 13 2015, 9:34 PM

Cobalt is allocated for this task. System setup will proceed on T95959. Resolving this hardware-requests task.