
install/deploy labnodepool1001
Closed, Resolved (Public)

Description

System Deployment Steps:

Event Timeline

RobH claimed this task.
RobH raised the priority of this task from to Medium.
RobH updated the task description.
RobH added a project: acl*sre-team.
RobH added subscribers: RobH, hashar, dduvall.
RobH set Security to None.
RobH removed subscribers: Andrew, chasemp.

Change 202446 had a related patch set uploaded (by RobH):
setting the mgmt and production dns entries for labnodepool1001

https://gerrit.wikimedia.org/r/202446

Change 202446 merged by RobH:
setting the mgmt and production dns entries for labnodepool1001

https://gerrit.wikimedia.org/r/202446

Change 202450 had a related patch set uploaded (by RobH):
setting labnodepool1001 install params

https://gerrit.wikimedia.org/r/202450

Change 202450 merged by RobH:
setting labnodepool1001 install params

https://gerrit.wikimedia.org/r/202450

The network switch port setup has an issue, described in sub-task T95048. Once that is resolved, installation can continue.

The OS is installed, but attempting to sign keys afterwards has led to an issue: I cannot ssh to or ping labnodepool1001.eqiad.wmnet from palladium (the puppetmaster), although I can from carbon.

This seems a bit odd, still investigating.

Hosts in the public IP vlans (bastion, carbon, gallium...) can ping the host, but nothing in the private vlans can.

Chatted with Andrew: this is a known thing, and iron can ssh in. Resuming installation.
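
For anyone re-checking this later, here is a minimal sketch of the reachability test described above, meant to be run from each vantage host in turn (for example palladium and carbon). It is not what was actually run for this task: `can_reach` is a hypothetical helper, and it only tests whether a TCP connection to the ssh port succeeds, so it says nothing about ICMP ping.

```python
import socket


def can_reach(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. DNS failures, refused
    connections and timeouts are all reported as "not reachable".
    """
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers socket.gaierror, ConnectionRefusedError and timeouts.
        return False
```

Calling `can_reach("labnodepool1001.eqiad.wmnet")` on a host in a public vlan and again on one in a private vlan should reproduce the asymmetry reported here, assuming the ssh port is the one of interest.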

RobH removed RobH as the assignee of this task. (Apr 10 2015, 9:15 PM)
RobH updated the task description.

Puppet/salt keys accepted; the system is ready for service implementation.

hashar changed the task status from Open to Stalled. (Apr 10 2015, 9:18 PM)

Thank you very much @RobH! Service implementation is pending until I gain access to the host via T95303, which will be discussed Monday during the Ops meeting.

I have created a basic Debian package for Nodepool (T89142) and installed it on labnodepool1001.eqiad.wmnet.

For testing purposes I have created a basic configuration file under /etc/nodepool and installed a local mysql server.

On Wikitech I have created a user, nodepoolmanager, which is an admin of the labs project contintcloud. The nodepool configuration file uses that user's credentials.

The hardware has been allocated for Nodepool, so I removed the blocking task T93706: [SchemaGatherClicks] Missing or empty schema

Most of the puppet patches have been merged. The one left over is the systemd configuration, T96867: Use systemd for Nodepool

The service is implemented and managed to magically boot and delete an instance. The labs work done by @Andrew in spring has been a huge benefit. There is still a lot of work, but the first phase is now complete: nodepool is up and running.

hashar claimed this task.