rack/setup prometheus100[3-4]
Closed, Resolved · Public

Description

This task will track the racking and initial setup of prometheus100[3-4].
Please verify that the racking locations and the hostnames are okay, and provide the partman recipe to use. Thanks.
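For readers unfamiliar with the bracket shorthand used in this task (`prometheus100[3-4]`, and later `prometheus100[34]`), here is a minimal sketch of how it can be expanded into individual hostnames. It assumes the brackets behave like a glob-style character class; `expand_hosts` is a hypothetical helper written for illustration, not a tool used on this task.

```python
import re


def expand_hosts(pattern: str) -> list:
    """Expand one glob-style bracket group, e.g. 'host100[3-4]' or 'host100[34]'."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        # No bracket group: the pattern is already a single hostname.
        return [pattern]
    prefix, body, suffix = match.groups()

    chars = []
    i = 0
    while i < len(body):
        # 'a-b' inside the brackets means every character from a to b inclusive.
        if i + 2 < len(body) and body[i + 1] == "-":
            start, end = ord(body[i]), ord(body[i + 2])
            chars.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return [prefix + c + suffix for c in chars]


print(expand_hosts("prometheus100[3-4]"))
```

Under this assumption both spellings, `prometheus100[3-4]` and `prometheus100[34]`, expand to the same two hosts: prometheus1003 and prometheus1004.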

prometheus1003

  • receive in normally on parent task T149339
  • rack location A3
  • create DNS entries for internal production IP address, and mgmt entries for both asset tag and hostname
  • set up BIOS and DRAC
  • update task with port info
  • install_server module update
  • install OS
  • sign/accept puppet & salt keys
  • hand off to @Filippo for service implementation.

prometheus1004

  • receive in normally on parent task T149339
  • rack location B4
  • create DNS entries for internal production IP address, and mgmt entries for both asset tag and hostname
  • set up BIOS and DRAC
  • update task with port info
  • install_server module update
  • install OS
  • sign/accept puppet & salt keys
  • hand off to @Filippo for service implementation.
Cmjohnson created this task. Dec 6 2016, 3:53 PM
Restricted Application added subscribers: Southparkfan, Aklapper. Dec 6 2016, 3:53 PM

Change 325569 had a related patch set uploaded (by Cmjohnson):
Adding dns entries for prometheus1003 and 1004 both production and mgmt. T152504

https://gerrit.wikimedia.org/r/325569

Change 325569 merged by Cmjohnson:
Adding dns entries for prometheus1003 and 1004 both production and mgmt. T152504

https://gerrit.wikimedia.org/r/325569

Change 325628 had a related patch set uploaded (by Cmjohnson):
Adding mac addresses for prometheus1003 and 1004 T152504

https://gerrit.wikimedia.org/r/325628

Change 325628 merged by Cmjohnson:
Adding mac addresses for prometheus1003 and 1004 T152504

https://gerrit.wikimedia.org/r/325628

Thanks @Cmjohnson! Re: raid/partman setup, it'll be the same as prometheus200[34] in T151338: rack/setup prometheus200[3-4]. Namely hardware RAID: the first VD is RAID 1 on the SSDs, then RAID 10 on the HDDs.

H/W Raid is set up
ssds raid 1
spinning disks raid 10

Cmjohnson updated the task description. Dec 9 2016, 4:46 PM

Change 327677 had a related patch set uploaded (by Cmjohnson):
Adding prometheus1003-4 to netboot.cfg file T152504

https://gerrit.wikimedia.org/r/327677

Change 327677 merged by Cmjohnson:
Adding prometheus1003-4 to netboot.cfg file T152504

https://gerrit.wikimedia.org/r/327677

Cmjohnson updated the task description. Jan 6 2017, 3:07 PM

I used these two boxes to test installing from install1001 (instead of carbon). The installer started fine on 1003, but the install then failed at the grub install step for unknown and unrelated reasons. 1004 talks to the DHCP server but gets "no free leases", as happens when the switch port is in the wrong network, even though @RobH already fixed the issue that both of these were in the public network before. (Before that fix, 1003 also got "no free leases".)
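A rough way to reason about the "no free leases" symptom: the DHCP server can only hand out the host's reserved address if the request arrives on the network that address belongs to. The sketch below checks that with Python's `ipaddress` module. The subnets, the addresses and the helper names (`vlan_for`, `lease_possible`) are made up for illustration and are not the real eqiad ranges or any existing tool.

```python
import ipaddress
from typing import Optional

# Hypothetical subnets, for illustration only (not the real eqiad networks).
VLANS = {
    "private": ipaddress.ip_network("10.0.0.0/22"),
    "public": ipaddress.ip_network("192.0.2.0/24"),
}


def vlan_for(address: str) -> Optional[str]:
    """Return the name of the VLAN whose subnet contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for name, network in VLANS.items():
        if ip in network:
            return name
    return None


def lease_possible(reserved_address: str, port_vlan: str) -> bool:
    """True if the switch port's VLAN matches the subnet of the reserved address."""
    return vlan_for(reserved_address) == port_vlan


# A host with a private address whose switch port is still in the public VLAN.
print(lease_possible("10.0.1.5", "public"))   # False
print(lease_possible("10.0.1.5", "private"))  # True
```

In this model the first case corresponds to what was seen on 1004: the port sits in the public network, so there is no lease the server can offer for the private address.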

Cmjohnson reassigned this task from Cmjohnson to Dzahn. Feb 6 2017, 6:58 PM

Assigning this to you.

Dzahn removed Dzahn as the assignee of this task. Feb 6 2017, 10:25 PM
Dzahn added a comment. Feb 6 2017, 10:52 PM

I was just using these as random test hosts for the installer.

Trying 1003 again though, to see if things changed.

Dzahn claimed this task. Feb 6 2017, 11:04 PM

Mentioned in SAL (#wikimedia-operations) [2017-02-06T23:45:53Z] <mutante> prometheus1003 - installed OS, signing puppet cert, initial run (T152504)

Dzahn added a comment. Feb 6 2017, 11:56 PM

prometheus1003 has an OS now and has been added to puppet and can be used

prometheus1004 looks like it's still in the wrong VLAN, the public one, but it should be in private.

Dzahn removed Dzahn as the assignee of this task. Feb 7 2017, 12:27 AM

Mentioned in SAL (#wikimedia-operations) [2017-02-07T01:31:11Z] <mutante> prometheus1004 - installed OS, signing puppet cert, initial run.. (T152504)

Dzahn added a comment. Feb 7 2017, 1:46 AM

Rob fixed the switch port config, then I could install prometheus1004 as well. It's done and sitting at login like 1003, but without a specific role so far.

Dzahn updated the task description. Feb 7 2017, 1:47 AM
Dzahn updated the task description.
Dzahn assigned this task to fgiunchedi.

Change 336354 had a related patch set uploaded (by Dzahn):
add prometheus1003/1004 to site.pp

https://gerrit.wikimedia.org/r/336354

Dzahn updated the task description. Feb 7 2017, 2:40 AM

Change 336354 merged by Filippo Giunchedi:
add prometheus1003/1004 to site.pp

https://gerrit.wikimedia.org/r/336354

Change 337381 had a related patch set uploaded (by Filippo Giunchedi):
wmnet: add ipv6 for prometheus100[34]

https://gerrit.wikimedia.org/r/337381

Change 337381 merged by Filippo Giunchedi:
wmnet: add ipv6 for prometheus100[34]

https://gerrit.wikimedia.org/r/337381

Change 337384 had a related patch set uploaded (by Filippo Giunchedi):
hieradata: temporarily remove prometheus100[34] from prometheus_hosts

https://gerrit.wikimedia.org/r/337384

Change 337384 merged by Filippo Giunchedi:
hieradata: temporarily remove prometheus100[34] from prometheus_hosts

https://gerrit.wikimedia.org/r/337384

Change 338131 had a related patch set uploaded (by Filippo Giunchedi):
Revert "hieradata: temporarily remove prometheus100[34] from prometheus_hosts"

https://gerrit.wikimedia.org/r/338131

Change 338131 merged by Filippo Giunchedi:
Revert "hieradata: temporarily remove prometheus100[34] from prometheus_hosts"

https://gerrit.wikimedia.org/r/338131

fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board. Mar 16 2017, 3:59 PM

Script wmf_auto_reimage was launched by filippo on neodymium.eqiad.wmnet for hosts:

['prometheus1003.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201703201240_filippo_12631.log.

Script wmf_auto_reimage was launched by filippo on neodymium.eqiad.wmnet for hosts:

['prometheus1004.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201703201245_filippo_14067.log.

Completed auto-reimage of hosts:

['prometheus1003.eqiad.wmnet']

Of which those FAILED:

set(['prometheus1003.eqiad.wmnet'])

Completed auto-reimage of hosts:

['prometheus1004.eqiad.wmnet']

Of which those FAILED:

set(['prometheus1004.eqiad.wmnet'])
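The result comments above are easy to misread, since each host appears under both "Completed" and "FAILED". Here is a minimal sketch of how such comments could be summarized per host. It assumes the text layout shown in this task; `summarize` is a hypothetical helper and not part of wmf_auto_reimage.

```python
import re

# Hostnames appear single-quoted in both the list and the set([...]) forms.
HOST_RE = re.compile(r"'([^']+)'")


def summarize(comment: str) -> dict:
    """Collect hosts listed under the 'Completed' and 'FAILED' headings."""
    completed, failed = [], []
    target = None
    for line in comment.splitlines():
        line = line.strip()
        if line.startswith("Completed auto-reimage of hosts"):
            target = completed
        elif line.startswith("Of which those FAILED"):
            target = failed
        elif target is not None:
            target.extend(HOST_RE.findall(line))
    return {"completed": sorted(set(completed)), "failed": sorted(set(failed))}


SAMPLE = """
Completed auto-reimage of hosts:

['prometheus1003.eqiad.wmnet']

Of which those FAILED:

set(['prometheus1003.eqiad.wmnet'])

Completed auto-reimage of hosts:

['prometheus1004.eqiad.wmnet']

Of which those FAILED:

set(['prometheus1004.eqiad.wmnet'])
"""

result = summarize(SAMPLE)
print(result)
```

For the comments on this task, both hosts end up in both lists. In other words, the script ran to the end for each host and reported that host as failed.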