
rack/setup/install elastic2025-2036
Closed, ResolvedPublic

Description

12 new elastic systems were leased per parent task T150378. These systems are LEASED (please note this in racktables) and will be used in addition to the 24 existing elasticsearch systems.

  • - receive in and attach packing slip to parent task T150378
  • - determine where to rack systems based on the existing racking plan (see note below)
  • - rack systems, update racktables
  • - create mgmt dns entries (both asset tag and hostname; see the sketch after this list)
  • - create production dns entries (internal vlan)
  • - update/create sub task with network port info for all new hosts
  • - install_server module update (MAC address and partitioning info; partition like existing elastic systems)
  • - install os
  • - puppet/salt accept
  • - hand off to @Gehel for service implementation.
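
For the mgmt DNS step, something along these lines could generate both record variants per host. This is a sketch only: the asset tags and mgmt IPs below are placeholders, and the real entries belong in the DNS repository.

```python
# Sketch: emit the two mgmt records (hostname and asset tag) for each new host.
# Asset tags and IP addresses below are placeholders, not the real values.
new_hosts = {
    # hostname: (asset_tag, mgmt_ip) -- placeholder values only
    "elastic2025": ("WMF0000", "10.193.0.100"),
    "elastic2026": ("WMF0001", "10.193.0.101"),
    # ... remaining hosts elided
}

for hostname, (asset_tag, ip) in new_hosts.items():
    print(f"{hostname}.mgmt.codfw.wmnet.  IN A  {ip}")
    print(f"{asset_tag.lower()}.mgmt.codfw.wmnet.  IN A  {ip}")
```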

The existing elastic systems are racked as follows:

elastic2001 elastic codfw row A A5
elastic2002 elastic codfw row A A5
elastic2003 elastic codfw row A A5
elastic2004 elastic codfw row A A8
elastic2005 elastic codfw row A A8
elastic2006 elastic codfw row A A8
elastic2007 elastic codfw row B B5
elastic2008 elastic codfw row B B5
elastic2009 elastic codfw row B B5
elastic2010 elastic codfw row B B8
elastic2011 elastic codfw row B B8
elastic2012 elastic codfw row B B8
elastic2013 elastic codfw row C C1
elastic2014 elastic codfw row C C1
elastic2015 elastic codfw row C C1
elastic2016 elastic codfw row C C5
elastic2017 elastic codfw row C C5
elastic2018 elastic codfw row C C5
elastic2019 elastic codfw row D D1
elastic2020 elastic codfw row D D1
elastic2021 elastic codfw row D D1
elastic2022 elastic codfw row D D5
elastic2023 elastic codfw row D D5
elastic2024 elastic codfw row D D5

My (@RobH) suggestion would be to evenly distribute the 12 new systems across those 8 racks (3 per row: 2 in one rack, 1 in the other). This does NOT take into account rack space, power overhead, or network port availability. @Papaul should review those criteria and either confirm the suggested plan or propose an alternative for review.
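
As an illustration only (the split of 1 vs. 2 hosts per rack within a row is an arbitrary choice here, not a decision), the distribution could be generated like this:

```python
# Sketch: spread the 12 new hosts across the 8 racks that already hold
# elastic hosts, 3 per row. Which rack in a row gets 1 host and which gets 2
# is an arbitrary choice in this sketch.
rows = {
    "A": ["A5", "A8"],
    "B": ["B5", "B8"],
    "C": ["C1", "C5"],
    "D": ["D1", "D5"],
}

hosts = iter(f"elastic{n}" for n in range(2025, 2037))  # elastic2025-2036

plan = {}
for row, racks in rows.items():
    for rack, count in zip(racks, (1, 2)):  # 1 host in the first rack, 2 in the second
        for _ in range(count):
            plan[next(hosts)] = (row, rack)

for host, (row, rack) in sorted(plan.items()):
    print(f"{host}  row {row}  rack {rack}")
```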

Power overhead should be reviewed, as we should never exceed 8640 W per tower under full load. Since we use redundant feeds, each tower, under shared load, should never exceed half of that, or 4320 W.
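
As a rough illustration of that budget (the per-server draw below is an assumed placeholder, not a measured figure, and a real review must also count everything else in the rack), a back-of-the-envelope check might look like:

```python
# Back-of-the-envelope power check for one rack. The per-server draw is an
# assumed placeholder; a real review should use measured PDU values and
# include every other device in the rack, not just the elastic hosts.
RACK_LIMIT_W = 8640                  # ceiling per tower under full load
SHARED_LIMIT_W = RACK_LIMIT_W / 2    # 4320 W per tower with redundant feeds

assumed_draw_per_server_w = 350      # ASSUMPTION, not a measured value
elastic_hosts_in_rack = 3 + 2        # e.g. 3 existing + 2 new (worst case)

estimated_w = elastic_hosts_in_rack * assumed_draw_per_server_w
print(f"estimated elastic draw: {estimated_w} W of {SHARED_LIMIT_W:.0f} W shared budget")
```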

Thanks!

Event Timeline

RobH edited projects, added ops-codfw; removed procurement.

@Gehel please see below the racking schema for the new elastic servers. Let me know if you approve, so I can start racking next week. Thanks!

elastic2025 row A rack A5
elastic2026 row A rack A8
elastic2027 row A rack A8

elastic2028 row B rack B5
elastic2029 row B rack B8
elastic2030 row B rack B8

elastic2031 row C rack C1
elastic2032 row C rack C5
elastic2033 row C rack C5

elastic2034 row D rack D1
elastic2035 row D rack D5
elastic2036 row D rack D5
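
To double-check that proposal against the suggestion above (a sketch; the list above is authoritative), a quick tally confirms 3 new hosts per row and at most 2 per rack:

```python
from collections import Counter

# Proposed racking schema copied from the list above.
proposal = {
    "elastic2025": ("A", "A5"), "elastic2026": ("A", "A8"), "elastic2027": ("A", "A8"),
    "elastic2028": ("B", "B5"), "elastic2029": ("B", "B8"), "elastic2030": ("B", "B8"),
    "elastic2031": ("C", "C1"), "elastic2032": ("C", "C5"), "elastic2033": ("C", "C5"),
    "elastic2034": ("D", "D1"), "elastic2035": ("D", "D5"), "elastic2036": ("D", "D5"),
}

per_row = Counter(row for row, _ in proposal.values())
per_rack = Counter(rack for _, rack in proposal.values())

assert all(count == 3 for count in per_row.values()), per_row
assert all(count <= 2 for count in per_rack.values()), per_rack
print("per row:", dict(per_row))
print("per rack:", dict(per_rack))
```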

Assigned to @Gehel for his feedback.

@Gehel: Please review and if all looks good, comment and assign back to @Papaul for implementation.

Thanks!

That looks fine! Thanks!

Papaul updated the task description.

@Gehel you can take over.

Change 333592 had a related patch set uploaded (by Gehel):
elasticsearch - configure new servers in codfw

https://gerrit.wikimedia.org/r/333592

Mentioned in SAL (#wikimedia-operations) [2017-01-23T14:23:02Z] <gehel> disabling puppet on elastic20(2[5-9]|3[0-6]) prior to reimage - T154251
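
For reference, the host expression in that SAL entry covers exactly the twelve new hosts; a quick check (anchored here so it only accepts exact hostnames) is:

```python
import re

# The host expression from the SAL entry above, anchored for an exact match.
# It should match elastic2025-2036 and nothing else.
pattern = re.compile(r"^elastic20(2[5-9]|3[0-6])$")

matched = [f"elastic{n}" for n in range(2000, 2100) if pattern.match(f"elastic{n}")]
assert matched == [f"elastic{n}" for n in range(2025, 2037)]
print(matched)
```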

Change 333592 merged by Gehel:
elasticsearch - configure new servers in codfw

https://gerrit.wikimedia.org/r/333592

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2025.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701231500_oblivian_27101.log.

Completed auto-reimage of hosts:

['elastic2025.codfw.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2026.codfw.wmnet', 'elastic2027.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701231527_oblivian_2297.log.

Completed auto-reimage of hosts:

['elastic2026.codfw.wmnet', 'elastic2027.codfw.wmnet']

Of which those FAILED:

set(['elastic2027.codfw.wmnet'])

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2027.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701231601_oblivian_11954.log.

Completed auto-reimage of hosts:

['elastic2027.codfw.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2028.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701231622_oblivian_17420.log.

Completed auto-reimage of hosts:

['elastic2028.codfw.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2029.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701240928_oblivian_8736.log.

Completed auto-reimage of hosts:

['elastic2029.codfw.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2030.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701241004_oblivian_18480.log.

Completed auto-reimage of hosts:

['elastic2030.codfw.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2031.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701241038_oblivian_27706.log.

Completed auto-reimage of hosts:

['elastic2031.codfw.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2032.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701241325_oblivian_8558.log.

Completed auto-reimage of hosts:

['elastic2032.codfw.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2033.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701241351_oblivian_16326.log.

Completed auto-reimage of hosts:

['elastic2033.codfw.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2034.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701241415_oblivian_24931.log.

Completed auto-reimage of hosts:

['elastic2034.codfw.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2035.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701241443_oblivian_416.log.

Completed auto-reimage of hosts:

['elastic2035.codfw.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts:

['elastic2036.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201701241650_oblivian_3412.log.

Completed auto-reimage of hosts:

['elastic2036.codfw.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2017-01-25T13:11:13Z] <gehel> pooling new elasticsearch nodes on codfw - T154251

All new elasticsearch nodes in codfw are installed, configured, and pooled.
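
As a closing sanity check (a sketch: the search.svc.codfw.wmnet endpoint and the node naming convention are assumptions, not confirmed facts), cluster membership of the new nodes could be verified via the standard _cat/nodes API:

```python
import json
import urllib.request

# Sketch: list node names from the codfw elasticsearch cluster and make sure
# the 12 new hosts have joined. The endpoint below and the assumption that
# node names start with the bare hostname are guesses, not confirmed facts.
url = "http://search.svc.codfw.wmnet:9200/_cat/nodes?h=name&format=json"
with urllib.request.urlopen(url) as resp:
    node_names = [entry["name"] for entry in json.load(resp)]

new_hosts = [f"elastic{n}" for n in range(2025, 2037)]
missing = [h for h in new_hosts if not any(name.startswith(h) for name in node_names)]
print("missing from cluster:", missing or "none")
```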