
Service implementation for elastic10[68-83].eqiad.wmnet
Open, Medium | Public | 8 Estimated Story Points

Description

See T279158 for dc-ops procurement and T281989 for dc-ops racking. This ticket tracks the search team's part: taking the fresh nodes and bringing them into service.

Step 1: Set up hieradata

  • allocate between psi/omega, keeping rows as balanced as possible

Step 2: Enable cirrus roles

  • after completion of this step, the new hosts should have joined the cirrus elasticsearch clusters

Step 3: Prepare to decom old hosts

Step 4: Actually decom hosts

  • remove cirrus role and decom

Event Timeline

RKemper triaged this task as Medium priority. Nov 2 2021, 6:18 AM
RKemper updated the task description.
[Hosts that will be decom'd]
elastic1032     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     A3  2620:0:861:101:10:64:0:233/64
elastic1033     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     A3  2620:0:861:101:10:64:0:234/64
elastic1034     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     A3  2620:0:861:101:10:64:0:235/64
elastic1035     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     A3  2620:0:861:101:10:64:0:236/64
elastic1036     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     B3  2620:0:861:102:10:64:16:45/64
elastic1037     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     B3  2620:0:861:102:10:64:16:46/64
elastic1038     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     B3  2620:0:861:102:10:64:16:47/64
elastic1039     Failed  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     B3  2620:0:861:102:10:64:16:48/64
elastic1040     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     C5  2620:0:861:103:10:64:32:108/64
elastic1041     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     C5  2620:0:861:103:10:64:32:109/64
elastic1042     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     C5  2620:0:861:103:10:64:32:110/64
elastic1043     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     C5  2620:0:861:103:10:64:32:111/64
elastic1044     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     A6  2620:0:861:101:10:64:0:85/64
elastic1045     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     A6  2620:0:861:101:10:64:0:86/64
elastic1046     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     B6  2620:0:861:102:10:64:16:70/64
elastic1047     Active  —   Server  HP ProLiant DL360 Gen9  Equinix Ashburn     B6  2620:0:861:102:10:64:16:71/64
[New hosts w/ psi vs omega assignment]
elastic1068     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A4  2620:0:861:101:10:64:0:72/64   omega
elastic1069     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A4  2620:0:861:101:10:64:0:73/64   psi
elastic1070     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A7  2620:0:861:101:10:64:0:74/64   omega
elastic1071     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A7  2620:0:861:101:10:64:0:76/64   omega
elastic1072     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A7  2620:0:861:101:10:64:0:77/64   psi
elastic1073     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A7  2620:0:861:101:10:64:0:78/64   psi
elastic1074     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B2  2620:0:861:102:10:64:16:42/64  omega
elastic1075     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B2  2620:0:861:102:10:64:16:49/64  psi
elastic1076     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B4  2620:0:861:102:10:64:16:50/64  omega
elastic1077     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B4  2620:0:861:102:10:64:16:51/64  omega
elastic1078     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B4  2620:0:861:102:10:64:16:52/64  psi
elastic1079     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B4  2620:0:861:102:10:64:16:53/64  psi
elastic1080     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     C4  2620:0:861:103:10:64:32:29/64  omega
elastic1081     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     C4  2620:0:861:103:10:64:32:166/64 psi
elastic1082     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     C7  2620:0:861:103:10:64:32:167/64 omega
elastic1083     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     C7  2620:0:861:103:10:64:32:168/64 psi
[New hosts w/ psi vs omega assignment, separated into rows for visual convenience]
elastic1068     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A4  2620:0:861:101:10:64:0:72/64   omega
elastic1070     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A7  2620:0:861:101:10:64:0:74/64   omega
elastic1071     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A7  2620:0:861:101:10:64:0:76/64   omega

elastic1074     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B2  2620:0:861:102:10:64:16:42/64  omega
elastic1076     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B4  2620:0:861:102:10:64:16:50/64  omega
elastic1077     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B4  2620:0:861:102:10:64:16:51/64  omega

elastic1080     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     C4  2620:0:861:103:10:64:32:29/64  omega
elastic1082     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     C7  2620:0:861:103:10:64:32:167/64 omega


elastic1069     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A4  2620:0:861:101:10:64:0:73/64   psi
elastic1072     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A7  2620:0:861:101:10:64:0:77/64   psi
elastic1073     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     A7  2620:0:861:101:10:64:0:78/64   psi

elastic1075     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B2  2620:0:861:102:10:64:16:49/64  psi
elastic1078     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B4  2620:0:861:102:10:64:16:52/64  psi
elastic1079     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     B4  2620:0:861:102:10:64:16:53/64  psi

elastic1081     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     C4  2620:0:861:103:10:64:32:166/64 psi
elastic1083     Staged  —   Server  Dell PowerEdge R440     Equinix Ashburn     C7  2620:0:861:103:10:64:32:168/64 psi
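
A minimal sketch for checking that the psi/omega split above keeps rows balanced. The rack and cluster values are copied from the table; the `ASSIGNMENT` dict and `hosts_per_row` helper are hypothetical names used only for illustration and are not part of any existing tooling.

```python
from collections import Counter

# host -> (rack, cluster), copied from the assignment table above.
ASSIGNMENT = {
    "elastic1068": ("A4", "omega"),
    "elastic1069": ("A4", "psi"),
    "elastic1070": ("A7", "omega"),
    "elastic1071": ("A7", "omega"),
    "elastic1072": ("A7", "psi"),
    "elastic1073": ("A7", "psi"),
    "elastic1074": ("B2", "omega"),
    "elastic1075": ("B2", "psi"),
    "elastic1076": ("B4", "omega"),
    "elastic1077": ("B4", "omega"),
    "elastic1078": ("B4", "psi"),
    "elastic1079": ("B4", "psi"),
    "elastic1080": ("C4", "omega"),
    "elastic1081": ("C4", "psi"),
    "elastic1082": ("C7", "omega"),
    "elastic1083": ("C7", "psi"),
}


def hosts_per_row(assignment):
    """Count hosts per (cluster, row); the row is the first letter of the rack."""
    counts = Counter()
    for rack, cluster in assignment.values():
        counts[(cluster, rack[0])] += 1
    return counts


counts = hosts_per_row(ASSIGNMENT)
for (cluster, row), n in sorted(counts.items()):
    print(f"{cluster:5} row {row}: {n}")
```

With the assignment as listed, both omega and psi get 3 hosts in row A, 3 in row B, and 2 in row C.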
[new conftool-data entries corresponding to the above]
elastic1068.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-omega-ssl]
elastic1069.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-psi-ssl]
elastic1070.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-omega-ssl]
elastic1071.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-omega-ssl]
elastic1072.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-psi-ssl]
elastic1073.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-psi-ssl]
elastic1074.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-omega-ssl]
elastic1075.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-psi-ssl]
elastic1076.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-omega-ssl]
elastic1077.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-omega-ssl]
elastic1078.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-psi-ssl]
elastic1079.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-psi-ssl]
elastic1080.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-omega-ssl]
elastic1081.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-psi-ssl]
elastic1082.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-omega-ssl]
elastic1083.eqiad.wmnet: [elasticsearch, elasticsearch-ssl, elasticsearch-psi-ssl]
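
The conftool-data entries above follow a regular pattern, so they could be generated from the host-to-cluster assignment rather than typed by hand. The sketch below assumes that pattern holds for every host; `conftool_entry` is a hypothetical helper, not an existing conftool function.

```python
def conftool_entry(host, cluster, domain="eqiad.wmnet"):
    """Render one conftool-data line in the format used in the paste above."""
    services = [
        "elasticsearch",
        "elasticsearch-ssl",
        f"elasticsearch-{cluster}-ssl",  # cluster is "psi" or "omega"
    ]
    return f"{host}.{domain}: [{', '.join(services)}]"


# Two hosts taken from the assignment table, one per cluster.
for host, cluster in [("elastic1068", "omega"), ("elastic1069", "psi")]:
    print(conftool_entry(host, cluster))
```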

Step 3

(Old master configuration)
(main cluster)
    unicast_hosts: # this is also the list of master eligible nodes
      - elastic1036.eqiad.wmnet (B3)
      - elastic1040.eqiad.wmnet (C5)
      - elastic1054.eqiad.wmnet

(omega)
    unicast_hosts: # this is also the list of master eligible nodes
      - elastic1034.eqiad.wmnet (A3)
      - elastic1038.eqiad.wmnet (B3)
      - elastic1040.eqiad.wmnet (C5)

(psi)
    unicast_hosts: # this is also the list of master eligible nodes
      - elastic1048.eqiad.wmnet
      - elastic1050.eqiad.wmnet
      - elastic1052.eqiad.wmnet

->

(New master configuration)
(main cluster)
    unicast_hosts: # this is also the list of master eligible nodes
      - elastic1074.eqiad.wmnet (B2)
      - elastic1081.eqiad.wmnet (C4)
      - elastic1054.eqiad.wmnet

(omega)
    unicast_hosts: # this is also the list of master eligible nodes
      - elastic1068.eqiad.wmnet (A4)
      - elastic1076.eqiad.wmnet (B4)
      - elastic1080.eqiad.wmnet (C4)

(psi)
    unicast_hosts: # this is also the list of master eligible nodes
      - elastic1048.eqiad.wmnet
      - elastic1050.eqiad.wmnet
      - elastic1052.eqiad.wmnet
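
Before the old hosts are decommissioned, no master-eligible list should still name a host in elastic10[32-47]. The following is a rough sketch of that check using the host lists above; `DECOM`, `OLD_MASTERS`, `NEW_MASTERS`, and `decom_conflicts` are hypothetical names for illustration only.

```python
# Hosts slated for decom: elastic1032 through elastic1047.
DECOM = {f"elastic{n}" for n in range(1032, 1048)}

# Master-eligible hosts per cluster, copied from the configurations above.
OLD_MASTERS = {
    "main": ["elastic1036", "elastic1040", "elastic1054"],
    "omega": ["elastic1034", "elastic1038", "elastic1040"],
    "psi": ["elastic1048", "elastic1050", "elastic1052"],
}
NEW_MASTERS = {
    "main": ["elastic1074", "elastic1081", "elastic1054"],
    "omega": ["elastic1068", "elastic1076", "elastic1080"],
    "psi": ["elastic1048", "elastic1050", "elastic1052"],
}


def decom_conflicts(masters):
    """Return, per cluster, the master-eligible hosts that are due for decom."""
    return {
        cluster: sorted(set(hosts) & DECOM)
        for cluster, hosts in masters.items()
    }


print("old:", decom_conflicts(OLD_MASTERS))
print("new:", decom_conflicts(NEW_MASTERS))
```

The old configuration reports conflicts for the main and omega clusters, while the new configuration reports none, which is the condition step 3 needs to establish before step 4.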

Change 736116 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] elasticsearch: hiera for new eqiad nodes (step 1)

https://gerrit.wikimedia.org/r/736116

Change 736117 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] elasticsearch: activate role (step 2)

https://gerrit.wikimedia.org/r/736117

Change 736118 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] elasticsearch: new master config (step 3)

https://gerrit.wikimedia.org/r/736118

Change 736119 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] elasticsearch: decom elastic10[32-47] (step 4)

https://gerrit.wikimedia.org/r/736119

Mentioned in SAL (#wikimedia-operations) [2022-01-12T19:14:40Z] <mutante> elastic10180 - one power supply seeming failed - see icinga IPMI alert - [Status = Critical, PS Redundancy = Critical] T294805