
rack/setup/install centrallog1001.eqiad.wmnet
Closed, Resolved, Public

Description

This task tracks the racking and setup of the new central syslog system for eqiad. It replaces the outdated lithium.eqiad.wmnet, which will be decommissioned once the new host is fully online.

Hostname Proposal: Switch from element names such as lithium (which are generic names used when a system has varied services/roles) to a cluster name, since this host is replicated in codfw. The current proposal is centrallog1001, but another name could work. Other options considered were log1001 and syslog1001.

Racking Proposal: Any 1G-capable rack. This host will be in the private VLAN for whichever row it is assigned to.

centrallog1001:

  • receive in system on procurement task T195416
  • rack system with proposed racking plan (see above) & update Racktables (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation (stretch)
  • puppet accept/initial run
  • handoff for service implementation

Event Timeline

RobH triaged this task as Medium priority. Jul 30 2018, 4:56 PM
RobH created this task.
RobH renamed this task from rack/setup/install syslog1001.eqiad.wmnet to rack/setup/install centrallog1001.eqiad.wmnet. Jul 30 2018, 5:04 PM
RobH updated the task description.

Change 450094 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt/production dns for centrallog1001

https://gerrit.wikimedia.org/r/450094

Change 450094 merged by Cmjohnson:
[operations/dns@master] Adding mgmt/production dns for centrallog1001

https://gerrit.wikimedia.org/r/450094

Cmjohnson updated the task description.
Cmjohnson moved this task from Racking Tasks to Blocked on the ops-eqiad board.

Assigning to @RobH for help with the final installation.

RobH removed a project: ops-eqiad.
RobH updated the task description.

@fgiunchedi: You were the SRE team member who provided feedback on the disk capacity, so I'm assuming you are the service owner. If this isn't correct, please comment, assign it back to me, or assign it to the service owner as needed.

This system is ready to have services transferred to it and to replace lithium (which is more than five years old).

Steps for service implementation:

  • Include centrallog1001 in router ACLs
  • Add centrallog1001 to remote_syslog(_tls) destinations so logs start flowing to that host too
  • Let logs accumulate for more than one day so that log rotation kicks in, then rsync the remaining (older) logs from lithium
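As a rough illustration of the backfill step: once rotation has run on centrallog1001, the files worth pulling from lithium are the rotated ones that exist there but not yet on the new host, which is roughly the decision `rsync --ignore-existing` makes. Below is a minimal Python sketch, assuming each host's log directory listing is already available as a list of relative paths. The helper name `files_to_backfill` and the file names in the example are hypothetical, not taken from the actual puppet change.

```python
from pathlib import PurePosixPath


def files_to_backfill(old_host_files, new_host_files, active_names):
    """Return files present on the old host but missing on the new host.

    Hypothetical helper (not part of the real migration): arguments are
    plain lists of relative paths. Any file whose basename is in
    active_names is treated as still being written and is skipped, so the
    copy never touches live log files.
    """
    already_there = set(new_host_files)
    active = set(active_names)
    return sorted(
        path
        for path in old_host_files
        if path not in already_there and PurePosixPath(path).name not in active
    )


if __name__ == "__main__":
    # Hypothetical listings: the new host already has today's rotated file.
    lithium = [
        "syslog.log",
        "syslog.log-20190701.gz",
        "syslog.log-20190702.gz",
        "syslog.log-20190703.gz",
    ]
    centrallog = ["syslog.log", "syslog.log-20190703.gz"]
    print(files_to_backfill(lithium, centrallog, ["syslog.log"]))
```

Waiting for at least one rotation means the new host already owns a clean boundary file, so only strictly older rotated files from lithium remain to be copied and nothing on centrallog1001 is overwritten.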

Decommissioning lithium from service:

  • Remove lithium from remote_syslog(_tls) destinations and router ACLs

Not really "logstash", but using the Wikimedia-Logstash tag for logging-related tasks.

fgiunchedi moved this task from Up next to Radar on the User-fgiunchedi board.

Change 519420 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] install_server: use buster for centrallog1001

https://gerrit.wikimedia.org/r/519420

Change 519420 merged by Filippo Giunchedi:
[operations/puppet@production] install_server: use buster for centrallog1001

https://gerrit.wikimedia.org/r/519420

Change 520713 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] eqsin: send logs to centrallog1001 too

https://gerrit.wikimedia.org/r/520713

Change 520713 merged by Filippo Giunchedi:
[operations/puppet@production] eqsin: send logs to centrallog1001 too

https://gerrit.wikimedia.org/r/520713

Change 520761 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] Enable centrallog1001 on all pops

https://gerrit.wikimedia.org/r/520761

Change 520761 merged by Filippo Giunchedi:
[operations/puppet@production] Enable centrallog1001 on all pops

https://gerrit.wikimedia.org/r/520761

Change 521245 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: enable centrallog1001 in codfw

https://gerrit.wikimedia.org/r/521245

Change 521245 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: enable centrallog1001 in codfw

https://gerrit.wikimedia.org/r/521245

Change 521428 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: enable centrallog1001 in eqiad

https://gerrit.wikimedia.org/r/521428

Change 521428 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: enable centrallog1001 in eqiad

https://gerrit.wikimedia.org/r/521428

Change 523102 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] syslog: add temp rsync to copy data

https://gerrit.wikimedia.org/r/523102

Change 523102 merged by Filippo Giunchedi:
[operations/puppet@production] syslog: add temp rsync to copy data

https://gerrit.wikimedia.org/r/523102

Change 523669 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] Revert "syslog: add temp rsync to copy data"

https://gerrit.wikimedia.org/r/523669

Change 523670 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] Remove lithium from service

https://gerrit.wikimedia.org/r/523670

Change 523669 merged by Filippo Giunchedi:
[operations/puppet@production] Revert "syslog: add temp rsync to copy data"

https://gerrit.wikimedia.org/r/523669

Change 523670 merged by Filippo Giunchedi:
[operations/puppet@production] Remove lithium from service

https://gerrit.wikimedia.org/r/523670

Change 523957 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/dns@master] wmnet: flip syslog.eqiad.wmnet to centrallog1001

https://gerrit.wikimedia.org/r/523957

Change 523957 merged by Filippo Giunchedi:
[operations/dns@master] wmnet: flip syslog.eqiad.wmnet to centrallog1001

https://gerrit.wikimedia.org/r/523957

fgiunchedi claimed this task.

This is done; the decommissioning of lithium is tracked in T229557: decommission lithium.