
rack/setup/install netmon2001
Closed, Resolved · Public

Description

This task will track the receiving, racking, and installation of netmon2001, ordered on T161807.

Racking Plan: This is the only network monitoring system in codfw. It uses 1Gbps networking, so place in a 1Gbit networking rack with the most space available and power overhead. It can basically go in any 1Gbps rack other than frack.

  • receive in system on procurement task T161807.
  • rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • production dns entries added (external subnet)
  • network port setup (description, enable, external vlan)
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet/salt accept/initial run
  • handoff for service implementation

also see: T159756 (setup netmon1002 the eqiad equivalent)

Details

Related Gerrit Patches:
[operations/puppet@production] rancid: switch active server to netmon2001
[operations/puppet@production] smokeping: switch backend from netmon1002 to netmon2001
[operations/puppet@production] smokeping: sync data to netmon2001, use quickdatacopy, in role
[operations/puppet@production] cache::misc: add director for netmon2001
[operations/dns@master] add IPv6 records for netmon2001
[operations/puppet@production] rancid/netmon: add active_server parameter to DC-switch
[operations/puppet@production] rancid: disable fully automatic rsyncing of app data
[operations/puppet@production] rancid: add rsync::quickdatacopy to sync /var/lib/rancid
[operations/puppet@production] netmon: disable Letsencrypt on netmon2001
[operations/puppet@production] add netmon2001 to site, equal to netmon1002
[operations/puppet@production] fixing ordering of servers
[operations/puppet@production] setting netmon2001 install params
[operations/dns@master] correcting netmon2001 dns entries
[operations/puppet@production] settting netmon2001 install params
[operations/puppet@production] DHCP: Add MAC address for netmon2001
[operations/dns@master] DNS: Add mgmt and production DNS for netmon2001

Event Timeline

RobH created this task. May 23 2017, 9:04 PM
Restricted Application added a subscriber: Aklapper. May 23 2017, 9:04 PM

Change 361606 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] smokeping: allow rsync of data from netmon1001 to netmon1002

https://gerrit.wikimedia.org/r/361606

Dzahn added a subscriber: Dzahn. Jun 27 2017, 12:26 AM

oops, wrong ticket, disregard last comment

faidon moved this task from Backlog to In progress on the observability board. Jul 10 2017, 12:36 PM
Papaul updated the task description. Jul 10 2017, 3:30 PM

Change 364260 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add mgmt and production DNS for netmon2001

https://gerrit.wikimedia.org/r/364260

Papaul updated the task description. Jul 10 2017, 5:52 PM

Change 364260 merged by Dzahn:
[operations/dns@master] DNS: Add mgmt and production DNS for netmon2001

https://gerrit.wikimedia.org/r/364260

Change 364353 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for netmon2001

https://gerrit.wikimedia.org/r/364353

Change 364353 merged by Dzahn:
[operations/puppet@production] DHCP: Add MAC address for netmon2001

https://gerrit.wikimedia.org/r/364353

@RobH can you please set up the network port for netmon2001? Thanks

asw-d-codfw:ge-5/0/23

RobH closed this task as Resolved. Jul 11 2017, 4:09 PM
robh@asw-d-codfw# show | compare  
[edit interfaces interface-range vlan-public1-d-codfw]
     member ge-1/0/13 { ... }
+    member ge-5/0/23;
[edit interfaces]
+   ge-5/0/23 {
+       description netmon2001;
+       enable;
+   }

Done and merged live on switch stack.

RobH updated the task description. Jul 11 2017, 4:09 PM
Dzahn reopened this task as Open. Jul 11 2017, 5:07 PM

Thanks Papaul, Rob. I'm going to reopen this and take it to continue with the OS install and adding services.

Dzahn claimed this task. Jul 11 2017, 5:07 PM
Dzahn added a subscriber: Papaul.
Dzahn reassigned this task from Dzahn to Papaul. Jul 11 2017, 5:10 PM
RobH added a comment. Jul 11 2017, 5:11 PM

I did not mean to resolve the task, my bad!

Dzahn added a comment. Jul 11 2017, 5:12 PM

Do you know which partman recipe is the right one?

RobH claimed this task. Jul 11 2017, 5:17 PM

I'm going to claim this for install, so @Papaul can work on other onsite tasks =]

Change 364478 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] settting netmon2001 install params

https://gerrit.wikimedia.org/r/364478

Change 364478 merged by RobH:
[operations/puppet@production] settting netmon2001 install params

https://gerrit.wikimedia.org/r/364478

Change 364481 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] correcting netmon2001 dns entries

https://gerrit.wikimedia.org/r/364481

Change 364481 merged by RobH:
[operations/dns@master] correcting netmon2001 dns entries

https://gerrit.wikimedia.org/r/364481

Change 364482 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] setting netmon2001 install params

https://gerrit.wikimedia.org/r/364482

Change 364482 merged by RobH:
[operations/puppet@production] setting netmon2001 install params

https://gerrit.wikimedia.org/r/364482

Change 364489 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] fixing ordering of servers

https://gerrit.wikimedia.org/r/364489

Change 364489 merged by RobH:
[operations/puppet@production] fixing ordering of servers

https://gerrit.wikimedia.org/r/364489

RobH reassigned this task from RobH to Dzahn. Jul 11 2017, 7:37 PM
RobH updated the task description.
RobH removed projects: Patch-For-Review, ops-codfw.

Assigned to @Dzahn for service implementation. I've assumed it goes to him, since he is handling the stretch service updates for netmon1002.

Change 364585 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] add netmon2001 to site, equal to netmon1002

https://gerrit.wikimedia.org/r/364585

Change 364585 merged by Dzahn:
[operations/puppet@production] add netmon2001 to site, equal to netmon1002

https://gerrit.wikimedia.org/r/364585

Change 364613 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] netmon: disable Letsencrypt on netmon2001

https://gerrit.wikimedia.org/r/364613

Change 364613 merged by Dzahn:
[operations/puppet@production] netmon: disable Letsencrypt on netmon2001

https://gerrit.wikimedia.org/r/364613

Change 364620 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] rancid: add rsync::quickdatacopy to sync /var/lib/rancid

https://gerrit.wikimedia.org/r/364620

Change 364620 merged by Dzahn:
[operations/puppet@production] rancid: add rsync::quickdatacopy to sync /var/lib/rancid

https://gerrit.wikimedia.org/r/364620

Change 364624 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] rancid: disable fully automatic rsyncing of app data

https://gerrit.wikimedia.org/r/364624

Change 364624 merged by Dzahn:
[operations/puppet@production] rancid: disable fully automatic rsyncing of app data

https://gerrit.wikimedia.org/r/364624

Change 364629 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] rancid/netmon: add active_server parameter to DC-switch

https://gerrit.wikimedia.org/r/364629

Change 364629 merged by Dzahn:
[operations/puppet@production] rancid/netmon: add active_server parameter to DC-switch

https://gerrit.wikimedia.org/r/364629

Change 364641 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add IPv6 records for netmon2001

https://gerrit.wikimedia.org/r/364641

Change 364641 merged by Dzahn:
[operations/dns@master] add IPv6 records for netmon2001

https://gerrit.wikimedia.org/r/364641

Mentioned in SAL (#wikimedia-operations) [2017-07-14T04:21:35Z] <mutante> netmon1002/netmon2001 - change UID/GID for rancid to universal 445/445, use find -exec to chown existing files, for unmessy data syncing, define UID on wikitech page UID (T166180)
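
The SAL entry above describes re-owning existing rancid files with find -exec chown after the UID/GID change to 445/445. As a rough illustration only, the same idea can be sketched in Python. This is not the command that was actually run; the function name reown_tree, its parameters, and the dry-run default are all assumptions made for this sketch.

```python
import os


def reown_tree(root, old_uid, new_uid, new_gid, dry_run=True):
    """Find every path under root (including root) owned by old_uid.

    Hypothetical helper: with dry_run=False the matching paths are
    chown'ed to new_uid:new_gid; with dry_run=True nothing is changed.
    Returns the list of matching paths either way.
    """
    candidates = [root]
    # os.walk does not follow symlinked directories, so each entry is
    # visited exactly once as a child of its parent directory.
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, name)
                          for name in dirnames + filenames)

    matched = []
    for path in candidates:
        # lstat/chown act on the link itself rather than its target.
        if os.lstat(path).st_uid == old_uid:
            matched.append(path)
            if not dry_run:
                os.chown(path, new_uid, new_gid, follow_symlinks=False)
    return matched
```

A call such as reown_tree("/var/lib/rancid", old_uid, 445, 445, dry_run=False) would need root privileges, and old_uid stands for whatever UID the files had on that host before the change, which this task does not record.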

Change 365890 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] cache::misc: add director for netmon2001

https://gerrit.wikimedia.org/r/365890

Change 365890 merged by Dzahn:
[operations/puppet@production] cache::misc: add director for netmon2001

https://gerrit.wikimedia.org/r/365890

Change 365892 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] smokeping: switch backend from netmon1002 to netmon2001

https://gerrit.wikimedia.org/r/365892

Change 365893 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] smokeping: sync data to netmon2001, use quickdatacopy, in role

https://gerrit.wikimedia.org/r/365893

Change 365893 merged by Dzahn:
[operations/puppet@production] smokeping: sync data to netmon2001, use quickdatacopy, in role

https://gerrit.wikimedia.org/r/365893

Change 365892 merged by Dzahn:
[operations/puppet@production] smokeping: switch backend from netmon1002 to netmon2001

https://gerrit.wikimedia.org/r/365892

Change 366012 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] rancid: switch active server to netmon2001

https://gerrit.wikimedia.org/r/366012

Change 366012 merged by Dzahn:
[operations/puppet@production] rancid: switch active server to netmon2001

https://gerrit.wikimedia.org/r/366012

Dzahn added a comment. Jul 19 2017, 1:50 AM

netmon2001 is up and running.

netmon1002 and netmon2001 use identical roles:

node /^netmon(1002|2001)\.wikimedia\.org$/ {
    role(network::monitor, librenms, rancid, smokeping)
}
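
To check which hostnames that node regex selects, here is a small sketch that tests the same pattern with Python's re module. Puppet evaluates node regexes with Ruby's engine, so this only approximates the behaviour for this simple pattern; the helper name matches_netmon_node is made up for illustration.

```python
import re

# Same pattern as the node definition in site.pp, copied verbatim.
NODE_PATTERN = re.compile(r"^netmon(1002|2001)\.wikimedia\.org$")


def matches_netmon_node(fqdn):
    """Return True if fqdn would be selected by the netmon node regex."""
    return NODE_PATTERN.search(fqdn) is not None


for host in ("netmon1002.wikimedia.org",
             "netmon2001.wikimedia.org",
             "netmon1001.wikimedia.org"):
    print(host, matches_netmon_node(host))
```

Only netmon1002 and netmon2001 match, so the older netmon1001 does not pick up these roles from this block.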

smokeping is currently on netmon2001 (behind cache::misc, switch cache backend in Hiera)

rancid is currently on netmon2001 (switch controlled by "netmon_server" setting in common.yaml in Hiera)

librenms is currently on netmon1002 (switch in DNS)

Dzahn closed this task as Resolved. Jul 21 2017, 10:12 PM
Dzahn updated the task description.