
Migrate role::netmon to Buster
Closed, Resolved · Public

Description

As per the title, all hosts running this role should be on Buster, ideally by FQ2 FY20-21.

root@cumin1001:~# cumin 'P{O:netmon} and not P{F:lsbdistcodename = buster}'
2 hosts will be targeted:
netmon[1002,2001].wikimedia.org
DRY-RUN mode enabled, aborting
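
For readers unfamiliar with the bracketed host notation in the output above, here is a minimal sketch of how an expression such as `netmon[1002,2001].wikimedia.org` expands into individual host names. This is not cumin's own parser; the function name `expand_hosts` is hypothetical, and the sketch assumes a single bracket group containing comma-separated items or simple numeric ranges.

```python
import re
from typing import List


def expand_hosts(expr: str) -> List[str]:
    """Expand one bracketed group, e.g. 'netmon[1002,2001].wikimedia.org'.

    Illustrative helper only (hypothetical name): handles a single
    bracket group with comma-separated items and 'start-end' ranges.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        # No bracket group: treat the expression as one plain host name.
        return [expr]
    prefix, body, suffix = match.groups()
    hosts = []
    for item in body.split(","):
        if "-" in item:
            # Numeric range such as 1001-1003; keep zero padding width.
            start, end = item.split("-", 1)
            width = len(start)
            for number in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{number:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{item}{suffix}")
    return hosts


if __name__ == "__main__":
    for host in expand_hosts("netmon[1002,2001].wikimedia.org"):
        print(host)
```

Run against the expression from the dry run, this lists the two hosts that still needed the upgrade: netmon1002.wikimedia.org and netmon2001.wikimedia.org.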

Current plan:

Issues identified:

  1. Polling time for eqiad devices increased significantly due to the added latency. For the most populated rows (eqiad B and D), poll times occasionally exceed 5 minutes, resulting in alerts and potentially missed data.
  2. The LibreNMS web UI became significantly slower (from Europe at least), partly because of the added latency to reach codfw and partly because the database is still in eqiad.
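
To illustrate why added latency can push a poll past the 5-minute mark, here is a rough back-of-the-envelope sketch. All numbers are hypothetical and are not measurements from this task; it also assumes requests to a device are issued strictly one after another, which may not match how the LibreNMS poller actually behaves.

```python
def poll_seconds(round_trips: int, rtt_ms: float, processing_s: float = 0.0) -> float:
    """Estimate poll duration when every request waits one full round trip.

    Assumes strictly sequential requests (a simplification).
    """
    return round_trips * rtt_ms / 1000.0 + processing_s


# Hypothetical inputs, chosen only to show the shape of the problem.
ROUND_TRIPS = 10000   # requests needed to poll one busy device
PROCESSING_S = 20.0   # time spent outside the network
BUDGET_S = 300.0      # 5-minute polling interval

if __name__ == "__main__":
    for label, rtt_ms in (("same site", 0.5), ("remote site", 35.0)):
        total = poll_seconds(ROUND_TRIPS, rtt_ms, PROCESSING_S)
        print(f"{label}: {total:.0f} s, over budget: {total > BUDGET_S}")
```

With these made-up inputs the same-site poll takes about 25 s while the remote poll takes about 370 s, which is over the 300 s interval. The point is only that per-request latency multiplies by the number of round trips, so densely populated rows are hit hardest.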

Details

Project            Branch      Lines +/-
operations/dns     master      +2 -2
operations/puppet  production  +2 -2
operations/puppet  production  +1 -3
operations/puppet  production  +2 -2
operations/puppet  production  +0 -1
operations/puppet  production  +3 -2
operations/puppet  production  +1 K -1 K
operations/puppet  production  +1 K -1 K
operations/puppet  production  +11 -0
operations/puppet  production  +8 -6
operations/puppet  production  +0 -5
operations/dns     master      +2 -2
operations/puppet  production  +3 -3
operations/puppet  production  +5 -1
operations/puppet  production  +8 -0
operations/puppet  production  +1 -1
operations/puppet  production  +2 -0
operations/puppet  production  +2 -2
operations/puppet  production  +10 -6
operations/puppet  production  +1 -1
operations/puppet  production  +0 -1
operations/dns     master      +2 -2
operations/puppet  production  +4 -0
operations/puppet  production  +2 -8
operations/puppet  production  +23 -8

Event Timeline

Restricted Application added a project: Operations. Mar 18 2020, 12:40 PM
herron added a subscriber: herron. Mar 23 2020, 3:34 PM
MoritzMuehlenhoff triaged this task as Medium priority. Apr 6 2020, 6:24 AM
fgiunchedi moved this task from Inbox to Backlog on the observability board. Apr 6 2020, 12:33 PM
jbond added a subscriber: jbond. Jul 10 2020, 1:44 PM

Change 611317 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] role: port netmon to Buster

https://gerrit.wikimedia.org/r/611317

Change 611318 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] role: install fcgid package on netmon

https://gerrit.wikimedia.org/r/611318

Change 611317 merged by Muehlenhoff:
[operations/puppet@production] role: port netmon to Buster

https://gerrit.wikimedia.org/r/611317

Change 613146 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Install the fcgid package on Netmon

https://gerrit.wikimedia.org/r/613146

Mentioned in SAL (#wikimedia-operations) [2020-07-17T08:48:28Z] <moritzm> imported prometheus-atlas-exporter 1.0+git20191204.ffafab7-2 to buster-wikimedia T247967

Change 613146 merged by Muehlenhoff:
[operations/puppet@production] Install the fcgid package on Netmon

https://gerrit.wikimedia.org/r/613146

Change 611318 abandoned by Filippo Giunchedi:
[operations/puppet@production] role: install fcgid package on netmon

Reason:
As per Moritz,
Superseded by https://gerrit.wikimedia.org/r/c/operations/puppet/+/612865 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/613146

https://gerrit.wikimedia.org/r/611318

fgiunchedi updated the task description. Tue, Jul 21, 8:56 AM
fgiunchedi moved this task from Backlog to Up next on the observability board. Tue, Jul 21, 9:23 AM
fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board.

Change 615417 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/dns@master] wikimedia.org: lower librenms TTL in preparation for failover

https://gerrit.wikimedia.org/r/615417

fgiunchedi updated the task description. Wed, Jul 22, 9:26 AM

Change 615417 merged by Filippo Giunchedi:
[operations/dns@master] wikimedia.org: lower librenms/smokeping TTL in preparation for failover

https://gerrit.wikimedia.org/r/615417

fgiunchedi updated the task description. Wed, Jul 22, 9:54 AM
fgiunchedi updated the task description. Wed, Jul 22, 10:02 AM

Change 615473 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] smokeping: don't sync data between hosts

https://gerrit.wikimedia.org/r/615473

Change 615474 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] librenms: add passive server for rsync server

https://gerrit.wikimedia.org/r/615474

Change 615475 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] install_server: reinstall netmon2001 with Buster

https://gerrit.wikimedia.org/r/615475

Change 615475 merged by Filippo Giunchedi:
[operations/puppet@production] install_server: reinstall netmon2001 with Buster

https://gerrit.wikimedia.org/r/615475

Change 615473 merged by Filippo Giunchedi:
[operations/puppet@production] smokeping: don't sync data between hosts

https://gerrit.wikimedia.org/r/615473

Change 615670 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] smokeping: match documentroot with smokeping installation

https://gerrit.wikimedia.org/r/615670

Change 615474 merged by Filippo Giunchedi:
[operations/puppet@production] librenms: add passive server for rsync server

https://gerrit.wikimedia.org/r/615474

Change 615670 merged by Filippo Giunchedi:
[operations/puppet@production] smokeping: match documentroot with smokeping installation

https://gerrit.wikimedia.org/r/615670

Change 615730 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] librenms: create/update users when using SSO

https://gerrit.wikimedia.org/r/615730

Change 615731 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] librenms: set bootstrap/cache permissions

https://gerrit.wikimedia.org/r/615731

Change 615732 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] role: force mpm_prefork for netmon/librenms

https://gerrit.wikimedia.org/r/615732

fgiunchedi updated the task description. Thu, Jul 23, 1:09 PM

Change 615738 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: flip netmon2001 back to ldap auth

https://gerrit.wikimedia.org/r/615738

Change 615730 abandoned by Filippo Giunchedi:
[operations/puppet@production] librenms: create/update users when using SSO

Reason:
Obsoleted

https://gerrit.wikimedia.org/r/615730

Change 615738 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: flip netmon2001 back to ldap auth

https://gerrit.wikimedia.org/r/615738

Change 615731 merged by Filippo Giunchedi:
[operations/puppet@production] librenms: set bootstrap/cache permissions

https://gerrit.wikimedia.org/r/615731

Change 615732 merged by Filippo Giunchedi:
[operations/puppet@production] role: force mpm_prefork for netmon/librenms

https://gerrit.wikimedia.org/r/615732

fgiunchedi updated the task description. Fri, Jul 24, 7:13 AM
fgiunchedi moved this task from Up next to In progress on the observability board. Mon, Jul 27, 1:55 PM

Change 616709 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/dns@master] wikimedia: failover to netmon2001

https://gerrit.wikimedia.org/r/616709

Change 616710 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] Failover to netmon2001

https://gerrit.wikimedia.org/r/616710

Mentioned in SAL (#wikimedia-operations) [2020-07-28T08:06:09Z] <godog> failover librenms/smokeping to netmon2001 - T247967

Change 616710 merged by Filippo Giunchedi:
[operations/puppet@production] Failover to netmon2001

https://gerrit.wikimedia.org/r/616710

Change 616709 merged by Filippo Giunchedi:
[operations/dns@master] wikimedia: failover to netmon2001

https://gerrit.wikimedia.org/r/616709

fgiunchedi updated the task description. Tue, Jul 28, 9:06 AM
fgiunchedi updated the task description.
This comment was removed by fgiunchedi.
fgiunchedi updated the task description. Tue, Jul 28, 9:11 AM

Change 616716 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: upgrade snmp-exporter config

https://gerrit.wikimedia.org/r/616716

fgiunchedi updated the task description. Tue, Jul 28, 10:04 AM

Change 616720 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] rancid: ship .gitconfig

https://gerrit.wikimedia.org/r/616720

Change 616722 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] rancid: use active/failover netmon server for rsync

https://gerrit.wikimedia.org/r/616722

Change 616723 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] cache: remove smokeping, not proxied

https://gerrit.wikimedia.org/r/616723

Change 616723 merged by Filippo Giunchedi:
[operations/puppet@production] cache: remove smokeping, not proxied

https://gerrit.wikimedia.org/r/616723

Change 616722 merged by Filippo Giunchedi:
[operations/puppet@production] rancid: use active/failover netmon server for rsync

https://gerrit.wikimedia.org/r/616722

Change 616720 merged by Filippo Giunchedi:
[operations/puppet@production] rancid: ship .gitconfig

https://gerrit.wikimedia.org/r/616720

Change 616716 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: upgrade snmp-exporter config

https://gerrit.wikimedia.org/r/616716

Change 616857 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: upgrade snmp-exporter config

https://gerrit.wikimedia.org/r/616857

Change 616857 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: upgrade snmp-exporter config

https://gerrit.wikimedia.org/r/616857

Change 617067 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: hide diffs in snmp_exporter::module

https://gerrit.wikimedia.org/r/617067

fgiunchedi updated the task description. Wed, Jul 29, 7:53 AM

Change 617067 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: hide diffs in snmp_exporter::module

https://gerrit.wikimedia.org/r/617067

Change 617069 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] install_server: reinstall netmon1002 with Buster

https://gerrit.wikimedia.org/r/617069

Change 617069 merged by Filippo Giunchedi:
[operations/puppet@production] install_server: reinstall netmon1002 with Buster

https://gerrit.wikimedia.org/r/617069

Change 617073 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: bump scrape timeout for PDUs

https://gerrit.wikimedia.org/r/617073

Change 617073 abandoned by Filippo Giunchedi:
[operations/puppet@production] prometheus: bump scrape timeout for PDUs

Reason:
No need

https://gerrit.wikimedia.org/r/617073

Change 617076 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] role: use rsync wrap_with_stunnel for netmon

https://gerrit.wikimedia.org/r/617076

Change 617076 merged by Filippo Giunchedi:
[operations/puppet@production] role: use rsync wrap_with_stunnel for netmon

https://gerrit.wikimedia.org/r/617076

fgiunchedi updated the task description. Wed, Jul 29, 8:52 AM
fgiunchedi updated the task description. Wed, Jul 29, 9:37 AM

Change 617393 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/dns@master] Revert "wikimedia: failover to netmon2001"

https://gerrit.wikimedia.org/r/617393

Change 617395 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] Revert "Failover to netmon2001"

https://gerrit.wikimedia.org/r/617395

fgiunchedi updated the task description. Thu, Jul 30, 8:22 AM

Change 617395 merged by Filippo Giunchedi:
[operations/puppet@production] Revert "Failover to netmon2001"

https://gerrit.wikimedia.org/r/617395

Mentioned in SAL (#wikimedia-operations) [2020-07-30T08:43:41Z] <godog> flip smokeping/librenms from netmon2001 to netmon1002 - T247967

Change 617393 merged by Filippo Giunchedi:
[operations/dns@master] Revert "wikimedia: failover to netmon2001"

https://gerrit.wikimedia.org/r/617393

fgiunchedi updated the task description. Thu, Jul 30, 8:59 AM
fgiunchedi closed this task as Resolved. Thu, Jul 30, 9:06 AM
fgiunchedi claimed this task.

This is complete! All netmon hosts are running Buster.

Dzahn awarded a token. Thu, Jul 30, 4:04 PM