
labmon1002: labmon1001.eqiad.wmnet port 22: Connection timed out
Closed, Resolved · Public

Description

From: Cron Daemon <root@labmon1002.eqiad.wmnet>
Subject: Cron <_graphite@labmon1002> /usr/bin/rsync --delete --delete-after -aSOrd labmon1001.eqiad.wmnet:/srv/carbon/whisper/ /srv/carbon/whisper/

ssh: connect to host labmon1001.eqiad.wmnet port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]
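The ssh error above means the TCP handshake to port 22 never completed, as opposed to being actively refused. A timeout usually points at packets being dropped (for example by a firewall) or an unreachable host, while a refusal means the peer answered with a reset. Below is a minimal Python sketch of a pre-flight check that tells these cases apart before rsync is launched. The function name `probe_port` and the 5-second default timeout are illustrative assumptions and are not part of the actual cron job.

```python
import socket


def probe_port(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Try a TCP connect and classify the outcome.

    Returns "open" if the handshake completes, "refused" if the peer
    answers with a reset (port closed or a REJECT rule), "timeout" if no
    answer arrives in time (packets dropped or host unreachable), and
    "error: ..." for anything else (e.g. a DNS failure).
    """
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

For example, `probe_port("labmon1001.eqiad.wmnet")` run from labmon1002 would presumably have returned "timeout" while this problem was occurring.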

Received on:

  • 2/4/19, 11:04 AM
  • 2/5/19, 7:04 PM
  • 2/6/19, 3:04 AM

Event Timeline

GTirloni created this task. Feb 6 2019, 10:47 AM
Restricted Application added a subscriber: Aklapper. Feb 6 2019, 10:47 AM
GTirloni triaged this task as Normal priority. Feb 6 2019, 10:47 AM
GTirloni claimed this task. Feb 6 2019, 11:05 AM

On IRC we discovered that role::wmcs::monitoring should include profile::wmcs::monitoring, the profile's new name after it was renamed away from "labs".

Change 488332 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] role::wmcs::monitoring - Include profile::wmcs::monitoring

https://gerrit.wikimedia.org/r/488332

Other improvements:

  • The Puppet code in that profile is mostly for the sync mechanism. We could evaluate whether the sync code can be replaced by rsync::quickdatacopy, which would let us delete a lot of code by reusing existing code.
  • OpenStack client package installation should probably be done via profile::openstack::base::clientpackages, for consistency.
  • Hiera keys in profile::wmcs::monitoring should not use the main deployment keys. AFAIK, monitoring is done (or should be done) for all deployments in eqiad.
  • In fact, I don't think the monitoring servers (labmon1001/labmon1002) can be considered "part" of a deployment.
  • We could even evaluate whether it is worth moving these servers to VM instances in Cloud VPS...

Change 488332 merged by GTirloni:
[operations/puppet@production] role::wmcs::monitoring - Include profile::wmcs::monitoring

https://gerrit.wikimedia.org/r/488332

Mentioned in SAL (#wikimedia-cloud) [2019-02-06T11:47:31Z] <gtirloni> downtimed labmon100{1,2} T215399

Change 488354 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] profile::wmcs::monitoring - Use openstack clientpackages

https://gerrit.wikimedia.org/r/488354

Change 488354 merged by GTirloni:
[operations/puppet@production] profile::wmcs::monitoring - Use openstack clientpackages

https://gerrit.wikimedia.org/r/488354

Change 488364 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] profile::wmcs::monitoring - Fix missing package

https://gerrit.wikimedia.org/r/488364

Change 488364 merged by GTirloni:
[operations/puppet@production] profile::wmcs::monitoring - Fix missing package

https://gerrit.wikimedia.org/r/488364

Change 488370 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] profile::wmcs::monitoring - Fix IPv6 lookup

https://gerrit.wikimedia.org/r/488370

Change 488370 merged by GTirloni:
[operations/puppet@production] profile::wmcs::monitoring - Fix IPv6 lookup

https://gerrit.wikimedia.org/r/488370
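The task does not record what the IPv6 lookup fix actually changed. As a generic illustration only, resolving a peer host to both its IPv4 (A) and IPv6 (AAAA) addresses, for example to build firewall allow rules for a sync peer, can be sketched in Python with `socket.getaddrinfo`. The function name `resolve_dual_stack` and the firewall use case are assumptions, not details taken from the patch.

```python
import socket


def resolve_dual_stack(host: str) -> dict[str, list[str]]:
    """Return the host's IPv4 and IPv6 addresses, deduplicated and sorted.

    Each address family is queried separately; a family with no records
    yields an empty list instead of an exception.
    """
    result: dict[str, list[str]] = {"ipv4": [], "ipv6": []}
    for family, key in ((socket.AF_INET, "ipv4"), (socket.AF_INET6, "ipv6")):
        try:
            infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # no record of this family (or the name was not found)
        # info[4] is the sockaddr tuple; its first element is the address string
        result[key] = sorted({info[4][0] for info in infos})
    return result
```

Catching `gaierror` per family means a host that only has an A record still yields its IPv4 addresses, with an empty IPv6 list, rather than failing the whole lookup.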

rsync is able to connect now.

GTirloni closed this task as Resolved. Feb 6 2019, 1:40 PM