Majority of Jenkins slaves exceed acceptable clockdrift (more than 60 seconds)
Closed, Resolved (Public)

Authored By

	Krinkle
	May 15 2015, 11:06 PM

Description

Aside from triggering internal warnings in Jenkins, this also has user-facing impact: jobs fail.

https://integration.wikimedia.org/ci/job/phpunit/707/console

22:47:28 [xUnit] [ERROR] - Clock on this slave is out of sync with the master, and therefore
22:47:28 I can't figure out what test results are new and what are old.
22:47:28 Please keep the slave clock in sync with the master.
22:47:28 [xUnit] [INFO] - Failing BUILD.

I don't know what caused this; it seems to have started a few hours ago.

Screen Shot 2015-05-16 at 00.04.31.png (1×1 px, 449 KB)

Possibly related:

shinken-wm: PROBLEM - Host deployment-logstash1 is DOWN: CRITICAL - Host Unreachable (10.68.16.134)
shinken-wm: PROBLEM - Host deployment-memc02 is DOWN: CRITICAL - Host Unreachable (10.68.16.14)
shinken-wm: PROBLEM - Host integration-slave-precise-1014 is DOWN: CRITICAL - Host Unreachable (10.68.18.38)
shinken-wm: RECOVERY - Host integration-slave-precise-1014 is UP: PING OK - Packet loss = 0%, RTA = 0.71 ms
shinken-wm: PROBLEM - Host integration-slave-trusty-1016 is DOWN: CRITICAL - Host Unreachable (10.68.18.34)
shinken-wm: RECOVERY - Host integration-slave-trusty-1016 is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms

Event Timeline

Krinkle created this task. May 15 2015, 11:06 PM

Krinkle raised the priority of this task to Unbreak Now!.

Krinkle updated the task description.

Krinkle added projects: Continuous-Integration-Infrastructure, Regression, Release-Engineering-Team.

Krinkle subscribed.

Restricted Application added a subscriber: Aklapper. May 15 2015, 11:06 PM

This is probably due to the security patch that I'm applying now (via a suspend/resume). You can reset your clock like this:

$ sudo service ntp stop ; sudo ntpd -q ; sudo service ntp start

I will salt that command everywhere once the update is finished.
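For anyone who wants to check a slave before resetting it, here is a minimal sketch of the drift test, assuming you already have the master's time as a Unix timestamp (for example from running `date +%s` there). The helper names `clock_drift_seconds` and `needs_resync` are hypothetical and not part of Jenkins or ntp; the 60-second default is taken from the threshold in the task title.

```shell
#!/bin/sh
# Hypothetical helpers: the names and the way the reference time is obtained
# are assumptions for illustration, not part of Jenkins or the ntp tooling.

# Print the absolute difference, in seconds, between two Unix timestamps.
clock_drift_seconds() {
    reference_epoch=$1
    local_epoch=$2
    drift=$((reference_epoch - local_epoch))
    if [ "$drift" -lt 0 ]; then
        drift=$((0 - drift))
    fi
    echo "$drift"
}

# Succeed (exit status 0) when the drift is strictly greater than the limit.
# The limit is the optional third argument and defaults to 60 seconds.
needs_resync() {
    limit=${3:-60}
    [ "$(clock_drift_seconds "$1" "$2")" -gt "$limit" ]
}

# Example use on a slave, with reference_epoch fetched from the master:
#   if needs_resync "$reference_epoch" "$(date +%s)"; then
#       sudo service ntp stop ; sudo ntpd -q ; sudo service ntp start
#   fi
```

This only compares two timestamps; it does not query an NTP server itself, so how accurately the reference time is obtained is up to the caller.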

Thanks.

greg moved this task from INBOX to Done on the Release-Engineering-Team board. May 23 2015, 2:13 PM

