
Majority of Jenkins slaves exceed acceptable clock drift (more than 60 seconds)
Closed, Resolved, Public

Description

Aside from internal warnings in Jenkins, this also has user-facing impact: jobs fail. For example:

https://integration.wikimedia.org/ci/job/phpunit/707/console

22:47:28 [xUnit] [ERROR] - Clock on this slave is out of sync with the master, and therefore
22:47:28 I can't figure out what test results are new and what are old.
22:47:28 Please keep the slave clock in sync with the master.
22:47:28 [xUnit] [INFO] - Failing BUILD.

I don't know what caused this; it seems to have started a few hours ago.
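As the xUnit message says, the plugin cannot tell new test results from old ones once the slave and master clocks disagree. A minimal sketch of the underlying comparison, assuming both clocks can be read as Unix timestamps (for example with date +%s on each host). The function name clock_drift_exceeds and the host names in the usage comment are hypothetical, and the 60-second default comes from this task's title, not from Jenkins's own code:

```shell
#!/bin/sh
# Hypothetical helper: return 0 if two Unix timestamps (in seconds) differ by
# more than THRESHOLD seconds, else return 1. The default of 60 seconds is the
# limit named in this task's title, not a value taken from Jenkins itself.
clock_drift_exceeds() {
    master_ts=$1
    slave_ts=$2
    threshold=${3:-60}
    diff=$((slave_ts - master_ts))
    # A slave clock can run ahead of or behind the master, so use |diff|.
    if [ "$diff" -lt 0 ]; then
        diff=$((-diff))
    fi
    [ "$diff" -gt "$threshold" ]
}

# Example usage (assumes SSH access to both hosts; host names are placeholders):
#   master_ts=$(ssh jenkins-master.example date +%s)
#   slave_ts=$(ssh some-slave.example date +%s)
#   clock_drift_exceeds "$master_ts" "$slave_ts" && echo "clock drift too large"
```

Taking the absolute difference matters here: a slave whose clock runs ahead of the master is just as out of sync as one that lags behind.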

[Screenshot: Screen Shot 2015-05-16 at 00.04.31.png]

Possibly related:

shinken-wm: PROBLEM - Host deployment-logstash1 is DOWN: CRITICAL - Host Unreachable (10.68.16.134)
shinken-wm: PROBLEM - Host deployment-memc02 is DOWN: CRITICAL - Host Unreachable (10.68.16.14)
shinken-wm: PROBLEM - Host integration-slave-precise-1014 is DOWN: CRITICAL - Host Unreachable (10.68.18.38)
shinken-wm: RECOVERY - Host integration-slave-precise-1014 is UPING OK - Packet loss = 0%, RTA = 0.71 ms
shinken-wm: PROBLEM - Host integration-slave-trusty-1016 is DOWN: CRITICAL - Host Unreachable (10.68.18.34)
shinken-wm: RECOVERY - Host integration-slave-trusty-1016 is UPING OK - Packet loss = 0%, RTA = 0.83 ms

Event Timeline

Krinkle raised the priority of this task from to Unbreak Now!.
Krinkle updated the task description.
Krinkle subscribed.

This is probably due to the security patch that I'm applying now (via a suspend/resume). You can reset your clock like this:

$ sudo service ntp stop ; sudo ntpd -q ; sudo service ntp start

I will run that command on all instances via Salt once the update is finished.
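After the reset, one way to confirm that a host is back under the threshold is to read the offset ntpq reports for the peer ntpd has selected. A minimal sketch, assuming ntpq is installed alongside ntpd and that its -p output uses the usual columns (remote, refid, st, t, when, poll, reach, delay, offset, jitter), with the offset in milliseconds. The function name ntp_offset_ok and its 60-second default are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical check: read `ntpq -pn` output on stdin and return
#   0 if the selected peer (line starting with '*') is within THRESHOLD seconds,
#   1 if its offset is THRESHOLD seconds or more,
#   2 if no peer is selected yet.
# Assumes the standard ntpq -p columns, where column 9 is the offset in ms.
ntp_offset_ok() {
    threshold=${1:-60}
    awk -v limit="$threshold" '
        /^\*/ && !found {
            found = 1
            off = $9 / 1000          # milliseconds -> seconds
            if (off < 0) off = -off  # drift in either direction counts
        }
        END {
            if (!found) exit 2
            exit (off < limit + 0 ? 0 : 1)
        }
    '
}
```

Usage, on a host where the reset was just run: ntpq -pn | ntp_offset_ok. Right after ntp is restarted, ntpd typically needs a few polling intervals before it marks a peer with '*', so an exit status of 2 at first does not by itself mean the reset failed.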

Krinkle claimed this task.

Thanks.