puppetdb failures
Closed, Resolved, Public

Description

Just now puppet runs failed on every production host. Puppet was failing to fetch the facts from puppetdb on nitrogen.

First I ran 'service puppetdb restart' on nitrogen. This command took a long time (maybe a minute) to complete. Afterwards, puppet runs were still failing with the same error.

Then I restarted nginx on nitrogen. That returned quickly, and puppet ran smoothly everywhere after that.

Let's track and see if this keeps happening.

Event Timeline

This correlates with running catalog diffs in bulk from puppetcompiler1001 and must be related. Last night I ran a diff across all nodes serially, waiting 20 seconds between nodes, with no issue. Today I tried running the diff with 5 seconds between nodes, which must have overloaded puppetdb. I will make sure this runs with a sleep of 20s or more between nodes from now on.
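For illustration, the throttling described above could be sketched roughly as follows. This is not the script actually running on puppetcompiler1001; `run_throttled`, `diff_node`, and `MIN_DELAY_SECONDS` are hypothetical names, and the 20-second floor is simply the interval that ran overnight without issue.

```python
import time
from typing import Callable, Iterable, List

# Assumption: 20 seconds is the interval that ran overnight without issue.
MIN_DELAY_SECONDS = 20.0


def run_throttled(
    nodes: Iterable[str],
    diff_node: Callable[[str], None],
    delay: float = MIN_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run diff_node for each node serially, pausing between nodes.

    diff_node is a hypothetical callable standing in for whatever
    performs the catalog diff for a single node.
    """
    # Refuse intervals shorter than the one known to be safe.
    if delay < MIN_DELAY_SECONDS:
        raise ValueError(
            f"delay {delay}s is below the {MIN_DELAY_SECONDS}s minimum"
        )

    completed: List[str] = []
    for index, node in enumerate(nodes):
        # Sleep between nodes only, not before the first one.
        if index > 0:
            sleep(delay)
        diff_node(node)
        completed.append(node)
    return completed
```

Rejecting a short delay outright, rather than silently raising it to the minimum, would make an accidental 5-second run fail immediately instead of loading puppetdb.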

By the way, the bulk diff is running from a root screen on puppetcompiler1001.eqiad.wmnet. Should these symptoms recur, this can be checked and, if it is running, stopped with Ctrl-C.

herron claimed this task.