
Puppet failure on all hosts with Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Mysql::Error: Out of range value for column 'id' at row 1: INSERT INTO `fact_values` (`updated_at`, `host_id`, `created_at`, `fact_name_id`, `value`) VALUES('date', etc)
Closed, Resolved · Public

Description

For some reason, Puppet facts stored in MySQL have large gaps between their ids (probably because they are deleted and re-inserted with new ids on every run). This led to an overflow of the id column of the fact_values table in the puppet database, which backs the Puppet operations infrastructure (at the time of writing, db1001 is the master of the m1 shard). The error shows up in the Puppet logs as:

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Mysql::Error: Out of range value for column 'id' at row 1: INSERT INTO `fact_values` (`updated_at`, `host_id`, `created_at`, `fact_name_id`, `value`) VALUES('2015-08-03 12:45:36', 512, '2015-08-03 12:45:36', 432, '9677.12')
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
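As a rough illustration of the suspected cause, the sketch below replays a delete-and-reinsert storage pattern against an in-memory SQLite table. It assumes, without confirmation from Puppet's code, that each run deletes a host's facts and inserts fresh rows. SQLite's AUTOINCREMENT stands in for MySQL's AUTO_INCREMENT here, since neither reuses ids after a DELETE. The table layout and the simulate_runs helper are hypothetical.

```python
import sqlite3


def simulate_runs(hosts: int, facts_per_host: int, runs: int) -> tuple[int, int]:
    """Replay `runs` Puppet runs that delete and re-insert every host's facts.

    Returns (row_count, highest_id). SQLite's AUTOINCREMENT, like MySQL's
    AUTO_INCREMENT, never hands out a previously used id, so the ids of
    deleted rows are burned rather than reused.
    """
    db = sqlite3.connect(":memory:")
    # Hypothetical, simplified version of the fact_values table.
    db.execute(
        "CREATE TABLE fact_values ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " host_id INTEGER, fact_name_id INTEGER, value TEXT)"
    )
    for _ in range(runs):
        for host in range(hosts):
            # Assumed storage pattern: drop the host's old facts...
            db.execute("DELETE FROM fact_values WHERE host_id = ?", (host,))
            # ...then insert a fresh copy, each row getting a brand-new id.
            db.executemany(
                "INSERT INTO fact_values (host_id, fact_name_id, value) VALUES (?, ?, ?)",
                [(host, fact, "x") for fact in range(facts_per_host)],
            )
    count, max_id = db.execute("SELECT COUNT(*), MAX(id) FROM fact_values").fetchone()
    db.close()
    return count, max_id
```

For example, simulate_runs(3, 100, 10) leaves 300 rows in the table while the highest id has already reached 3000: the row count stays flat while the id counter grows with every run, which is the kind of gap described above.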

Running TRUNCATE TABLE puppet.fact_values; resets the id counter and fixes the issue (subsequent Puppet runs recreate the values).

Monitor the table over the following days to see whether the gap is a real problem that requires a longer-term fix (converting the column to BIGINT, or patching Puppet).
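To put numbers on that monitoring, here is a minimal sketch. It assumes that the id column is a signed 32-bit INT (limit 2^31 - 1) and that ids are consumed at a roughly constant rate. Take two samples of the counter some days apart, for example MAX(id) from puppet.fact_values or the AUTO_INCREMENT column of information_schema.TABLES, and project linearly. The function name days_until_overflow is hypothetical.

```python
INT_MAX = 2**31 - 1     # signed INT: assumed current type of fact_values.id
BIGINT_MAX = 2**63 - 1  # signed BIGINT: the proposed long-term fix


def days_until_overflow(id_then: int, id_now: int, days_between: float,
                        column_max: int = INT_MAX) -> float:
    """Linearly project how many days remain until the id counter hits column_max.

    id_then and id_now are two samples of the counter taken days_between
    days apart.
    """
    if days_between <= 0:
        raise ValueError("samples must be taken at different times")
    rate = (id_now - id_then) / days_between  # ids consumed per day
    if rate <= 0:
        # Counter not growing, or reset between samples (e.g. by a TRUNCATE).
        return float("inf")
    return (column_max - id_now) / rate
```

Converting the column to BIGINT raises the limit from 2^31 - 1 to 2^63 - 1, so at a fixed consumption rate the projected headroom grows by a factor of roughly 4.3E9.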

Event Timeline

jcrespo raised the priority of this task to Low.
jcrespo updated the task description.
jcrespo added projects: acl*sre-team, DBA, Puppet.
jcrespo added a subscriber: jcrespo.

Workaround documented on https://wikitech.wikimedia.org/wiki/Puppet#Troubleshooting

We believe this will fix the issue for 2E9 years... or it will fail horribly in January 2017. But better to try than to do nothing.