
RESTbase broken on beta cluster
Closed, Resolved · Public · Bug Report

Event Timeline

TheresNoTime changed the subtype of this task from "Task" to "Bug Report".
TheresNoTime added a project: RESTBase.

Setting priority to High, as this appears to be blocking some CI (T350353).

Puppet hasn't run on deployment-restbase04, and the same goes for deployment-restbase-bullseye. That can't be good?

The last Puppet run was at Mon Sep 25 12:00:42 UTC 2023 (71979 minutes ago).
Last Puppet commit:
Last login: Wed Mar 29 10:55:02 2023 from 172.16.3.145
samtar@deployment-restbase04:~$ sudo puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Class[Profile::Cassandra]: parameter 'graphite_host' expects a Stdlib::Host = Variant[Stdlib::Fqdn = Pattern[/\A(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])\z/], Stdlib::Compat::Ip_address = Variant[Stdlib::Compat::Ipv4 = Pattern[/^((([0-9](?!\d)|[1-9][0-9](?!\d)|1[0-9]{2}(?!\d)|2[0-4][0-9](?!\d)|25[0-5](?!\d))[.]){3}([0-9](?!\d)|[1-9][0-9](?!\d)|1[0-9]{2}(?!\d)|2[0-4][0-9](?!\d)|25[0-5](?!\d)))(\/((([0-9](?!\d)|[1-9][0-9](?!\d)|1[0-9]{2}(?!\d)|2[0-4][0-9](?!\d)|25[0-5](?!\d))[.]){3}([0-9](?!\d)|[1-9][0-9](?!\d)|1[0-9]{2}(?!\d)|2[0-4][0-9](?!\d)|25[0-5](?!\d))|[0-9]+))?$/], Stdlib::Compat::Ipv6 = Pattern[/\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$/]]] value, got Undef (file: 
/etc/puppet/modules/role/manifests/restbase/production.pp, line: 10, column: 5) on node deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

The graphite_host issue may be related to https://wikitech.wikimedia.org/wiki/News/2023_Cloud_VPS_metrics_changes. I'm unsure what changes, if any, are needed to get Puppet to pass, or whether that would resolve the larger issue.

Change 974946 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] cassandra: remove references to graphite

https://gerrit.wikimedia.org/r/974946

It seems that RESTBase has had graphite_host set as an option since forever, and in production, instead of removing it as a parameter, graphite_host was set to "none". Removing graphite_host outright caused an error (the "got Undef" failure above), despite this being the right idea. This change fixes the issue properly, but to unblock things I've added graphite_host: "none" to the deployment-restbase prefix, and Puppet is running now. Once this change is merged, we can remove the hack. For now, it looks like RESTBase has recovered.
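To see why the graphite_host: "none" workaround gets Puppet past the error in the log above, here is a minimal Python sketch of the failing type check. `Stdlib::Host` accepts either an FQDN-shaped string or an IP address, and an unset (`Undef`) value matches neither. This is an approximation written for illustration, not Puppet's implementation: `is_stdlib_host` is a hypothetical helper, `Undef` is modelled as Python's `None`, the FQDN regex is copied from the error message, and the IP branch swaps the `Stdlib::Compat` regexes for Python's `ipaddress` module.

```python
import ipaddress
import re

# Stdlib::Fqdn pattern, copied from the Puppet error message above.
# Python's re spells "end of string" as \Z rather than Ruby's \z.
FQDN_RE = re.compile(
    r"\A(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*"
    r"([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])\Z"
)


def is_stdlib_host(value):
    """Approximate Stdlib::Host: an FQDN-shaped string or an IP address.

    Hypothetical helper. The IP branch uses the ipaddress module instead of
    the Stdlib::Compat regexes, so edge cases (CIDR suffixes, zone ids) differ.
    """
    if not isinstance(value, str):
        # Undef (modelled as None) is not a string, so it can never match.
        return False
    if FQDN_RE.match(value):
        return True
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    for candidate in [None, "none", "::1", "not a host"]:
        print(repr(candidate), is_stdlib_host(candidate))
    # None False
    # 'none' True
    # '::1' True
    # 'not a host' False
```

The string "none" is a syntactically valid single-label hostname, so it satisfies the type even though no such host exists. That is why it works as a placeholder, and also why dropping the parameter from the class entirely, rather than feeding it a dummy value, is the cleaner fix.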

TheresNoTime raised the priority of this task from High to Needs Triage. Nov 16 2023, 10:55 AM

Change 974946 merged by Hnowlan:

[operations/puppet@production] cassandra: remove references to graphite

https://gerrit.wikimedia.org/r/974946

Can this be closed, or are things still broken? (They don't seem to be.)