
wikistats.analytics.eqiad.wmflabs blocking Prometheus scraping from metricsinfra
Closed, Resolved (Public)

Description

The analytics project is one of the Cloud VPS projects that were monitored by Shinken and is now partially configured to be monitored using Prometheus in the metricsinfra project. The wikistats.analytics.eqiad.wmflabs instance includes Puppet manifests which enable ferm firewall management, but it is missing the hiera setting that would allow scraping from the prometheus01.metricsinfra.eqiad.wmflabs instance.

Rather than making all monitored projects manage the prometheus_nodes hiera setting independently, we should set the default across all Cloud VPS projects to allow metricsinfra monitoring.
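The reasoning behind a shared default can be sketched as a layered lookup: a project-level value wins if present, otherwise the Cloud VPS wide default applies. The following is a simplified Python model, not Puppet's actual hiera implementation. The value format (a list of hostnames), the `exampleproject` entry, and the `lookup` helper are assumptions made for illustration; only the `prometheus_nodes` key and the prometheus01 hostname come from this task.

```python
# Simplified model of a hiera-style layered lookup (illustrative only).

# Assumed shape of the cloud-wide default: a list of scraper hostnames.
CLOUD_DEFAULTS = {
    "prometheus_nodes": ["prometheus01.metricsinfra.eqiad.wmflabs"],
}

# Per-project data. "analytics" sets nothing; "exampleproject" is a
# hypothetical project that overrides the default.
PROJECT_DATA = {
    "analytics": {},
    "exampleproject": {
        "prometheus_nodes": ["prometheus.exampleproject.example"],
    },
}


def lookup(key, project, project_data, defaults):
    """Return the most specific value for key: project first, then default."""
    for layer in (project_data.get(project, {}), defaults):
        if key in layer:
            return layer[key]
    raise KeyError(key)


if __name__ == "__main__":
    for project in ("analytics", "exampleproject"):
        print(project, lookup("prometheus_nodes", project, PROJECT_DATA, CLOUD_DEFAULTS))
```

With a default in place, a project such as analytics needs no configuration of its own, while a project with special requirements can still override the value.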

Event Timeline

Change 607297 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] wmcs: Set default prometheus_nodes value

https://gerrit.wikimedia.org/r/607297

bd808 triaged this task as Medium priority.
bd808 moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

Change 607297 merged by Bstorm:
[operations/puppet@production] wmcs: Set default prometheus_nodes value

https://gerrit.wikimedia.org/r/607297

Notice: The LDAP client stack for this host is: classic/sudoldap
Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: classic/sudoldap'
Error: Could not set home on user[stats]: Execution of '/usr/sbin/usermod -d /var/lib/stats stats' returned 6: usermod: user 'stats' does not exist in /etc/passwd
Error: /Stage[main]/Statistics::User/User[stats]/home: change from '/home/stats' to '/var/lib/stats' failed: Could not set home on user[stats]: Execution of '/usr/sbin/usermod -d /var/lib/stats stats' returned 6: usermod: user 'stats' does not exist in /etc/passwd
Notice: /Stage[main]/Statistics::User/File[/var/lib/stats/.git-credentials]: Dependency User[stats] has failures: true
Warning: /Stage[main]/Statistics::User/File[/var/lib/stats/.git-credentials]: Skipping because of failed dependencies
Notice: /Stage[main]/Httpd/Httpd::Mod_conf[cgi]/Exec[ensure_present_mod_cgi]/returns: executed successfully
Info: /Stage[main]/Httpd/Httpd::Mod_conf[cgi]/Exec[ensure_present_mod_cgi]: Scheduling refresh of Service[apache2]
Warning: /Stage[main]/Statistics::User/Git::Userconfig[stats]/File[/var/lib/stats/.gitconfig]: Skipping because of failed dependencies
Warning: /Stage[main]/Statistics::Published/File[/srv/published-rsynced]: Skipping because of failed dependencies
Warning: /Stage[main]/Statistics::Published/Cron[hardsync-published]: Skipping because of failed dependencies
Warning: /Stage[main]/Statistics::Published/Rsync::Server::Module[published-destination]/File[/etc/rsync.d/frag-published-destination]: Skipping because of failed dependencies
Warning: /Stage[main]/Rsync::Server/Exec[compile fragments]: Skipping because of failed dependencies
Warning: /Stage[main]/Rsync::Server/Service[rsync]: Skipping because of failed dependencies
Warning: /Stage[main]/Statistics::Published/Rsync::Server::Module[published-destination]/Ferm::Service[rsyncd_access_published-destination]/File[/etc/ferm/conf.d/10_rsyncd_access_published-destination]: Skipping because of failed dependencies
Notice: /Stage[main]/Httpd/Service[apache2]: Triggered 'refresh' from 1 event
Warning: /Stage[main]/Ferm/Service[ferm]: Skipping because of failed dependencies
Notice: /Stage[main]/Profile::Tlsproxy::Service/Tlsproxy::Localssl[yarn.wikimedia.org]/Sslcert::Certificate[yarn.wikimedia.org]/Sslcert::Chainedcert[yarn.wikimedia.org]/Exec[x509-bundle yarn.wikimedia.org-chain]/returns: Traceback (most recent call last):
Notice: /Stage[main]/Profile::Tlsproxy::Service/Tlsproxy::Localssl[yarn.wikimedia.org]/Sslcert::Certificate[yarn.wikimedia.org]/Sslcert::Chainedcert[yarn.wikimedia.org]/Exec[x509-bundle yarn.wikimedia.org-chain]/returns:   File "/usr/local/sbin/x509-bundle", line 140, in <module>
Notice: /Stage[main]/Profile::Tlsproxy::Service/Tlsproxy::Localssl[yarn.wikimedia.org]/Sslcert::Certificate[yarn.wikimedia.org]/Sslcert::Chainedcert[yarn.wikimedia.org]/Exec[x509-bundle yarn.wikimedia.org-chain]/returns:     main()
Notice: /Stage[main]/Profile::Tlsproxy::Service/Tlsproxy::Localssl[yarn.wikimedia.org]/Sslcert::Certificate[yarn.wikimedia.org]/Sslcert::Chainedcert[yarn.wikimedia.org]/Exec[x509-bundle yarn.wikimedia.org-chain]/returns:   File "/usr/local/sbin/x509-bundle", line 119, in main
Notice: /Stage[main]/Profile::Tlsproxy::Service/Tlsproxy::Localssl[yarn.wikimedia.org]/Sslcert::Certificate[yarn.wikimedia.org]/Sslcert::Chainedcert[yarn.wikimedia.org]/Exec[x509-bundle yarn.wikimedia.org-chain]/returns:     certpath.pop(0)
Notice: /Stage[main]/Profile::Tlsproxy::Service/Tlsproxy::Localssl[yarn.wikimedia.org]/Sslcert::Certificate[yarn.wikimedia.org]/Sslcert::Chainedcert[yarn.wikimedia.org]/Exec[x509-bundle yarn.wikimedia.org-chain]/returns: IndexError: pop from empty list
Error: '/usr/local/sbin/x509-bundle --skip-root --skip-first -c /etc/ssl/localcerts/yarn.wikimedia.org.crt -o /etc/ssl/localcerts/yarn.wikimedia.org.chain.crt' returned 1 instead of one of [0]
Error: /Stage[main]/Profile::Tlsproxy::Service/Tlsproxy::Localssl[yarn.wikimedia.org]/Sslcert::Certificate[yarn.wikimedia.org]/Sslcert::Chainedcert[yarn.wikimedia.org]/Exec[x509-bundle yarn.wikimedia.org-chain]/returns: change from 'notrun' to ['0'] failed: '/usr/local/sbin/x509-bundle --skip-root --skip-first -c /etc/ssl/localcerts/yarn.wikimedia.org.crt -o /etc/ssl/localcerts/yarn.wikimedia.org.chain.crt' returned 1 instead of one of [0]
Notice: /Stage[main]/Profile::Tlsproxy::Service/Tlsproxy::Localssl[yarn.wikimedia.org]/Sslcert::Certificate[yarn.wikimedia.org]/Sslcert::Chainedcert[yarn.wikimedia.org]/File[/etc/ssl/localcerts/yarn.wikimedia.org.chain.crt]: Dependency Exec[x509-bundle yarn.wikimedia.org-chain] has failures: true
Warning: /Stage[main]/Profile::Tlsproxy::Service/Tlsproxy::Localssl[yarn.wikimedia.org]/Sslcert::Certificate[yarn.wikimedia.org]/Sslcert::Chainedcert[yarn.wikimedia.org]/File[/etc/ssl/localcerts/yarn.wikimedia.org.chain.crt]: Skipping because of failed dependencies
Warning: /Stage[main]/Nginx/Service[nginx]: Skipping because of failed dependencies
Info: Stage[main]: Unscheduling all events on Stage[main]
Notice: Applied catalog in 14.36 seconds

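It appears Puppet itself is the issue on the wikistats server: the run above fails on User[stats] and on the x509-bundle exec for yarn.wikimedia.org, and Service[ferm] is then skipped because of failed dependencies.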
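To see at a glance which resources a run like the one above failed on or skipped, the agent output can be summarized with a short script. This is a minimal sketch that assumes the plain-text line format shown in this task (`Error: ...` and `Warning: <resource path>: Skipping because of failed dependencies`). The function name `summarize_puppet_run` is made up for illustration, and the sample lines are copied from the output above.

```python
import re

# Matches the "skipped" warnings exactly as they appear in the output above.
SKIPPED = re.compile(
    r"^Warning: (?P<path>/.+): Skipping because of failed dependencies$"
)
# Matches one Type[title] reference, e.g. Service[ferm].
RESOURCE = re.compile(r"[A-Z][\w:]*\[[^\]]*\]")


def summarize_puppet_run(log_text):
    """Collect error messages and the innermost skipped resource per warning."""
    errors, skipped = [], []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("Error: "):
            errors.append(line[len("Error: "):])
            continue
        match = SKIPPED.match(line)
        if match:
            resources = RESOURCE.findall(match.group("path"))
            if resources:
                # The last Type[title] in the path is the skipped resource.
                skipped.append(resources[-1])
    return {"errors": errors, "skipped": skipped}


# A few lines copied verbatim from the Puppet run in this task.
SAMPLE = """\
Error: Could not set home on user[stats]: Execution of '/usr/sbin/usermod -d /var/lib/stats stats' returned 6: usermod: user 'stats' does not exist in /etc/passwd
Warning: /Stage[main]/Statistics::User/File[/var/lib/stats/.git-credentials]: Skipping because of failed dependencies
Notice: /Stage[main]/Httpd/Service[apache2]: Triggered 'refresh' from 1 event
Warning: /Stage[main]/Ferm/Service[ferm]: Skipping because of failed dependencies
Warning: /Stage[main]/Nginx/Service[nginx]: Skipping because of failed dependencies
"""

if __name__ == "__main__":
    summary = summarize_puppet_run(SAMPLE)
    print("errors:", len(summary["errors"]))
    for resource in summary["skipped"]:
        print("skipped:", resource)
```

If `Service[ferm]` shows up in the skipped list, the firewall service was not managed in that run, which is worth checking before concluding that the hiera default is ineffective.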