We need better monitoring for Superset and Turnilo. The current checks are:
  monitoring::service { 'superset':
      description   => 'superset',
      check_command => "check_tcp!${::superset::port}",
      require       => Class['::superset'],
      notes_url     => 'https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset',
  }

  monitoring::service { 'turnilo':
      description   => 'turnilo',
      check_command => "check_tcp!${port}",
      contact_group => $contact_group,
      notes_url     => 'https://wikitech.wikimedia.org/wiki/Analytics/Systems/Turnilo-Pivot',
  }
These TCP checks only fire when the daemon is down and the port stops accepting connections. They do not catch the case where the daemon is up but responding with errors (as happened this morning).
Superset is tricky because of its auth scheme. Maybe we could have a test user and use something like https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#Test_as_different_user_on_staging ? Turnilo should be easy.
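
For Turnilo, a rough sketch of an HTTP-level check might look like the following. It assumes an Icinga command named `check_http_on_port` that takes the port as its argument and fails on HTTP error responses; that command name is hypothetical here and should be replaced with whatever HTTP check command is actually defined. It also assumes Turnilo answers a plain unauthenticated GET on that port.

```puppet
# Sketch only: swaps the TCP check for an HTTP-level one, so that a daemon
# that is up but returning errors also triggers an alert.
# 'check_http_on_port' is an assumed command name, not confirmed in this task.
monitoring::service { 'turnilo':
    description   => 'turnilo',
    check_command => "check_http_on_port!${port}",
    contact_group => $contact_group,
    notes_url     => 'https://wikitech.wikimedia.org/wiki/Analytics/Systems/Turnilo-Pivot',
}
```

The same approach would not work as-is for Superset, since an unauthenticated request would presumably hit the auth layer rather than the application itself.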