Summary
The scap canary hosts are hardcoded in a hiera configuration file in operations/puppet.git.
- The switchover process should include a step to update the list of canaries.
- The Puppet roles mediawiki::canary_appserver and mediawiki::appserver::canary_api appear to be legacy and no longer useful.
- From discussion with SRE: the scap dsh groups / list of canaries should move to conftool.
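As a rough illustration of the idea behind the points above, the canary list could be keyed by datacenter so that a switchover only changes which key is read. This is a minimal sketch in plain Python, not scap's or conftool's actual API: the `CANARIES_BY_DC` mapping, the `canaries_for` helper and the `.example` host names are all hypothetical placeholders.

```python
# Hypothetical per-datacenter canary lists. The host names are placeholders,
# not real servers; in practice this data would come from wherever the
# canary list ends up being stored.
CANARIES_BY_DC = {
    "eqiad": ["canary-a.eqiad.example", "canary-b.eqiad.example"],
    "codfw": ["canary-a.codfw.example", "canary-b.codfw.example"],
}


def canaries_for(active_dc, canaries_by_dc):
    """Return the canary hosts for the active datacenter.

    Raises ValueError when the datacenter is unknown or has no canaries,
    so that a deploy does not silently run without any canary checks.
    """
    hosts = canaries_by_dc.get(active_dc)
    if not hosts:
        raise ValueError(f"no canaries defined for datacenter {active_dc!r}")
    return list(hosts)


if __name__ == "__main__":
    # After a switchover to codfw, only the active_dc argument changes.
    print(canaries_for("codfw", CANARIES_BY_DC))
```

With a layout like this, updating the canaries during a switchover becomes a matter of changing the active datacenter rather than editing a host list by hand.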
From the last scap sync-file log:
01:14:16 Finished Canaries Synced (duration: 00m 03s)
01:14:16 Executing check 'Check endpoints for mw1279.eqiad.wmnet'
01:14:16 Executing check 'Check endpoints for mw1276.eqiad.wmnet'
01:14:16 Executing check 'Check endpoints for mw1261.eqiad.wmnet'
01:14:16 Executing check 'Check endpoints for mw1264.eqiad.wmnet'
01:14:16 Executing check 'Check endpoints for mwdebug1002.eqiad.wmnet'
01:14:16 Executing check 'Check endpoints for mwdebug1001.eqiad.wmnet'
01:14:16 Executing check 'Check endpoints for mw1263.eqiad.wmnet'
01:14:16 Executing check 'Check endpoints for mw1262.eqiad.wmnet'
01:14:16 Executing check 'Check endpoints for mw1278.eqiad.wmnet'
01:14:16 Executing check 'Check endpoints for mw1277.eqiad.wmnet'
01:14:16 Executing check 'Check endpoints for mw1265.eqiad.wmnet'
01:14:16 Check 'Check endpoints for mw1276.eqiad.wmnet' failed: /wiki/{title} (Main Page) is CRITICAL: Test Main Page returned the unexpected status 503 (expecting: 200); /wiki/{title} (Special Version) is CRITICAL: Test Special Version returned the unexpected status 503 (expecting: 200); /w/api.php (Main Page pageprops) is CRITICAL: Test Main Page pageprops returned the unexpected status 503 (expecting: 200)
01:14:18 Finished Canary Endpoint Check Complete (duration: 00m 02s)
01:14:18 Waiting for canary traffic...
01:14:36 Executing check 'Logstash Error rate for mw1279.eqiad.wmnet'
01:14:36 Executing check 'Logstash Error rate for mw1276.eqiad.wmnet'
01:14:36 Executing check 'Logstash Error rate for mw1261.eqiad.wmnet'
01:14:36 Executing check 'Logstash Error rate for mw1264.eqiad.wmnet'
01:14:36 Executing check 'Logstash Error rate for mwdebug1002.eqiad.wmnet'
01:14:36 Executing check 'Logstash Error rate for mwdebug1001.eqiad.wmnet'
01:14:36 Executing check 'Logstash Error rate for mw1263.eqiad.wmnet'
01:14:36 Executing check 'Logstash Error rate for mw1262.eqiad.wmnet'
01:14:36 Executing check 'Logstash Error rate for mw1278.eqiad.wmnet'
01:14:36 Executing check 'Logstash Error rate for mw1277.eqiad.wmnet'
01:14:36 Executing check 'Logstash Error rate for mw1265.eqiad.wmnet'
01:14:36 Finished sync-check-canaries (duration: 00m 23s)
01:14:36 Started sync-proxies
scap should be checking the canary servers in codfw instead, because the eqiad ones are dormant and not useful as canaries.
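To confirm which datacenter a given run actually checked, the log output can be summarized per datacenter. The following is a minimal sketch that assumes the log line format shown above and host names of the form `<host>.<datacenter>.wmnet`; the `canary_datacenters` helper and its regular expression are illustrative and not part of scap.

```python
import re
from collections import Counter

# Matches the endpoint-check lines as they appear in the scap output above.
CHECK_RE = re.compile(r"Executing check 'Check endpoints for (?P<host>[\w.-]+)'")


def canary_datacenters(log_text):
    """Count the distinct canary hosts checked in log_text, per datacenter.

    The datacenter is taken from the second label of the host name,
    e.g. "eqiad" for "mw1279.eqiad.wmnet". Hosts without a second
    label are ignored.
    """
    hosts = {m.group("host") for m in CHECK_RE.finditer(log_text)}
    counts = Counter()
    for host in hosts:
        labels = host.split(".")
        if len(labels) >= 2:
            counts[labels[1]] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "01:14:16 Executing check 'Check endpoints for mw1279.eqiad.wmnet'\n"
        "01:14:16 Executing check 'Check endpoints for mwdebug1001.eqiad.wmnet'\n"
    )
    print(dict(canary_datacenters(sample)))
```

Run against the log above, every counted host falls under eqiad and none under codfw, which matches the problem described in this task.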