scap error on 2022-06-21 deploy
Closed, Resolved, Public

Description

lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging (master $ u=) $ scap sync-file wmf-config/InitialiseSettings.php 'Config: [[gerrit:806877|Enable Lexeme Lua access everywhere (T309593)]] (1/2)'
           ___ ____
         ⎛   ⎛ ,----
          \  //==--'
     _//|,.·//==--'    ____________________________
    _OO≣=-  ︶ ᴹw ⎞_§ ______  ___\ ___\ ,\__ \/ __ \
   (∞)_, )  (     |  ______/__  \/ /__ / /_/ / /_/ /
     ¨--¨|| |- (  / ______\____/ \___/ \__^_/  .__/
         ««_/  «_/ jgs/bd808                /_/

15:29:03 Started cache_git_info
15:29:04 Finished cache_git_info (duration: 00m 00s)
15:29:04 Checking for new runtime errors locally
15:29:05 Started sync-masters
sync-masters: 100% (in-flight: 0; ok: 1; fail: 0; left: 0)
15:29:13 Finished sync-masters (duration: 00m 08s)
15:29:13 Started sync-pull-masters
sync-pull-masters: 100% (in-flight: 0; ok: 1; fail: 0; left: 0)
15:29:15 Finished sync-pull-masters (duration: 00m 01s)
15:29:15 Started sync-check-canaries
sync-canaries: 100% (in-flight: 0; ok: 9; fail: 0; left: 0)
15:29:16 Per-host sync duration: average 1.1s, median 1.1s
15:29:16 rsync transfer: average 797,971 bytes/host, total 7,181,739 bytes
15:29:16 Finished Canaries Synced (duration: 00m 01s)
15:29:16 Started php-fpm-restarts
15:29:16 Running '/usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807' on 9 host(s)
15:29:57 Finished php-fpm-restarts (duration: 00m 40s)
15:29:57 Executing check 'Check endpoints for mw1449.eqiad.wmnet'
15:29:57 Executing check 'Check endpoints for mw1417.eqiad.wmnet'
15:29:57 Executing check 'Check endpoints for mw1415.eqiad.wmnet'
15:29:57 Executing check 'Check endpoints for mw1448.eqiad.wmnet'
15:29:57 Executing check 'Check endpoints for mw1418.eqiad.wmnet'
15:29:57 Executing check 'Check endpoints for mw1416.eqiad.wmnet'
15:29:57 Executing check 'Check endpoints for mw1447.eqiad.wmnet'
15:29:57 Executing check 'Check endpoints for mw1450.eqiad.wmnet'
15:29:57 Executing check 'Check endpoints for mw1414.eqiad.wmnet'
15:29:58 Finished Canary Endpoint Check Complete (duration: 00m 42s)
15:29:58 Executing check 'Logstash Error rate for mw1449.eqiad.wmnet'
15:29:58 Executing check 'Logstash Error rate for mw1417.eqiad.wmnet'
15:29:58 Executing check 'Logstash Error rate for mw1415.eqiad.wmnet'
15:29:58 Executing check 'Logstash Error rate for mw1448.eqiad.wmnet'
15:29:58 Executing check 'Logstash Error rate for mw1418.eqiad.wmnet'
15:29:58 Executing check 'Logstash Error rate for mw1416.eqiad.wmnet'
15:29:58 Executing check 'Logstash Error rate for mw1447.eqiad.wmnet'
15:29:58 Executing check 'Logstash Error rate for mw1450.eqiad.wmnet'
15:29:58 Executing check 'Logstash Error rate for mw1414.eqiad.wmnet'
15:29:59 Finished sync-check-canaries (duration: 00m 43s)
15:29:59 Started sync-proxies
sync-proxies: 100% (in-flight: 0; ok: 8; fail: 0; left: 0)
15:30:01 Per-host sync duration: average 1.4s, median 1.4s
15:30:01 rsync transfer: average 797,971 bytes/host, total 6,383,768 bytes
15:30:01 Finished sync-proxies (duration: 00m 02s)
15:30:01 Started sync-apaches
sync-apaches: 100% (in-flight: 0; ok: 352; fail: 0; left: 0)
15:30:08 Per-host sync duration: average 1.3s, median 1.3s
15:30:08 rsync transfer: average 797,971 bytes/host, total 280,885,792 bytes
15:30:08 Finished sync-apaches (duration: 00m 07s)
15:30:08 Started php-fpm-restarts
15:30:08 Running '/usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807' on 307 host(s)
15:30:25 /usr/bin/sudo -u root -- /usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807 (ran as mwdeploy@mw2327.codfw.wmnet) returned [2]: Restarting php7.2-fpm: free opcache 746 MB
2022-06-21 15:30:24,914 [INFO] Depooling currently pooled services
2022-06-21 15:30:25,231 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2327.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fed0d6fd5f8>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,234 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2327.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fed0d6dd6a0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,237 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2327.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fed0d6ddda0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,240 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2327.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fed0d6ddd68>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,243 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2327.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fed0d6dd6a0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,245 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2327.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fed0d6fd128>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,248 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2327.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fed0d84aeb8>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,251 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2327.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fed0d7d74a8>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,253 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2327.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fed0d7d7ba8>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,256 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2327.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fed0d70b2e8>: Failed to establish a new connection: [Errno 111] Connection refused'))
<poolcounter.client.Response object at 0x7fed0db2def0>
2022-06-21 15:30:25,256 [ERROR] Error running command with poolcounter: local variable 'status' referenced before assignment
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/poolcounter/client.py", line 384, in run
    callback(*args)
  File "/usr/local/bin/safe-service-restart", line 149, in run_and_raise
    rc = self.run()
  File "/usr/local/bin/safe-service-restart", line 170, in run
    if not self.depool(pooled):
  File "/usr/local/bin/safe-service-restart", line 246, in depool
    self._verify_status(False, pooled)
  File "/usr/local/bin/safe-service-restart", line 274, in _verify_status
    self._fetch_retry(url, want_pooled, desired_status)
  File "/usr/local/bin/safe-service-restart", line 333, in _fetch_retry
    raise PoolStatusError(str(status))
UnboundLocalError: local variable 'status' referenced before assignment

15:30:25 /usr/bin/sudo -u root -- /usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807 (ran as mwdeploy@mw2257.codfw.wmnet) returned [2]: Restarting php7.2-fpm: free opcache 746 MB
2022-06-21 15:30:19,917 [INFO] Depooling currently pooled services
2022-06-21 15:30:19,990 [WARNING] LB lvs2009:9090 reports pool appservers-https_443/mw2257.codfw.wmnet as enabled/up/pooled, should be disabled/*/not pooled
2022-06-21 15:30:25,231 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2257.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d8a12e518>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,234 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2257.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d8a0880f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,237 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2257.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d8a088d68>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,240 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2257.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d8a0889e8>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,242 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2257.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d8a088160>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,244 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2257.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d8a12ec88>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,246 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2257.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d8a263e10>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,248 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2257.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d8a1ec4e0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,250 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2257.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d8a1ecbe0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,253 [WARNING] Issues connecting to lvs2010:9090: HTTPConnectionPool(host='lvs2010', port=9090): Max retries exceeded with url: /pools/appservers-https_443/mw2257.codfw.wmnet (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d8a122320>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-06-21 15:30:25,253 [ERROR] Error running command with poolcounter: local variable 'status' referenced before assignment
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/poolcounter/client.py", line 384, in run
    callback(*args)
  File "/usr/local/bin/safe-service-restart", line 149, in run_and_raise
    rc = self.run()
  File "/usr/local/bin/safe-service-restart", line 170, in run
    if not self.depool(pooled):
  File "/usr/local/bin/safe-service-restart", line 246, in depool
    self._verify_status(False, pooled)
  File "/usr/local/bin/safe-service-restart", line 274, in _verify_status
    self._fetch_retry(url, want_pooled, desired_status)
  File "/usr/local/bin/safe-service-restart", line 333, in _fetch_retry
    raise PoolStatusError(str(status))
UnboundLocalError: local variable 'status' referenced before assignment
<poolcounter.client.Response object at 0x7f0d8b7e2e10>

15:30:42 2 hosts had failures restarting php-fpm
15:32:54 Finished php-fpm-restarts (duration: 02m 46s)
15:32:54 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806877|Enable Lexeme Lua access everywhere (T309593)]] (1/2) (duration: 03m 51s)

lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging (master $ u=) $ scap version
4.9.4-1

Event Timeline

@Lucas_Werkmeister_WMDE also wrote:

I think there are two issues here: the failed connection to lvs2010 (understandable if that was being worked on at the moment), and the fact that https://gerrit.wikimedia.org/g/operations/puppet/+/c8cb4a1796d5ff22803c171c277943eebecb8ee7/modules/conftool/files/safe-service-restart.py#333 raises an UnboundLocalError if status was never assigned.
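
To illustrate the second issue, here is a minimal sketch of how a retry loop can reach its final raise without ever binding status. This is not the actual safe-service-restart.py code: the function names, the fetch callable, the "ok" check, and the refused helper are all hypothetical stand-ins. Only the PoolStatusError name and the raise PoolStatusError(str(status)) line come from the traceback above.

```python
class PoolStatusError(Exception):
    """Raised when the pool status could not be confirmed."""


def fetch_retry_buggy(fetch, attempts=3):
    """Hypothetical loop: `status` is bound only if fetch() returns."""
    for _ in range(attempts):
        try:
            status = fetch()
        except ConnectionError:
            continue  # if every attempt fails, `status` is never bound
        if status == "ok":
            return status
    # Reading `status` here raises UnboundLocalError when all attempts failed.
    raise PoolStatusError(str(status))


def fetch_retry_fixed(fetch, attempts=3):
    """Same loop, but `status` always has a value before it is read."""
    status = None
    for _ in range(attempts):
        try:
            status = fetch()
        except ConnectionError:
            continue
        if status == "ok":
            return status
    raise PoolStatusError(str(status))


def refused():
    """Hypothetical stand-in for a load balancer refusing connections."""
    raise ConnectionRefusedError(111, "Connection refused")
```

With refused as the fetch callable, the first variant fails with UnboundLocalError, while the second raises PoolStatusError as intended, so the caller sees the pool-status failure rather than an unrelated crash.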

At the time this was reported, @klausman was logging SAL messages like: !log Restarting pybal on lvs2010

Change 807624 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] safe-service-restart.py: Ensure 'status' always has a value

https://gerrit.wikimedia.org/r/807624

Change 807624 merged by RLazarus:

[operations/puppet@production] safe-service-restart.py: Avoid uninitialized access to 'status'

https://gerrit.wikimedia.org/r/807624