
Unable to access beta cluster
Closed, Resolved · Public

Description

Beta cluster is down. Similar errors received at:

Request from 98.26.4.244 via deployment-cache-text05 deployment-cache-text05, Varnish XID 59009438
Error: 503, Backend fetch failed at Sun, 09 Dec 2018 22:32:09 GMT

Event Timeline

kostajh created this task. Dec 9 2018, 10:37 PM
Restricted Application added a subscriber: Aklapper. Dec 9 2018, 10:37 PM
Quiddity updated the task description. Dec 9 2018, 10:45 PM
Quiddity added a subscriber: Quiddity.
Restricted Application added a subscriber: revi. Dec 9 2018, 10:45 PM
Quiddity raised the priority of this task from Normal to Needs Triage. Dec 9 2018, 10:45 PM
Quiddity renamed this task from Unable to access en/ko.wikipedia.beta.wmflabs.org to Unable to access beta cluster. Dec 9 2018, 11:22 PM
Quiddity added a project: Operations.
Krenair added a subscriber: Krenair.

interesting, API is up

According to the deployment-mediawiki-07 apache logs it last handled a request at 2018-12-09T06:24:49. Restarting apache has fixed it:

Apache status on deployment-mediawiki-07
root@deployment-mediawiki-07:~# service apache2 status
● apache2.service - The Apache HTTP Server
   Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/apache2.service.d
           └─puppet-override.conf
   Active: active (running) since Mon 2018-11-19 23:49:57 UTC; 2 weeks 5 days ago
  Process: 23439 ExecReload=/usr/sbin/apachectl graceful (code=exited, status=0/SUCCESS)
 Main PID: 583 (apache2)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/apache2.service
           └─583 /usr/sbin/apache2 -k start

Dec 09 23:33:44 deployment-mediawiki-07 apache2[583]: [mpm_worker:error] [pid 583:tid 139862731084992] AH00288: scoreboard is full, not at MaxRequestWorkers
Dec 09 23:33:45 deployment-mediawiki-07 apache2[583]: [mpm_worker:error] [pid 583:tid 139862731084992] AH00288: scoreboard is full, not at MaxRequestWorkers
Dec 09 23:33:46 deployment-mediawiki-07 apache2[583]: [mpm_worker:error] [pid 583:tid 139862731084992] AH00288: scoreboard is full, not at MaxRequestWorkers
Dec 09 23:33:47 deployment-mediawiki-07 apache2[583]: [mpm_worker:error] [pid 583:tid 139862731084992] AH00288: scoreboard is full, not at MaxRequestWorkers
Dec 09 23:33:48 deployment-mediawiki-07 apache2[583]: [mpm_worker:error] [pid 583:tid 139862731084992] AH00288: scoreboard is full, not at MaxRequestWorkers
Dec 09 23:33:49 deployment-mediawiki-07 apache2[583]: [mpm_worker:error] [pid 583:tid 139862731084992] AH00288: scoreboard is full, not at MaxRequestWorkers
Dec 09 23:33:50 deployment-mediawiki-07 apache2[583]: [mpm_worker:error] [pid 583:tid 139862731084992] AH00288: scoreboard is full, not at MaxRequestWorkers
Dec 09 23:33:51 deployment-mediawiki-07 apache2[583]: [mpm_worker:error] [pid 583:tid 139862731084992] AH00288: scoreboard is full, not at MaxRequestWorkers
Dec 09 23:33:52 deployment-mediawiki-07 apache2[583]: [mpm_worker:error] [pid 583:tid 139862731084992] AH00288: scoreboard is full, not at MaxRequestWorkers
Dec 09 23:33:53 deployment-mediawiki-07 apache2[583]: [mpm_worker:error] [pid 583:tid 139862731084992] AH00288: scoreboard is full, not at MaxRequestWorkers
root@deployment-mediawiki-07:~# service apache2 restart
root@deployment-mediawiki-07:~# service apache2 status
● apache2.service - The Apache HTTP Server
   Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/apache2.service.d
           └─puppet-override.conf
   Active: active (running) since Sun 2018-12-09 23:33:57 UTC; 1s ago
  Process: 8768 ExecStop=/usr/sbin/apachectl stop (code=exited, status=0/SUCCESS)
  Process: 23439 ExecReload=/usr/sbin/apachectl graceful (code=exited, status=0/SUCCESS)
  Process: 8773 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
 Main PID: 8777 (apache2)
    Tasks: 109 (limit: 4915)
   CGroup: /system.slice/apache2.service
           ├─8777 /usr/sbin/apache2 -k start
           ├─8778 /usr/sbin/apache2 -k start
           ├─8779 /usr/sbin/apache2 -k start
           ├─8780 /usr/sbin/apache2 -k start
           └─8781 /usr/sbin/apache2 -k start

Dec 09 23:33:56 deployment-mediawiki-07 systemd[1]: apache2.service: Failed with result 'signal'.
Dec 09 23:33:56 deployment-mediawiki-07 systemd[1]: Starting The Apache HTTP Server...
Dec 09 23:33:57 deployment-mediawiki-07 apache2[8776]: [:notice] [pid 8776:tid 139872376874176] ModSecurity for Apache/2.9.1 (http://www.modsecurity.org/) configured.
Dec 09 23:33:57 deployment-mediawiki-07 apache2[8776]: [:notice] [pid 8776:tid 139872376874176] ModSecurity: Original server signature: Apache/2.4.25 (Debian)
Dec 09 23:33:57 deployment-mediawiki-07 systemd[1]: Started The Apache HTTP Server.
Dec 09 23:33:58 deployment-mediawiki-07 apache2[8777]: [mpm_worker:notice] [pid 8777:tid 139872376874176] AH00292: Apache/2.4.25 (Debian) deployment-mediawiki-07.deployment-prep.eqiad.wmflabs configured -
Dec 09 23:33:58 deployment-mediawiki-07 apache2[8777]: [core:notice] [pid 8777:tid 139872376874176] AH00094: Command line: '/usr/sbin/apache2'

Krenair claimed this task. Dec 9 2018, 11:36 PM

Mentioned in SAL (#wikimedia-releng) [2018-12-09T23:38:09Z] <Krenair> restarted apache on deployment-mediawiki-07 for T211524

Krenair closed this task as Resolved. Dec 9 2018, 11:39 PM (edited)

This seems to be rare, it would be unlikely to cause such huge problems in the production environment (if one apache implodes then varnish should just send to others), and it only needs a simple service restart, so I'm going to mark this resolved. If it happens again, please reopen.
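
If this does recur, the symptom in the paste above (a steady stream of AH00288 "scoreboard is full" errors from a single apache2 PID) is easy to spot mechanically. Below is a minimal sketch, not something that exists on the beta cluster: the function names and the `threshold` value are hypothetical, and it assumes the log lines look like the journal output pasted in this task.

```python
import re

# Matches journal lines like the ones pasted above, e.g.
# "... apache2[583]: [mpm_worker:error] [pid 583:tid ...] AH00288: scoreboard is full, ..."
SCOREBOARD_FULL = re.compile(r"apache2\[(\d+)\]: \[mpm_\w+:error\].*\bAH00288\b")


def count_scoreboard_full(lines):
    """Return a {pid: count} dict of AH00288 errors seen in the given log lines."""
    counts = {}
    for line in lines:
        match = SCOREBOARD_FULL.search(line)
        if match:
            pid = int(match.group(1))
            counts[pid] = counts.get(pid, 0) + 1
    return counts


def looks_wedged(lines, threshold=5):
    """Guess whether apache is stuck: True if any PID logged at least
    `threshold` AH00288 errors. The threshold is an arbitrary assumption."""
    return any(n >= threshold for n in count_scoreboard_full(lines).values())
```

Feeding it the ten AH00288 lines from the status output would flag PID 583; the post-restart lines would not match. Whether to restart automatically on that signal is a separate decision and is deliberately left out of the sketch.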