https://en.wikipedia.beta.wmflabs.org/w/api.php 504 Server Error: Gateway Time-out
Closed, Duplicate · Public

Description

The Selenium job fails to reach the beta cluster API :(

https://en.wikipedia.beta.wmflabs.org/w/api.php 504 Server Error: Gateway Time-out

Example: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/210/ which ran on integration-slave-trusty-1004.integration.eqiad.wmflabs

Restricted Application added a subscriber: Aklapper. · Nov 16 2016, 12:56 PM
hashar triaged this task as Unbreak Now! priority. · Nov 16 2016, 12:56 PM
Restricted Application added subscribers: Jay8g, Luke081515, TerraCodes. · Nov 16 2016, 12:56 PM

Mentioned in SAL (#wikimedia-releng) [2016-11-16T13:02:33Z] <hashar> Restarted HHVM on deployment-mediawiki05 was not honoring requests T150849

hashar updated the task description. · Nov 16 2016, 1:02 PM
hashar closed this task as Resolved.
hashar claimed this task.

Fixed by restarting HHVM on deployment-mediawiki05.

From IRC logs:

[06:32:19] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki05 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50624 bytes in 1.503 second response time
[06:33:26] <Amir1>	 !log ladsgroup@deployment-mediawiki05:~$ sudo service hhvm restart
[06:33:38] <mutante>	 on deployment-mediawiki05 it says the status of hhvm is running and no such error
[07:12:19] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki05 is OK: HTTP OK: HTTP/1.1 200 OK - 1546 bytes in 0.653 second response time
[07:28:23] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki05 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50624 bytes in 1.385 second response time
[09:18:27] <shinken-wm>	 PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:43:28] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:02:33] <hashar>	 !log Restarted HHVM on deployment-mediawiki05 was not honoring requests T150849
[13:03:20] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki05 is OK: HTTP OK: HTTP/1.1 200 OK - 44262 bytes in 3.517 second response time
[13:03:33] <wikibugs>	 Beta-Cluster-Infrastructure: https://en.wikipedia.beta.wmflabs.org/w/api.php 504 Server Error: Gateway Time-out - https://phabricator.wikimedia.org/T150849#2798812 (hashar) Open>Resolved a:hashar Fixed by restarting HHVM on deployment-mediawiki05.

I guess I forgot to restart HHVM on deployment-mediawiki05 while doing T150833.
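
For anyone reading a similar log later: the shinken-wm lines quoted above follow a regular shape, so a small script can list which checks are still in PROBLEM state. This is only a rough sketch. The line format is inferred from the excerpt in this task rather than from any Shinken documentation, and the names `ALERT_RE`, `parse_alerts` and `open_problems` are made up for illustration.

```python
import re

# Assumed line shape, inferred from the shinken-wm lines quoted in this task:
#   [HH:MM:SS] <shinken-wm> STATE - <check name> on <host> is <status>: <detail>
ALERT_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+<shinken-wm>\s+"
    r"(?P<state>PROBLEM|RECOVERY) - (?P<check>.+?) on (?P<host>\S+) is "
    r"(?P<status>\w+):"
)


def parse_alerts(lines):
    """Return (time, state, check, host) tuples for lines that look like alerts."""
    alerts = []
    for line in lines:
        match = ALERT_RE.search(line)
        if match:
            alerts.append(
                (match["time"], match["state"], match["check"], match["host"])
            )
    return alerts


def open_problems(alerts):
    """Map (check, host) to the time of the earliest PROBLEM with no later RECOVERY."""
    pending = {}
    for time, state, check, host in alerts:
        key = (check, host)
        if state == "PROBLEM":
            # Keep the first unrecovered PROBLEM time for this check.
            pending.setdefault(key, time)
        else:
            # A RECOVERY clears whatever was pending for this check.
            pending.pop(key, None)
    return pending
```

Fed the lines from this task, `open_problems(parse_alerts(lines))` would have kept showing the App Server Main HTTP Response check on deployment-mediawiki05 as pending from the 07:28:23 PROBLEM until the 13:03:20 RECOVERY, which matches the window in which the Selenium job hit the 504.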