Page MenuHomePhabricator

Quickstatements, "backend is overloaded"
Open, HighPublic

Description

For a few days now, people have reported errors and slowness on my quickstatements tool.
I just had a chance to look at the error.log and this is the latest:

2019-01-21 07:37:41: (mod_fastcgi.c.3002) backend is overloaded; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 263
2019-01-21 07:37:41: (mod_fastcgi.c.3002) backend is overloaded; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 263
2019-01-21 07:37:41: (mod_fastcgi.c.3594) all handlers for /quickstatements/api.php? on .php are down.
2019-01-21 07:37:57: (mod_fastcgi.c.2765) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.quickstatements
2019-01-21 07:37:57: (mod_fastcgi.c.2765) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.quickstatements
2019-01-21 07:38:02: (mod_fastcgi.c.3002) backend is overloaded; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 263
2019-01-21 07:38:02: (mod_fastcgi.c.3002) backend is overloaded; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 263
2019-01-21 07:38:02: (mod_fastcgi.c.3594) all handlers for /quickstatements/api.php?oauth_verifier=26c0fb6c9c8841ff17ca68bcb5d33d59&oauth_token=fc68e7d4aaa9f063b21be600ef627e9f on .php are down.
2019-01-21 07:38:21: (mod_fastcgi.c.2765) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.quickstatements
2019-01-21 07:38:21: (mod_fastcgi.c.2765) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.quickstatements

The log is basically filled with this.
Is there anything I can do? Anything for Labs to fix?

Event Timeline

Magnus created this task.Jan 21 2019, 8:25 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 21 2019, 8:25 AM
Aklapper renamed this task from backend issues? to "backend is overloaded".Jan 21 2019, 9:48 AM
Krenair edited projects, added Toolforge; removed Cloud-Services.Jan 21 2019, 9:56 AM
Addshore renamed this task from "backend is overloaded" to Quickstatements, "backend is overloaded".Jan 21 2019, 10:07 AM

As far as I can see, this is not limited to QuickStatements - when I'm logged out (as just happened again - my "usual" time is at 21 min into the hour), I cannot access SourceMD either. Access normally comes back around 37 min into the hour, even if I press the "log in" button immediately after being logged out.

As far as I can see, this is not limited to QuickStatements - when I'm logged out (as just happened again - my "usual" time is at 21 min into the hour), I cannot access SourceMD either. Access normally comes back around 37 min into the hour, even if I press the "log in" button immediately after being logged out.

I don't see the "backend is overloaded" errors in sourcemd. Might not be related.

bd808 added a subscriber: bd808.Jan 21 2019, 7:56 PM

The error message seems to indicate an overload of fcgi container that is running php scripts for the tool inside its Kubernetes container. See https://github.com/lighttpd/lighttpd1.4/blob/d1a23569161148f5acde8d4a6fb78c44284e1853/src/mod_fastcgi.c#L2906-L2909

The related part of the lighttpd configuration that is generated by webservice looks like this:

fastcgi.server += ( ".php" =>
        ((
                "bin-path" => "/usr/bin/php-cgi",
                "socket" => "/var/run/lighttpd/php.socket.quickstatements",
                "max-procs" => 2,
                "bin-environment" => (
                        "PHP_FCGI_CHILDREN" => "2",
                        "PHP_FCGI_MAX_REQUESTS" => "500"
                ),
                "bin-copy-environment" => (
                        "PATH", "SHELL", "USER"
                ),
                "broken-scriptfilename" => "enable",
                "allow-x-send-file" => "enable"
         ))
)

This spawns 2 /usr/bin/php-cgi processes (max-procs) each of which is configured to spawn 2 children to actually handle requests (PHP_FCGI_CHILDREN). This combination means the tool can handle 4 concurrent http requests. We do not currently have any way to tune these values per-tool via webservice when using either the grid engine or Kubernetes backends.

From my end, the repetitive logout issue looks resolved. Anyone know what caused and fixed it? Anyone still having problems?

There have been slow response from Toolforge services. I use sourcemd and scholia. sourcemd was just slow, and have been slow from time to time. In scholia I have seen error messages like:

Wed Jan 30 09:26:20 2019 - *** uWSGI listen queue of socket 3 full !!! (101/100) ***
Wed Jan 30 09:26:21 2019 - *** uWSGI listen queue of socket 3 full !!! (101/100) ***
Wed Jan 30 09:26:22 2019 - *** uWSGI listen queue of socket 3 full !!! (101/100) ***

I wonder if these problems are the same.

I am having those repetitive logout issues again.

It worked for a while today but now I'm logged out again and cannot log back in.
All edits through the tool seem to have stalled as well (for any user), and https://tools.wmflabs.org/quickstatements/api.php gives a HTTP 500 error.

Daniel_Mietchen triaged this task as High priority.Feb 15 2019, 2:57 PM
bd808 added a comment.Feb 16 2019, 1:03 AM

It worked for a while today but now I'm logged out again and cannot log back in.
All edits through the tool seem to have stalled as well (for any user), and https://tools.wmflabs.org/quickstatements/api.php gives a HTTP 500 error.

There is an active outage/degradation of the ToolsDB service (T216208) that is likely to be effecting Quickstatements and many other tools.