Toolforge tool.heritage webservice keeps crashing
Open, High, Public

Description

The webservice process goes away, and the webapp returns a 500 Internal Server Error. This happened twice in the last 24 hours.

Extract from error.log

2017-09-18 07:23:24: (mod_fastcgi.c.3002) backend is overloaded; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 263 
2017-09-18 07:23:26: (mod_fastcgi.c.2765) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.heritage 
2017-09-18 07:23:26: (mod_fastcgi.c.2765) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.heritage 
2017-09-18 07:23:52: (mod_fastcgi.c.3002) backend is overloaded; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 263 
2017-09-18 07:23:52: (mod_fastcgi.c.3002) backend is overloaded; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 263 
2017-09-18 07:23:53: (mod_fastcgi.c.3594) all handlers for /heritage/api/api.php?action=search&etc on .php are down. 
2017-09-18 07:23:54: (mod_fastcgi.c.2765) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.heritage 
2017-09-18 07:23:54: (mod_fastcgi.c.2765) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.heritage 
2017-09-18 07:23:57: (mod_fastcgi.c.3002) backend is overloaded; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 263 
2017-09-18 07:23:57: (mod_fastcgi.c.3002) backend is overloaded; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 263 
2017-09-18 07:23:58: (server.c.1558) server stopped by UID = 0 PID = 0 
2017-09-18 07:24:07: (log.c.164) server started
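
For anyone triaging a similar log, here is a minimal sketch of how an extract like the one above could be tallied by event type. The regular expression and the `classify` and `summarize` helpers are hypothetical, written against the line format shown in this extract rather than taken from any lighttpd tooling, and the sample is a subset of the lines above.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the extract above:
#   "YYYY-MM-DD HH:MM:SS: (source.c.LINE) message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"\((?P<src>[^)]+)\) (?P<msg>.*)$"
)


def classify(msg):
    """Map a log message to a coarse event label (hypothetical labels)."""
    if msg.startswith("backend is overloaded"):
        return "overloaded"
    if msg.startswith("fcgi-server re-enabled"):
        return "re-enabled"
    if msg.startswith("all handlers for"):
        return "handlers-down"
    if msg.startswith("server stopped"):
        return "stopped"
    if msg.startswith("server started"):
        return "started"
    return "other"


def summarize(lines):
    """Count events per label, skipping lines that do not match LINE_RE."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        counts[classify(match.group("msg"))] += 1
    return counts


# A subset of the lines from the extract above.
SAMPLE = """\
2017-09-18 07:23:24: (mod_fastcgi.c.3002) backend is overloaded; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 263
2017-09-18 07:23:26: (mod_fastcgi.c.2765) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.heritage
2017-09-18 07:23:53: (mod_fastcgi.c.3594) all handlers for /heritage/api/api.php?action=search&etc on .php are down.
2017-09-18 07:23:58: (server.c.1558) server stopped by UID = 0 PID = 0
2017-09-18 07:24:07: (log.c.164) server started
"""

if __name__ == "__main__":
    for label, count in sorted(summarize(SAMPLE.splitlines()).items()):
        print(label, count)
```

Counting the "stopped" and "started" pairs over a full day of error.log would be one way to check the "twice in the last 24 hours" observation against the log itself.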

Event Timeline

Emijrp raised the priority of this task from Medium to High. Sep 19 2017, 7:36 AM
Emijrp added a subscriber: Emijrp.

My map (https://tools.wmflabs.org/wlm-maps/) doesn't show any monuments. I guess this bug is related.

If you connect with sql local, select s51138__heritage_p, and try to query the database, nothing happens.

Hmmm, no, the web process should not impact DB availability. The DB update hung for 8 hours, I believe because the image categorization job was running at the same time and locking the table. It should be fine now; can you confirm?

@JeanFred Is it the categorization job which made the DB sad today?

Yes. As this is a separate issue, I (finally) filed T176982 for it.