
Make Google OCR API on Tool Labs work under Kubernetes
Closed, Resolved | Public | 3 Estimated Story Points

Description

I tried switching the Google OCR API on Tool Labs from GridEngine to Kubernetes for the webservice backend so that it would be more reliable. This resulted in the API no longer functioning and returning a 500 internal server error instead.

Looking at the error log on Tool Labs:

2016-09-21 18:48:18: (mod_fastcgi.c.2569) unexpected end-of-file (perhaps the fastcgi process died): pid: 10 socket: unix:/var/run/lighttpd/php.socket.ws-google-ocr-1 
2016-09-21 18:48:18: (mod_fastcgi.c.3353) response not received, request sent: 1960 on socket: unix:/var/run/lighttpd/php.socket.ws-google-ocr-1 for /ws-google-ocr/api.php?image=https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/%E0%A6%86%E0%A6%A6%E0%A6%BF%E0%A6%AA%E0%A6%BE%E0%A6%A0_-_%E0%A6%AE%E0%A6%BE%E0%A6%9C%E0%A6%9B%E0%A7%8B%E0%A7%B1%E0%A6%BE.pdf/page15-599px-%E0%A6%86%E0%A6%A6%E0%A6%BF%E0%A6%AA%E0%A6%BE%E0%A6%A0_-_%E0%A6%AE%E0%A6%BE%E0%A6%9C%E0%A6%9B%E0%A7%8B%E0%A7%B1%E0%A6%BE.pdf.jpg&lang=as, closing connection

The Tool Labs project is located at http://tools.wmflabs.org/ws-google-ocr/ and the code can be found at https://phabricator.wikimedia.org/diffusion/1966/.

Event Timeline

DannyH triaged this task as Medium priority. (Sep 21 2016, 9:38 PM)
DannyH set the point value for this task to 3. (Sep 22 2016, 5:31 PM)
DannyH moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board.

I'm not 100% sure this is isolated to the Kubernetes runtime:

tools.ws-google-ocr@tools-bastion-02:~$ tail -50 error.log
2016-09-22 06:50:47: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
2016-09-22 06:50:47: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-0
2016-09-22 06:50:47: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1
2016-09-22 06:50:49: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:50:49: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:50:51: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-1
2016-09-22 06:50:51: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
2016-09-22 06:50:51: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-0
2016-09-22 06:50:51: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1
2016-09-22 06:50:53: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:50:53: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:51:16: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-1
2016-09-22 06:51:16: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
2016-09-22 06:51:16: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-0
2016-09-22 06:51:16: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1
2016-09-22 06:51:18: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:51:18: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:51:25: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-1
2016-09-22 06:51:25: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
2016-09-22 06:51:25: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-0
2016-09-22 06:51:25: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1
2016-09-22 06:51:26: (mod_fastcgi.c.3567) all handlers for /ws-google-ocr/api.php?image=https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/%E0%A4%85%E0%A4%A6%E0%A5%8D%E0%A4%B5%E0%A5%88%E0%A4%A4%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BF%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BE%E0%A4%A8%E0%A5%8D%E0%A4%A4%E0%A4%B8%E0%A4%BE%E0%A4%B0%E0%A4%83.pdf/page286-1024px-%E0%A4%85%E0%A4%A6%E0%A5%8D%E0%A4%B5%E0%A5%88%E0%A4%A4%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BF%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BE%E0%A4%A8%E0%A5%8D%E0%A4%A4%E0%A4%B8%E0%A4%BE%E0%A4%B0%E0%A4%83.pdf.jpg&lang=sa on .php are down.
2016-09-22 06:51:27: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:51:27: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:52:01: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-1
2016-09-22 06:52:01: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
2016-09-22 06:52:01: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-0
2016-09-22 06:52:01: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1
2016-09-22 06:52:03: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:52:03: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:52:04: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-1
2016-09-22 06:52:04: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
2016-09-22 06:52:04: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-0
2016-09-22 06:52:04: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1
2016-09-22 06:52:05: (mod_fastcgi.c.3567) all handlers for /ws-google-ocr/api.php?image=https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/%E0%A4%85%E0%A4%A6%E0%A5%8D%E0%A4%B5%E0%A5%88%E0%A4%A4%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BF%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BE%E0%A4%A8%E0%A5%8D%E0%A4%A4%E0%A4%B8%E0%A4%BE%E0%A4%B0%E0%A4%83.pdf/page285-1024px-%E0%A4%85%E0%A4%A6%E0%A5%8D%E0%A4%B5%E0%A5%88%E0%A4%A4%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BF%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BE%E0%A4%A8%E0%A5%8D%E0%A4%A4%E0%A4%B8%E0%A4%BE%E0%A4%B0%E0%A4%83.pdf.jpg&lang=sa on .php are down.
2016-09-22 06:52:06: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:52:06: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:52:29: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-1
2016-09-22 06:52:29: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
2016-09-22 06:52:29: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-0
2016-09-22 06:52:29: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1
2016-09-22 06:52:31: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:52:31: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:52:33: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-1
2016-09-22 06:52:33: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
2016-09-22 06:52:33: (mod_fastcgi.c.1733) connect failed: Permission denied on unix:/var/run/lighttpd/php.socket.ws-google-ocr-0
2016-09-22 06:52:33: (mod_fastcgi.c.2999) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1
2016-09-22 06:52:34: (mod_fastcgi.c.3567) all handlers for /ws-google-ocr/api.php?image=https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/%E0%A4%85%E0%A4%A6%E0%A5%8D%E0%A4%B5%E0%A5%88%E0%A4%A4%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BF%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BE%E0%A4%A8%E0%A5%8D%E0%A4%A4%E0%A4%B8%E0%A4%BE%E0%A4%B0%E0%A4%83.pdf/page285-1024px-%E0%A4%85%E0%A4%A6%E0%A5%8D%E0%A4%B5%E0%A5%88%E0%A4%A4%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BF%E0%A4%B8%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%A7%E0%A4%BE%E0%A4%A8%E0%A5%8D%E0%A4%A4%E0%A4%B8%E0%A4%BE%E0%A4%B0%E0%A4%83.pdf.jpg&lang=sa on .php are down.
2016-09-22 06:52:35: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-09-22 06:52:35: (mod_fastcgi.c.2757) fcgi-server re-enabled:  0 /var/run/lighttpd/php.socket.ws-google-ocr
2016-10-27 06:26:57: (mod_fastcgi.c.2569) unexpected end-of-file (perhaps the fastcgi process died): pid: 9 socket: unix:/var/run/lighttpd/php.socket.ws-google-ocr-1 
2016-10-27 06:26:57: (mod_fastcgi.c.3353) response not received, request sent: 906 on socket: unix:/var/run/lighttpd/php.socket.ws-google-ocr-1 for /ws-google-ocr/index.php?, closing connection

These errors were related to the tool's usage of the icecave/isolator library, so we've just dropped that usage and it's now running happily on Kubernetes. (That lib was being used as part of a thing to turn errors into exceptions.)

I'm afraid I don't yet have a solid explanation for what was going wrong. Under some circumstances (I haven't worked out which) php-cgi was segfaulting. Backtrace:

Core was generated by `/usr/bin/php-cgi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000573eda in phar_object_init ()
(gdb) bt
#0  0x0000000000573eda in phar_object_init ()
#1  0x00007f36c4bbbdc8 in ?? ()
#2  0x00007f36db05c820 in ?? ()
#3  0x000000000074619e in ?? ()
#4  0x00007f36c52ca390 in ?? ()
#5  0x0000000000ebac40 in cwd_globals ()
#6  0x0000000000735df0 in ?? ()
#7  0x0000000000ebae80 in cwd_globals ()
#8  0x00007f36c4c0ca60 in ?? ()
#9  0x00007f36c4c0d538 in ?? ()
#10 0x0000000000568be4 in zim_Phar_mapPhar ()
#11 0x0000000000e9fcee in __bss_start ()
#12 0x0000000000000000 in ?? ()

Nice work!

@bd808: Is there some documentation somewhere about things that won't work under Kubernetes? If so, you should add the icecave/isolator library.

Having looked at the icecave/isolator library briefly, I'm not sure that it really works anywhere in a robust and stable manner. It does some pretty gross things.

@bd808 I quite agree! It's a ridiculous library. I mean, all I wanted was a simple thing to turn errors into exceptions (and so was using eloquent/asplode), but until I dug into it I hadn't realised just how overly complicated it makes things. I'll find another ErrorException maker, and not worry too much about uncovering exactly what it is in icecave/isolator that was causing the probs. :-)
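
For reference, turning errors into exceptions does not strictly need a library: PHP's built-in set_error_handler() and ErrorException can do it. The following is only a minimal sketch of that approach, not what the tool ended up using; the helper names installErrorExceptionHandler and demoCaughtMessage are made up for illustration.

```php
<?php
// Hypothetical helper (not from the ws-google-ocr code base): convert PHP
// errors such as warnings and notices into ErrorException instances.
function installErrorExceptionHandler()
{
    set_error_handler(function ($severity, $message, $file, $line) {
        // Skip errors excluded by the current error_reporting level
        // (this also respects the @ suppression operator).
        if (!(error_reporting() & $severity)) {
            return false; // fall through to PHP's standard handler
        }
        throw new ErrorException($message, 0, $severity, $file, $line);
    });
}

installErrorExceptionHandler();

// Usage: a user-level warning now surfaces as a catchable exception.
function demoCaughtMessage()
{
    try {
        trigger_error('example warning', E_USER_WARNING);
    } catch (ErrorException $e) {
        return $e->getMessage();
    }
    return 'no exception';
}
```

Fatal errors never reach a handler installed this way, so this would not have caught (or caused) a segfault like the one above; it only covers the error levels that set_error_handler() is able to intercept.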