Page MenuHomePhabricator

WiDaR goes into infinite recursion in oauth.php:doApiQuery due to Wikidata dispatching issues and maxlag
Closed, ResolvedPublic

Description

https://tools.wmflabs.org/widar/index.php

Stopped working, I restarted the webservice (shich starts), but it just "loads forever". Neither gridengine nor kubernetes work. I don't think I changed anything, so I assume something else broke.

This tool is the central login to several of my tools to edit Wikidata, please restore ASAP!

Event Timeline

Magnus created this task.Jul 25 2018, 2:31 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 25 2018, 2:31 PM
Magnus triaged this task as Unbreak Now! priority.Jul 25 2018, 2:31 PM
Restricted Application added subscribers: Liuxinyu970226, TerraCodes. · View Herald TranscriptJul 25 2018, 2:31 PM

Update: Now webservice won't start anymore, says:

tools.widar@tools-bastion-03:~$  webservice --backend=kubernetes start
Looks like you already have another webservice running, with a gridengine backend
You should stop that webservice by issuing:
    webservice --backend=gridengine stop
And then start it again with backend kubernetes by issuing:
    webservice --backend=kubernetes start

Even though no gridengine webservice is running.

tools.widar@tools-bastion-03:~$     webservice --backend=gridengine stop
Your webservice is not running

tools.widar@tools-bastion-03:~$     webservice --backend=kubernetes start
Looks like you already have another webservice running, with a gridengine backend

Update: managed to start the kubernetes webengine again. Still "forever load".

There are some scripts from 2014 ("lighty-custom.sh", "lighty-starter.sh", "webstart.sh") in there, which I didn't write. Not sure if they interfere?

zhuyifei1999 lowered the priority of this task from Unbreak Now! to Needs Triage.Jul 25 2018, 3:39 PM
zhuyifei1999 added subscribers: Nehajha, zhuyifei1999.

Am I late? The page loads for me.

tools.widar@tools-bastion-03:~$     webservice --backend=gridengine stop
Your webservice is not running

tools.widar@tools-bastion-03:~$     webservice --backend=kubernetes start
Looks like you already have another webservice running, with a gridengine backend
tools.widar@tools-bastion-03:~$     webservice --backend=gridengine stop
Your webservice is not running

tools.widar@tools-bastion-03:~$     webservice --backend=kubernetes start
Looks like you already have another webservice running, with a gridengine backend

This is a known bug when the actual status of the webservice is inconsistent with service.manifest. CC @Nehajha

Magnus closed this task as Resolved.Jul 25 2018, 5:37 PM
Magnus claimed this task.

Yup, seems to have self-healed while I drove home. Not sure if feature or scary signs of self-awareness.

Magnus reopened this task as Open.Jul 26 2018, 10:25 AM

The problem is back. Same symptoms.

Magnus triaged this task as Unbreak Now! priority.Jul 26 2018, 10:26 AM
root@tools-webgrid-lighttpd-1417:~# ps ufww -u tools.widar
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
tools.w+  6226  0.0  0.0 106696  4252 ?        Ss   10:24   0:00 /usr/sbin/lighttpd -f /var/run/lighttpd/widar -D
tools.w+  6231  0.0  0.2 340556 20844 ?        Ss   10:24   0:00  \_ /usr/bin/php-cgi
tools.w+  6232  0.0  0.1 410420 12808 ?        S    10:24   0:00  |   \_ /usr/bin/php-cgi
tools.w+  6233  0.0  0.1 410056 12580 ?        S    10:24   0:00  |   \_ /usr/bin/php-cgi
tools.w+  6234  0.0  0.2 340556 20852 ?        Ss   10:24   0:00  \_ /usr/bin/php-cgi
tools.w+  6235  0.0  0.1 410668 12032 ?        S    10:24   0:00      \_ /usr/bin/php-cgi
tools.w+  6236  0.0  0.1 409800 12868 ?        S    10:24   0:00      \_ /usr/bin/php-cgi
tools.w+  6075  0.0  0.1 410948 12316 ?        S    10:21   0:00 /usr/bin/php-cgi
tools.w+  6074  0.0  0.1 410172 13124 ?        S    10:21   0:00 /usr/bin/php-cgi
tools.w+  6072  0.0  0.1 410680 13072 ?        S    10:21   0:00 /usr/bin/php-cgi
tools.w+  6071  0.0  0.1 410432 12988 ?        S    10:21   0:00 /usr/bin/php-cgi

Probably unrelated, but the last 4 processes have parent as init.

PIDs 6231 & 6234 are doing their typical 'wait forever on child processes'. strace on PID 6232 6233 6235 6236 looks mostly identical. 6071, 6072, 6074, 6075 are gone since the last message was posted.

Is there anything I can do? Anything I should have done?

PID 6232 6233 6235 6236 looks like in an infinite loop. The strace output: P7389.
gdb backtrace is not useful except:

#0  0x00002aeb030b4d4d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00002aeb030b4be4 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x000000000060f03f in zif_sleep ()
#3  0x00000000006d7d2b in dtrace_execute_internal ()
#4  0x00002aeb03a2f476 in xdebug_execute_internal (current_execute_data=0x2aeb0131e958, fci=0x0, return_value_used=0) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1551
#5  0x0000000000797120 in ?? ()
#6  0x0000000000711a58 in execute_ex ()
#7  0x00000000006d7c29 in dtrace_execute_ex ()
#8  0x00002aeb03a2ea7c in xdebug_execute_ex (execute_data=0x2aeb0131e958) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1437
#9  0x0000000000797770 in ?? ()
[REPEAT]
#77 0x0000000000797770 in ?? ()
#78 0x0000000000711a58 in execute_ex ()
#79 0x00000000006d7c29 in dtrace_execute_ex ()
#80 0x00002aeb03a2ea7c in xdebug_execute_ex (execute_data=0x2aeb0131c218) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1437
#81 0x0000000000797770 in ?? ()
#82 0x0000000000711a58 in execute_ex ()
#83 0x00000000006d7c29 in dtrace_execute_ex ()
#84 0x00002aeb03a2ea7c in xdebug_execute_ex (execute_data=0x2aeb0131c100) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1437
#85 0x00000000006e92b0 in zend_execute_scripts ()
#86 0x0000000000689ae5 in php_execute_script ()
#87 0x0000000000462823 in main ()

Do you recognize any possibilities of infinite loops in your code?

Given that one can never exclude that, I don't think so. The code has been working for years. I now added a 30 seconds time limit to index.php, that should solve any infinite loop problems with my code, if they exists.

This seems more like infinite recursion rather than infinite loop:

(gdb) c
Continuing.
[New Thread 0x2aeb1558c700 (LWP 10499)]
[Thread 0x2aeb1558c700 (LWP 10499) exited]
^C
Program received signal SIGINT, Interrupt.
0x00002aeb030b4d4d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
81	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) define php_bt
Type commands for definition of "php_bt".
End with a line saying just "end".
>up 4
>p execute_data->function_state.function.common.function_name
>end
(gdb) php_bt
#4  0x00002aeb03a2f476 in xdebug_execute_internal (current_execute_data=0x2aeb01323398, fci=0x0, return_value_used=0)
    at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1551
1551	/build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c: No such file or directory.
No symbol "execute_data" in current context.
(gdb) php_bt
#8  0x00002aeb03a2ea7c in xdebug_execute_ex (execute_data=0x2aeb01323398) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1437
1437	in /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c
$5 = 0xb62a43 "sleep"
(gdb) php_bt
#12 0x00002aeb03a2ea7c in xdebug_execute_ex (execute_data=0x2aeb01323158) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1437
1437	in /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c
$6 = 0x2aeb11894cd0 "doApiQuery"
(gdb) php_bt
#16 0x00002aeb03a2ea7c in xdebug_execute_ex (execute_data=0x2aeb01322f18) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1437
1437	in /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c
$7 = 0x2aeb11894cd0 "doApiQuery"
(gdb) php_bt
[REPEAT]
(gdb) php_bt
#208 0x00002aeb03a2ea7c in xdebug_execute_ex (execute_data=0x2aeb0131c3c0) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1437
1437	in /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c
$59 = 0x2aeb11894cd0 "doApiQuery"
(gdb) php_bt
#212 0x00002aeb03a2ea7c in xdebug_execute_ex (execute_data=0x2aeb0131c218) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1437
1437	in /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c
$60 = 0x2aeb118420f8 "ensureAuth"
(gdb) php_bt
#216 0x00002aeb03a2ea7c in xdebug_execute_ex (execute_data=0x2aeb0131c100) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1437
1437	in /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c
$61 = 0x2aeb118563b0 "setString"
(gdb) php_bt
#219 0x0000000000462823 in main ()
No symbol "execute_data" in current context.

I can see a coincidence here with Wikidata dispatching issues. And I suspect doApiQuery to result in an infine recursion if the same call is done again and again too long.

FWIW:

(gdb) f
#192 0x00002aeb03a2ea7c in xdebug_execute_ex (execute_data=0x2aeb0131cc18) at /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c:1437
1437	in /build/buildd/xdebug-2.2.3/xdebug-2.2.3/xdebug.c
(gdb) ptype *execute_data
type = struct _zend_execute_data {
    struct _zend_op *opline;
    zend_function_state function_state;
    zend_op_array *op_array;
    zval *object;
    HashTable *symbol_table;
    struct _zend_execute_data *prev_execute_data;
    zval *old_error_reporting;
    zend_bool nested;
    zval **original_return_value;
    zend_class_entry *current_scope;
    zend_class_entry *current_called_scope;
    zval *current_this;
    struct _zend_op *fast_ret;
    call_slot *call_slots;
    call_slot *call;
}
(gdb) p *execute_data
$69 = {opline = 0x2aeb11894be0, function_state = {function = 0x2aeb01356c50, arguments = 0x2aeb0131cdd0}, op_array = 0x2aeb01356c50, 
  object = 0x184b1f8, symbol_table = 0x0, prev_execute_data = 0x2aeb0131c9d8, old_error_reporting = 0x0, nested = 0 '\000', 
  original_return_value = 0x2aeb0131c9c0, current_scope = 0x2aeb013558a0, current_called_scope = 0x2aeb013558a0, current_this = 0x184b1f8, 
  fast_ret = 0x2aeb0131cd00, call_slots = 0x2aeb0131cd98, call = 0x2aeb0131cd98}
(gdb) ptype execute_data->function_state
type = struct _zend_function_state {
    zend_function *function;
    void **arguments;
}
(gdb) ptype *(execute_data->function_state.function)
type = union _zend_function {
    zend_uchar type;
    struct {
        zend_uchar type;
        const char *function_name;
        zend_class_entry *scope;
        zend_uint fn_flags;
        union _zend_function *prototype;
        zend_uint num_args;
        zend_uint required_num_args;
        zend_arg_info *arg_info;
    } common;
    zend_op_array op_array;
    zend_internal_function internal_function;
}
(gdb) p *(execute_data->function_state.function)
$70 = {type = 2 '\002', common = {type = 2 '\002', function_name = 0x2aeb11894cd0 "doApiQuery", scope = 0x2aeb013558a0, 
    fn_flags = 134283520, prototype = 0x0, num_args = 3, required_num_args = 1, arg_info = 0x2aeb11894ce0}, op_array = {type = 2 '\002', 
    function_name = 0x2aeb11894cd0 "doApiQuery", scope = 0x2aeb013558a0, fn_flags = 134283520, prototype = 0x0, num_args = 3, 
    required_num_args = 1, arg_info = 0x2aeb11894ce0, refcount = 0x2aeb03a1a7c0, opcodes = 0x2aeb11891730, last = 286, 
    vars = 0x2aeb11894df0, last_var = 16, T = 4, nested_calls = 1, used_stack = 4, brk_cont_array = 0x2aeb11894d40, last_brk_cont = 1, 
    try_catch_array = 0x0, last_try_catch = 0, has_finally_block = 0 '\000', static_variables = 0x0, this_var = 4294967295, 
    filename = 0x2aeb11878790 "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line_start = 411, 
    line_end = 523, 
    doc_comment = 0x2aeb11894d50 "/**\n\t * Send an API query with OAuth authorization\n\t *\n\t * @param array $post Post data\n\t * @param object $ch Curl handle\n\t * @return array API results\n\t */", doc_comment_len = 156, early_binding = 4294967295, literals = 0x2aeb118900d8, 
    last_literal = 143, run_time_cache = 0x184baf8, last_cache_slot = 59, reserved = {0x0, 0x0, 0x0, 0x0}}, internal_function = {
    type = 2 '\002', function_name = 0x2aeb11894cd0 "doApiQuery", scope = 0x2aeb013558a0, fn_flags = 134283520, prototype = 0x0, 
    num_args = 3, required_num_args = 1, arg_info = 0x2aeb11894ce0, handler = 0x2aeb03a1a7c0, module = 0x2aeb11891730}}

I don't know what the function arguments are since they are pointers to void pointers (void **), and idk what to cast the 'void' to.

zhuyifei1999 renamed this task from Tool WiDaR down, won't restart to WiDaR goes into infinite recursion in oauth.php:doApiQuery due to Wikidata dispatching issues and maxlag.Jul 26 2018, 11:22 AM
zhuyifei1999 removed zhuyifei1999 as the assignee of this task.
zhuyifei1999 lowered the priority of this task from Unbreak Now! to High.
zhuyifei1999 edited projects, added Tools; removed Toolforge.

@matej_suchanek thanks, looking at the source of /mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php
doApiQuery that is indeed a likely cause.

Thanks, I have added an interation counter that will limit infinite recursion. Odd that this should happen though, unless Wikidata is closed for business...

Commit here:
https://bitbucket.org/magnusmanske/magnustools/commits/63683015bd533663880bff1588e04fc79e414558#chg-public_html/php/oauth.php

Can I restart the webserver, or does that require root?

zhuyifei1999 added a comment.EditedJul 26 2018, 12:29 PM

With some help from 1 2 & 3, for future fun, I made a script to generate a PHP traceback in Python-style (since I'm not familiar with what PHP-style tracebacks looks like):

(gdb) set pagination off
(gdb) python
>frame = gdb.newest_frame()
>while frame.name() != 'xdebug_execute_ex':
>    frame = frame.older()
>
>print('Traceback (most recent call first):')
>
>
>def p_stack(filename, lineno, funcname):
>    print('  File "{}", line {}, in {}'.format(
>        filename if filename is not None else '<possibly built-in>',
>        lineno, funcname))
>
>    if filename is not None:
>        with open(filename, 'rb') as file:
>            print('    ' +
>                  file.read().split(b'\n')[int(lineno)-1]
>                  .decode('utf-8').strip())
>
>filename, lineno = None, '?'
>
>ex_data = frame.read_var('execute_data')
>while ex_data:
>    p_stack(filename, lineno, 
>        ex_data['function_state']['function']['common']['function_name'].string())
>
>    filename = ex_data['op_array']['filename'].string()
>    lineno = int(ex_data['opline']['lineno'])
>
>    ex_data = ex_data['prev_execute_data']
>
>p_stack(filename, lineno, '<main>')
>end
Traceback (most recent call first):
  File "<possibly built-in>", line ?, in sleep
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 517, in doApiQuery
    sleep ( $lag ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 519, in doApiQuery
    $ret = $this->doApiQuery( $post, $ch , '' ) ;
  File "/mnt/nfs/labstore-secondary-tools-project/magnustools/public_html/php/oauth.php", line 609, in getConsumerRights
    ), $ch );
  File "/mnt/nfs/labstore-secondary-tools-project/widar/public_html/index.php", line 1120, in <main>
    $res = $oa->getConsumerRights() ;
(gdb)

(The files was changed while the process was still running, which would make the 'code' displayed inaccurate. I adjusted the output to show what would be displayed before the change. The actual output can be seen in this comment history.)

Can I restart the webserver, or does that require root?

Feel free to. I terminated my gdb session.

Magnus closed this task as Resolved.Jul 26 2018, 12:55 PM
Magnus claimed this task.