
`webservice --backend=kubernetes python restart` starts a php5.6 webservice
Closed, Resolved (Public)

Description

If the webservice is not running, trying to restart a python webservice will result in a php5.6 webservice:

$ webservice stop
Stopping webservice
$ webservice --backend=kubernetes python restart
Your job is not running, starting...
$ webservice status
Your webservice of type php5.6 is running

Expected behavior: start a webservice of type python.

Event Timeline

I am able to reproduce this. Thanks for reporting.

Is the running Kubernetes pod actually using the php5.6 container or is this just a problem of the $HOME/service.manifest state tracking file not being updated properly?

Looking at the webservice script's code, I can see that the 'restart' action does not call the internal update_manifest() method. I think this works as expected when the action really is a restart (stopping a running webservice and then starting the process again). But if $HOME/service.manifest starts out in the "empty" state that is recorded when webservice stop completes, I think the resulting $HOME/service.manifest would be corrupt in a sense, because the update_manifest() call that webservice --backend=kubernetes python start would have made is missing.

webservice
if args.action == 'start':
    if job.get_state() != Backend.STATE_STOPPED:
        print('Your job is already running')
        sys.exit(1)

    start(job, 'Starting webservice')
    update_manifest(job, 'start')

elif args.action == 'stop':
    if job.get_state() == Backend.STATE_STOPPED:
        print('Your webservice is not running')
    else:
        stop(job, 'Stopping webservice')
    update_manifest(job, 'stop')

elif args.action == 'restart':
    if job.get_state() != Backend.STATE_RUNNING:
        start(job, 'Your job is not running, starting')
    else:
        if stop(job, 'Restarting webservice'):
            start(job, '')
        else:
            print('ERROR: Pod resisted shutdown')
            sys.exit(1)
    tool.save_manifest()
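
To make the suspected failure mode concrete, here is a minimal sketch that uses stand-ins rather than the real webservice code. FakeJob, the dict-based manifest, and the record_type flag are all hypothetical, and the assumption that a stop leaves the manifest without a web entry is inferred from the reasoning above, not confirmed against the source. It only illustrates why a restart from the stopped state never records the webservice type unless update_manifest() is called.

```python
# Hypothetical stand-ins; the real webservice script differs.
STATE_STOPPED = "stopped"
STATE_RUNNING = "running"


class FakeJob:
    """Toy job that only tracks its type and run state."""

    def __init__(self, webservice_type):
        self.webservice_type = webservice_type
        self.state = STATE_STOPPED

    def start(self):
        self.state = STATE_RUNNING

    def stop(self):
        self.state = STATE_STOPPED
        return True


def update_manifest(manifest, job, action):
    """Record (or clear) the webservice type, as assumed for start/stop."""
    if action == "start":
        manifest["web"] = job.webservice_type
    elif action == "stop":
        manifest.pop("web", None)


def restart(manifest, job, record_type):
    """Restart the job; record_type toggles the manifest update on restart."""
    if job.state != STATE_RUNNING:
        job.start()
    else:
        if not job.stop():
            raise RuntimeError("job resisted shutdown")
        job.start()
    if record_type:
        update_manifest(manifest, job, "start")
    return manifest


# Manifest as assumed to be left behind by a stop: no "web" key.
stopped_manifest = {"backend": "kubernetes", "version": 3}

buggy = restart(dict(stopped_manifest), FakeJob("python"), record_type=False)
fixed = restart(dict(stopped_manifest), FakeJob("python"), record_type=True)

print(buggy.get("web"))  # None: no type was ever recorded
print(fixed.get("web"))  # python
```

With record_type=False the manifest keeps only the backend and version keys, so nothing downstream can tell that a python webservice was requested.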

What's interesting is that this problem only happens when you run a restart.

$ webservice stop        
Your webservice is not running

$ webservice --backend=kubernetes python restart
Your job is not running, starting.......

$ webservice status
Your webservice of type php5.6 is running

$ cat service.manifest 
# This file is used by toollabs infrastructure.
# Please do not edit manually at this time.
backend: kubernetes
version: 3

But when you invoke start, it reports correctly...

$ webservice stop
Your webservice is not running

$ rm -rf service.manifest 

$ webservice --backend=kubernetes python start
Starting webservice....

$ webservice status
Your webservice of type python is running

$ cat service.manifest 
# This file is used by toollabs infrastructure.
# Please do not edit manually at this time.
backend: kubernetes
distribution: debian
version: 3
web: python

So your hunch about service.manifest being corrupt seems to be spot on.
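
The difference between the two manifests above can be checked mechanically. The following is a rough sketch, not part of the webservice tool: parse_manifest and has_webservice_type are hypothetical helpers, and the parser assumes the flat key: value layout shown in the cat output. The real file is YAML, so a proper YAML parser would be the more robust choice.

```python
def parse_manifest(text):
    """Parse flat `key: value` lines, skipping blanks and # comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        entries[key.strip()] = value.strip()
    return entries


def has_webservice_type(text):
    """Return True if the manifest records a webservice type under `web`."""
    return "web" in parse_manifest(text)


# Manifest contents copied from the two runs above.
after_restart = """\
# This file is used by toollabs infrastructure.
# Please do not edit manually at this time.
backend: kubernetes
version: 3
"""

after_start = """\
# This file is used by toollabs infrastructure.
# Please do not edit manually at this time.
backend: kubernetes
distribution: debian
version: 3
web: python
"""

print(has_webservice_type(after_restart))  # False
print(has_webservice_type(after_start))    # True
```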

Change 539895 had a related patch set uploaded (by Phamhi; owner: Hieu Pham):
[operations/software/tools-webservice@master] tools-webservice: Disallow restart unless webservice type is defined in advance.

https://gerrit.wikimedia.org/r/539895

Change 539895 merged by jenkins-bot:
[operations/software/tools-webservice@master] tools-webservice: Run update_manifest() on restart.

https://gerrit.wikimedia.org/r/539895

Mentioned in SAL (#wikimedia-cloud) [2019-10-16T16:21:00Z] <phamhi> Deployed toollabs-webservice 0.46 to buster-tools and stretch-tools (T218461)

Version 0.46 of the toollabs-webservice package, which includes this fix, has been pushed out to:

tools-checker-03
tools-sgebastion-[07-09]
tools-sgecron-01
tools-sgewebgrid-generic-[0901-0904]
tools-sgewebgrid-lighttpd-[0902-0928]

Confirmed that this bug no longer exists:

$ webservice stop
$ rm -rf service.manifest 

$ webservice --backend=kubernetes python restart
Your job is not running, starting.....
$ webservice status                            
Your webservice of type python is running

I will update the docker images to include this fix.

All jessie and stretch docker images have been rebuilt with the toollabs-webservice 0.46 package installed. Updated images have been pushed to the docker registry.