Run non-interactive commands on Toolforge kubernetes webservices
Open, MediumPublic

Description

It would be good to be able to run arbitrary non-interactive commands on kubernetes webservices.

Currently you need to use the webservice --backend=kubernetes <type> shell command, but this is always interactive.

For example, to update the dependencies of a Python webservice running on Kubernetes, you need to do something like this:

webservice --backend=kubernetes python shell

<wait for interactive shell on kubernetes controlled instance>

webservice-python-bootstrap

It would be nice to have something like:
webservice --backend=kubernetes python shell -- webservice-python-bootstrap
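One way a wrapper could support the proposed syntax is to split its argument list at the first bare "--". The following is a minimal sketch of that idea, not the actual webservice implementation; the function name split_command is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_command(argv: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split argv at the first bare "--".

    Everything before it belongs to the wrapper; everything after it is
    the command to run non-interactively. If there is no "--", the command
    list is empty, which a wrapper could treat as "start an interactive shell".
    """
    args = list(argv)
    if "--" not in args:
        return args, []
    idx = args.index("--")
    return args[:idx], args[idx + 1:]


own_args, command = split_command(
    ["--backend=kubernetes", "python", "shell", "--", "webservice-python-bootstrap"]
)
print(own_args)  # the wrapper's own arguments
print(command)   # the command to run inside the container
```

Only the first "--" is treated as the separator, so the command itself may still contain "--" among its own arguments.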

Event Timeline

Restricted Application edited projects, added Cloud-Services; removed Toolforge. Jul 5 2017, 7:56 AM
Restricted Application added a subscriber: Aklapper.
chasemp triaged this task as Medium priority. Jul 5 2017, 7:06 PM
chasemp added a subscriber: chasemp.

@Tarrow for now we are in a holding pattern on which Kubernetes features we can support, but I think the idea is solid for when we start gaining more traction.

Legoktm renamed this task from Run non-interactive commands on labs kubernetes webservices to Run non-interactive commands on Toolforge kubernetes webservices. May 31 2018, 5:45 PM
Legoktm updated the task description.

Proof of concept using kubectl directly against the 2020 Kubernetes cluster:

$ /usr/bin/kubectl run interactive \
  --image=docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest \
  --restart=Never --command=true --env=HOME=$HOME --labels='toolforge=tool' \
  --rm=true --stdin=true --tty=true \
  -- ls /etc
adduser.conf            gss            mke2fs.conf        resolv.conf
alternatives            gtk-3.0        modules-load.d     rmt
apt                     host.conf      motd               securetty
bash.bashrc             hostname       mtab               security
bash_completion.d       hosts          mysql              selinux
bindresvport.blacklist  ImageMagick-6  nanorc             shadow
binfmt.d                init.d         novaobserver.yaml  shadow-
ca-certificates         inputrc        nsswitch.conf      shells
ca-certificates.conf    issue          opt                skel
cron.daily              issue.net      os-release         ssl
dbus-1                  kernel         pam.conf           subgid
debconf.conf            ldap           pam.d              subuid
debian_version          ldap.conf      passwd             sysctl.d
default                 ldap.yaml      passwd-            systemd
deluser.conf            ld.so.cache    perl               terminfo
dhcp                    ld.so.conf     profile            timezone
dictionaries-common     ld.so.conf.d   profile.d          tmpfiles.d
dpkg                    libaudit.conf  python3            ucf.conf
emacs                   locale.alias   python3.7          update-motd.d
environment             locale.gen     rc0.d              vim
fonts                   localtime      rc1.d              wmcs-project
fstab                   login.defs     rc2.d              X11
gai.conf                logrotate.d    rc3.d              xattr.conf
group                   machine-id     rc4.d              xdg
group-                  mailcap        rc5.d
gshadow                 mailcap.order  rc6.d
gshadow-                mime.types     rcS.d
pod "interactive" deleted
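To make the proof of concept scriptable, the same kubectl invocation could be assembled for an arbitrary command. This is a rough sketch under stated assumptions: the flags, image, and label are copied from the command above, while the function name kubectl_run_argv and its parameters are hypothetical and do not come from the webservice code base.

```python
import os
import shlex
from typing import List, Sequence

# Image copied from the proof of concept above.
IMAGE = "docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest"


def kubectl_run_argv(command: Sequence[str], image: str = IMAGE,
                     pod_name: str = "interactive") -> List[str]:
    """Build the argv for a one-off pod that runs `command` and is then removed."""
    home = os.environ.get("HOME", "")
    return [
        "/usr/bin/kubectl", "run", pod_name,
        "--image=" + image,
        "--restart=Never", "--command=true",
        "--env=HOME=" + home,
        "--labels=toolforge=tool",
        "--rm=true", "--stdin=true", "--tty=true",
        "--", *command,
    ]


argv = kubectl_run_argv(["ls", "/etc"])
# Print a copy-pasteable command line; passing argv to subprocess.run()
# would execute it on a host where kubectl is configured.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the command as a list rather than a string avoids shell quoting problems when the user's command contains spaces or special characters.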

Change 621776 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/software/tools-webservice@master] Make webservice shell scriptable

https://gerrit.wikimedia.org/r/621776

This comment was removed by bd808.

This change works, but it is a bit less stable than the current method. When the interactive shell is left open for a long period of time (think hours, not minutes), kubectl can get disconnected from the running pod's container. When this happens, the pod is "leaked" in the tool's Kubernetes namespace and must be manually killed (kubectl delete pod ...). This could be confusing for folks who are not yet familiar with how webservice and kubectl interact, because a few leaked pods would be enough to exhaust their namespace's quota for running new pods.
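As an illustration of the manual cleanup, a small helper could pick out old shell pods from the output of kubectl get pods -o json and build the matching delete commands. This is only a sketch: the "shell-" name prefix and the age threshold are assumptions, the function names are hypothetical, and the sample JSON is invented for the example.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def stale_shell_pods(pods_json: str, min_age: timedelta,
                     now: Optional[datetime] = None,
                     prefix: str = "shell-") -> List[str]:
    """Return names of pods matching `prefix` that are at least `min_age` old."""
    now = now or datetime.now(timezone.utc)
    names = []
    for item in json.loads(pods_json).get("items", []):
        meta = item["metadata"]
        if not meta["name"].startswith(prefix):
            continue
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created >= min_age:
            names.append(meta["name"])
    return names


def delete_argv(name: str) -> List[str]:
    """Build the manual cleanup command for one pod."""
    return ["kubectl", "delete", "pod", name]


# Invented sample data in the shape of `kubectl get pods -o json`.
SAMPLE = json.dumps({"items": [
    {"metadata": {"name": "shell-old", "creationTimestamp": "2020-08-24T10:00:00Z"}},
    {"metadata": {"name": "shell-new", "creationTimestamp": "2020-08-24T17:55:00Z"}},
    {"metadata": {"name": "mytool-5d9c", "creationTimestamp": "2020-08-01T00:00:00Z"}},
]})

stale = stale_shell_pods(
    SAMPLE,
    min_age=timedelta(hours=6),
    now=datetime(2020, 8, 24, 18, 0, tzinfo=timezone.utc),
)
for name in stale:
    print(" ".join(delete_argv(name)))
```

The age filter is there so that a shell someone is actively using is not deleted along with the leaked ones; age alone cannot tell the two apart, so the list should be reviewed before running the commands.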

I'm not sure what balance to aim for between making it possible to script webservice shell ... commands from the bastions (which the patch does) and making the webservice ... experience as intuitive as possible. @Tarrow, @Legoktm, @zhuyifei1999, @Bstorm: what are your thoughts?

Change 621776 merged by jenkins-bot:

[operations/software/tools-webservice@master] Make `webservice shell` scriptable

https://gerrit.wikimedia.org/r/621776

tools.ldap@tools-sgebastion-08:~$ webservice python3.9 shell -- python3 --version
Python 3.9.2
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7f1d0e918ae8>
Traceback (most recent call last):
  File "/usr/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable

The error appears to be coming from webservice itself.

My current untested guess is that this error comes from the Popen.wait() call, and it may very specifically be a Python 3.5 bug: https://github.com/python/cpython/commit/9cd7e17640a49635d1c1f8c2989578a8fc2c1de6

Just noting here that I've been seeing this WeakValueDictionary error lately whenever I exit a webservice shell. It's not deterministic, but it happens most of the time.

home$ ssh tools-login

krinkle at tools-sgebastion-07.tools.eqiad.wmflabs in ~$ become intuition

# kubernetes, php7.3
tools.intuition at tools-sgebastion-07.tools.eqiad.wmflabs in ~$ webservice shell

tools.intuition at shell-1641484967$ echo 1
1
# up arrow, ctrl-C, then type exit
tools.intuition at shell-1641484967$ echo 1^C
tools.intuition at shell-1641484967$ exit
logout
pod tool-intuition/shell-1641484841 terminated (Error)
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7f911920fc80>
Traceback (most recent call last):
  File "/usr/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable

Other times, it only prints a line about the pod being terminated:

[16:03 UTC] tools.intuition at shell-1641484999 in ~
$ exit
logout
pod tool-intuition/shell-1641484999 terminated (Error)

[16:03 UTC] tools.intuition at tools-sgebastion-07.tools.eqiad.wmflabs in ~
38s 130 $

And rarely, it just exits quietly, which is what I'd expect the other times too:

tools.intuition at tools-sgebastion-07.tools.eqiad.wmflabs in ~$ webservice shell
$
tools.intuition at shell-1641485197 in ~$ exit
logout
tools.intuition at tools-sgebastion-07.tools.eqiad.wmflabs in ~$

The WeakValueDictionary error is a Python bug in the version of Python shipped with Buster. It's harmless. See T169695#7466946 for details.