Custom Kubernetes deployment fails from Stretch bastion
Closed, Resolved (Public)

Description

$ ssh tools-sgebastion-06.tools.eqiad.wmflabs
$ become jouncebot
$ kubectl create -v=4 -f jouncebot/etc/deployment.yaml
I0208 06:00:36.878422   14989 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server could not find the requested resource",
  "reason": "NotFound",
  "details": {
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "unknown"
      }
    ]
  },
  "code": 404
}]
F0208 06:00:36.878488   14989 helpers.go:119] Error from server (NotFound): the server could not find the requested resource

Works from Trusty bastion:

$ ssh tools-bastion-02.tools.eqiad.wmflabs
$ become jouncebot
$ kubectl create -v=4 -f jouncebot/etc/deployment.yaml
I0208 06:01:42.036653   32142 decoder.go:206] decoding stream as YAML
deployment "jouncebot.bot" created

Event Timeline

The deployment file that is failing is:

---
# Run jouncebot on kubernetes
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: jouncebot.bot
  namespace: jouncebot
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: jouncebot.bot
    spec:
      containers:
        - name: bot
          image: docker-registry.tools.wmflabs.org/toollabs-python-base:latest
          command: [ "/data/project/jouncebot/jouncebot/bin/jouncebot.sh", "run" ]
          workingDir: /data/project/jouncebot
          env:
            - name: HOME
              value: /data/project/jouncebot
          imagePullPolicy: Always
          volumeMounts:
            - name: home
              mountPath: /data/project/jouncebot/
      volumes:
        - name: home
          hostPath:
            path: /data/project/jouncebot/

kubectl version skew is only supported for N-1/N+1 versions: https://kubernetes.io/docs/setup/version-skew-policy/#kubectl
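
As a rough illustration of that N-1/N+1 rule, the sketch below compares the minor numbers of two GitVersion strings like the ones printed by kubectl version. The helper names minor_of and skew_ok are hypothetical (they are not part of kubectl), and the parsing assumes versions of the form v<major>.<minor>.<patch> with an optional suffix.

```shell
#!/bin/sh
# Hypothetical helper: print the minor number of a GitVersion string,
# e.g. "v1.4.6+e569a27" -> "4".
minor_of() {
    printf '%s\n' "$1" | sed -E 's/^v?[0-9]+\.([0-9]+).*/\1/'
}

# Hypothetical helper: exit 0 when the client and server minor versions
# differ by at most one, exit non-zero otherwise.
skew_ok() {
    client_minor=$(minor_of "$1")
    server_minor=$(minor_of "$2")
    diff=$((client_minor - server_minor))
    [ "$diff" -ge -1 ] && [ "$diff" -le 1 ]
}

# Usage (client version first, server version second):
#   skew_ok v1.4.12 v1.4.6+e569a27 && echo "within supported skew"
```

With the v1.4.12 client and the v1.4.6+e569a27 server from this task, both minors are 4, so the check succeeds.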

$ curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.4.12/bin/linux/amd64/kubectl
$ chmod +x kubectl
$ ./kubectl version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.12", GitCommit:"19e81afecf5eb2b7838c35e2cbf776aff04dc34c", GitTreeState:"clean", BuildDate:"2017-04-20T21:01:06Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.6+e569a27", GitCommit:"2b3537c7b3b111816176c910e52e3fd03598dd7b", GitTreeState:"dirty", BuildDate:"2017-03-31T21:33:05Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

$ ./kubectl create -f jouncebot/etc/deployment.yaml
Error from server: error when creating "jouncebot/etc/deployment.yaml": deployments.extensions "jouncebot.bot" already exists

$ ./kubectl create -v=4 -f jouncebot/etc/deployment.yaml
I0208 15:42:32.855076   31308 decoder.go:206] decoding stream as YAML
I0208 15:42:33.024880   31308 helpers.go:193] server response object: [{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "error when creating \"jouncebot/etc/deployment.yaml\": deployments.extensions \"jouncebot.bot\" already exists",
  "reason": "AlreadyExists",
  "details": {
    "name": "jouncebot.bot",
    "group": "extensions",
    "kind": "deployments"
  },
  "code": 409
}]
F0208 15:42:33.024926   31308 helpers.go:114] Error from server: error when creating "jouncebot/etc/deployment.yaml": deployments.extensions "jouncebot.bot" already exists

So, does it work with the ancient kubectl? If so, why don't we just place a kubectl binary on the server somewhere? "Already exists" sounds like success to me, whereas "not found" sounds like "API and client cannot understand each other".

Overall, I like to think this is a temporary edge case where, if you are using kubectl directly, using kubectl-old instead is fine? I don't know how many people are actually doing this. I'm sad but not surprised that we found something a more modern kubectl cannot do with that old server.

If it doesn't work with the old binary, then the problem will probably be tougher to solve, because it is probably a weird interaction with flannel, which isn't available on that bastion.

The great thing about Go is that statically linked binaries just work if you dump them in a place.

It works.

tools.jouncebot@tools-sgebastion-06:~$ kubectl get deploy
NAME            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
jouncebot.bot   1         1         1            1           10h

tools.jouncebot@tools-sgebastion-06:~$ kubectl delete deploy jouncebot.bot
deployment.extensions "jouncebot.bot" deleted

tools.jouncebot@tools-sgebastion-06:~$ ./kubectl-1.4.12 create  -f jouncebot/etc/deployment.yaml
deployment "jouncebot.bot" created

So if we just do:

file { 'kubectl-1.4': 
    path => '/usr/local/bin/kubectl',
    ensure => 'file',
    mode => 'a+x',
    source => 'https://storage.googleapis.com/kubernetes-release/release/v1.4.12/bin/linux/amd64/kubectl',
}

Then this goes away until we upgrade the world. Plus, the docs and man stuff from kubernetes-client (which is just kubectl + docs and a useless config) are there for those who want it. @bd808, does it seem like I'm totally off here? Can I get away with this? I mean, it is technically a stopgap.

Bstorm moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

We may want to verify the checksum as an extra measure:

$ sha256sum ./kubectl 
e0376698047be47f37f126fcc4724487dcc8edd2ffb993ae5885779786efb597

That would make me feel better about it. :)
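
A manual version of that check, as a minimal sketch: the function below (the name verify_sha256 is hypothetical) compares a file's SHA-256 digest with an expected value using sha256sum from coreutils, so a downloaded binary can be rejected before it is installed.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only when the SHA-256 digest of the file
# matches the expected hex string.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage, with the digest quoted in this task:
#   verify_sha256 ./kubectl \
#     e0376698047be47f37f126fcc4724487dcc8edd2ffb993ae5885779786efb597 \
#     && echo "checksum OK"
```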

file { 'kubectl-1.4': 
    path => '/usr/local/bin/kubectl',
    ensure => 'file',
    mode => '0555',
    source => 'https://storage.googleapis.com/kubernetes-release/release/v1.4.12/bin/linux/amd64/kubectl',
    checksum_value => 'e0376698047be47f37f126fcc4724487dcc8edd2ffb993ae5885779786efb597'
}

That appears to be valid for puppet 4.8 according to the docs (despite the fact that our linter needs arrow alignment).

The checksum definitely should help protect against upstream tampering by a malicious 3rd party. I think it would be good to get an opinion from @MoritzMuehlenhoff and/or @faidon on this "unique" short-term deployment strategy until we upgrade the Kubernetes cluster itself via T214513: Deploy and migrate tools to a Kubernetes v1.15 or newer cluster.

If this method does not pass muster, we could cobble together a deb package that does the same thing and host it in the Toolforge aptly repo.

Change 489291 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: Use a really old version of kubectl for the current k8s

https://gerrit.wikimedia.org/r/489291

Patch submitted for consideration as an adjunct to that.

Change 489291 merged by Bstorm:
[operations/puppet@production] toolforge: Use a really old version of kubectl for the current k8s

https://gerrit.wikimedia.org/r/489291

Change 491271 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] tools-bastion: need the checksum type

https://gerrit.wikimedia.org/r/491271

Change 491271 merged by Bstorm:
[operations/puppet@production] tools-bastion: need the checksum type

https://gerrit.wikimedia.org/r/491271

That version of kubectl is deployed in tandem with the OS version now.

The cool thing is that the default tools PATH places it first: which kubectl returns /usr/local/bin/kubectl
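
To see why the copy in /usr/local/bin wins, here is a small sketch (the function name list_on_path is hypothetical) that walks a PATH-style string in order and prints every executable match. The first line printed is the one the shell would run; any later lines are shadowed copies.

```shell
#!/bin/sh
# Hypothetical helper: list every executable called NAME found in a
# colon-separated search path, in lookup order. The second argument is
# optional and defaults to the current $PATH.
list_on_path() {
    name=$1
    search_path=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        # Report regular executables only, not directories.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
    return 0
}

# Usage:
#   list_on_path kubectl
```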

Test when you get a chance :)

bd808 assigned this task to Bstorm.

Verified that this is working as hoped from tools-sgebastion-07.tools.eqiad.wmflabs with the jouncebot tool by stopping and starting its custom deployment.