
Deploy paws dev to codfw1dev
Closed, Resolved · Public

Description

As part of the Magnum cluster deployment, start by deploying a more persistent cluster to codfw1dev. Test that a cluster with a single control node survives a hypervisor drain, a potential limitation described in https://phabricator.wikimedia.org/T326257#8500032


Event Timeline

root@cloudcontrol2001-dev:~# openstack project create --description 'paws-dev' paws-dev --domain default
+-------------+----------+
| Field       | Value    |
+-------------+----------+
| description | paws-dev |
| domain_id   | default  |
| enabled     | True     |
| id          | paws-dev |
| is_domain   | False    |
| name        | paws-dev |
| options     | {}       |
| parent_id   | default  |
| tags        | []       |
+-------------+----------+
root@cloudcontrol2001-dev:~# openstack role add --project paws-dev --user rook projectadmin
root@cloudcontrol2001-dev:~# openstack role add --project paws-dev --user rook user
openstack coe cluster template create paws-dev-k8s21 \
--image Fedora-CoreOS-34 \
--external-network wan-transport-codfw \
--fixed-subnet cloud-instances2-b-codfw \
--fixed-network lan-flat-cloudinstances2b \
--dns-nameserver 8.8.8.8 \
--network-driver flannel \
--docker-storage-driver overlay2 \
--docker-volume-size 30 \
--master-flavor g2.cores1.ram2.disk20 \
--flavor g2.cores1.ram2.disk20 \
--coe kubernetes \
--labels kube_tag=v1.21.8-rancher1-linux-amd64,hyperkube_prefix=docker.io/rancher/,cloud_provider_enabled=true
openstack quota set --gigabytes 150 paws-dev
openstack coe cluster create paws-dev --cluster-template paws-dev-k8s21 --master-count 1 --node-count 3 --floating-ip-disabled --keypair rookskey
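Cluster creation is asynchronous, so the create command returns before the cluster is usable. A polling helper along these lines can wait for Magnum to finish (a sketch; `wait_for_cluster` is a hypothetical name, and it assumes the `openstack coe cluster show` command with `-f value -c status` output):

```shell
# Hypothetical helper: poll the Magnum cluster status until creation
# finishes. Any status other than the two terminal ones (for example
# CREATE_IN_PROGRESS) just triggers another wait.
wait_for_cluster() {
    local name="$1" status
    while true; do
        status="$(openstack coe cluster show "$name" -f value -c status)"
        case "$status" in
            CREATE_COMPLETE) echo "cluster ${name} is ready"; return 0 ;;
            CREATE_FAILED)   echo "cluster ${name} failed" >&2; return 1 ;;
            *)               sleep 30 ;;
        esac
    done
}
```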

Launch a VM to be used for NFS and haproxy. Edit its security groups to allow 443 from anywhere and all traffic from 172.16.0.0/12. Attach a floating IP. From that node:

apt install haproxy

Append the following to /etc/haproxy/haproxy.cfg:

frontend k8s-ingress-http
    bind 0.0.0.0:80
    mode http
    default_backend k8s-ingress

frontend k8s-ingress-https
    bind 0.0.0.0:443 ssl crt /etc/acmecerts/paws/live/ec-prime256v1.chained.crt.key
    mode http
    default_backend k8s-ingress


backend k8s-ingress
    mode http
    option httplog
    option tcp-check
    balance roundrobin
    timeout server 1h
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server <worker ip> <worker ip>:30001 check

/etc/acmecerts/paws/live/ec-prime256v1.chained.crt.key contains the key from prod; perhaps we should generate one just for this environment.

systemctl restart haproxy
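The backend stanza above needs one `server` line per worker, shown with a `<worker ip>` placeholder. If the worker set changes, the lines can be regenerated with a small helper (a sketch; `render_backend_servers` is a hypothetical name, and the IPs in the usage example are placeholders, not the real workers):

```shell
# Hypothetical helper: emit one haproxy `server` line per worker IP,
# matching the format of the backend stanza above (NodePort 30001).
render_backend_servers() {
    local ip
    for ip in "$@"; do
        printf '    server %s %s:30001 check\n' "$ip" "$ip"
    done
}

# Example (placeholder IPs):
#   render_backend_servers 172.16.0.10 172.16.0.11 >> /etc/haproxy/haproxy.cfg
```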

For NFS:

apt-get install nfs-kernel-server -y
Press Enter to accept the default when prompted for which NFS version to use.
mkdir -p /srv/misc/shared/paws/project/paws/userhomes/
mkdir -p /srv/dumps/xmldatadumps/public
chown -R nobody:nogroup /srv/
chmod 777 -R /srv/
echo '/srv  172.16.0.0/12(rw,sync,no_subtree_check)' >> /etc/exports
systemctl restart nfs-server
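Re-running the setup would duplicate the `>>`-appended line in /etc/exports. A guarded variant keeps the step idempotent (a sketch; `add_export` is a hypothetical helper, not part of the deploy):

```shell
# Hypothetical helper: append the export line from above only if it is
# not already present, so re-runs do not duplicate entries.
add_export() {
    local line='/srv  172.16.0.0/12(rw,sync,no_subtree_check)'
    local file="${1:-/etc/exports}"
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}
```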

From the labs bastion (bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org) after setting up the kube config:

helm upgrade --install ingress-nginx ingress-nginx \
  --version v4.4.0 \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.service.type=NodePort \
  --set controller.service.enableHttps=false \
  --set controller.service.nodePorts.http=30001

These steps could surely be nicer... (T326417). In a checkout of the PAWS repository:
find . -type f -exec sed -i -e 's/clouddumps100[21].wikimedia.org/haproxy-and-nfs.paws-dev.codfw1dev.wikimedia.cloud/g' {} \;
find . -type f -exec sed -i -e 's/nfs-tools-project.svc.eqiad.wmnet/haproxy-and-nfs.paws-dev.codfw1dev.wikimedia.cloud/g' {} \;
Edit the path names in values.yaml on lines 107 and 110 to be unique, as well as the mountPath on line 228.
Delete the db entry in paws/secrets.yaml
Change 2000Mi to 20Mi on line 40 in paws/templates/public.yaml
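The two `sed` rewrites above can be sanity-checked against a scratch file before running them over the real checkout; both dump-host patterns should end up pointing at the codfw1dev VM (the sample file contents below are made up for the demonstration):

```shell
# Hedged check: run the same substitutions against a scratch copy first.
tmp="$(mktemp -d)"
printf 'nfs: clouddumps1001.wikimedia.org\nnfs: nfs-tools-project.svc.eqiad.wmnet\n' \
    > "$tmp/values.yaml"
find "$tmp" -type f -exec sed -i \
    -e 's/clouddumps100[21].wikimedia.org/haproxy-and-nfs.paws-dev.codfw1dev.wikimedia.cloud/g' \
    -e 's/nfs-tools-project.svc.eqiad.wmnet/haproxy-and-nfs.paws-dev.codfw1dev.wikimedia.cloud/g' \
    {} \;
# Count of rewritten lines; both should now reference the codfw1dev VM.
grep -c 'haproxy-and-nfs' "$tmp/values.yaml"
```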

kubectl config set-context --current --namespace=codfw1dev
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm dep up paws/
kubectl create namespace codfw1dev
helm install paws --namespace codfw1dev ./paws -f paws/production.yaml -f paws/secrets.yaml --timeout=50m
kubectl apply -f manifests/psp.yaml
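A rough post-install check is to wait until everything in the codfw1dev namespace is Running (a sketch; `pods_ready` is a hypothetical helper, not part of the deploy, and assumes `kubectl` is configured as above):

```shell
# Hypothetical helper: succeed only when no pod in the namespace is in a
# non-Running state (column 3 of `kubectl get pods --no-headers` is STATUS).
pods_ready() {
    local ns="${1:-codfw1dev}"
    ! kubectl get pods -n "$ns" --no-headers | awk '{print $3}' | grep -vq Running
}

# Example: until pods_ready codfw1dev; do sleep 10; done
```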

In the /etc/hosts of your machine (assuming the floating IP is 185.15.57.22):
185.15.57.22 hub.paws.wmcloud.org
185.15.57.22 paws.wmcloud.org
185.15.57.22 paws.wmflabs.org
185.15.57.22 public.paws.wmcloud.org
185.15.57.22 paws-public.wmflabs.org
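The hosts entries above can be regenerated for a different floating IP with a small helper (hypothetical; it just pairs the given IP with the five hostnames listed above):

```shell
# Hypothetical helper: emit the /etc/hosts overrides for a given floating IP.
render_hosts() {
    local ip="$1" host
    for host in hub.paws.wmcloud.org paws.wmcloud.org paws.wmflabs.org \
                public.paws.wmcloud.org paws-public.wmflabs.org; do
        printf '%s %s\n' "$ip" "$host"
    done
}

# Example: render_hosts 185.15.57.22 | sudo tee -a /etc/hosts
```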

This seemed to work: the cluster survived a drain of the hypervisor hosting the single control node.