
PAWS: figure out paws-public in the new service model
Closed, Resolved (Public)

Description

This task is to track the work specifically to figure out paws-public in the modern service model we are introducing for PAWS.

Prior to the new model, it worked like this:

[Image: paws ingress-old.png, the old PAWS ingress setup]

Some comments:

  • the paws-public Toolforge webservice is creating an ingress with a domain that should be rejected by our custom ingress admission controller; how is that even working?
tools.paws-public@tools-sgebastion-07:~$ kubectl get ingress
NAME                 HOSTS                     ADDRESS   PORTS   AGE
paws-public-custom   paws-public.wmflabs.org             80      110d
paws-public-legacy   tools.wmflabs.org                   80      110d
  • I believe @Bstorm suggested we could simply run nbserve and the renderer in the PAWS k8s cluster, so we effectively decouple from Toolforge.
  • In that case, we could introduce a new domain public.paws.wmcloud.org and a legacy redirection from paws-public.wmflabs.org (a rough sketch follows after this list).
  • This is required for T160113: Move PAWS nfs onto its own share and T167086 to work as expected if we want paws-public to continue to be updated by user activity. Otherwise it would become an archive.
  • The image(s) should be added to the quay.io repo at the end of this so that they are maintainable by all PAWS admins instead of Toolforge admins only.
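
A rough sketch of what those two pieces could look like as nginx-ingress objects in the new cluster. The hostnames match the proposal above; the service name and port are assumptions, and the objects that were eventually deployed are shown further down in this task:

# Sketch only: serve the new domain directly...
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: paws-public
spec:
  rules:
  - host: public.paws.wmcloud.org
    http:
      paths:
      - backend:
          serviceName: paws-public
          servicePort: 8000
---
# ...and answer the legacy hostname with a redirect to it.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: paws-public-legacy
  annotations:
    # nginx-ingress annotation that answers with a 302 pointing at the new domain
    nginx.ingress.kubernetes.io/temporal-redirect: https://public.paws.wmcloud.org
spec:
  rules:
  - host: paws-public.wmflabs.org
    http:
      paths:
      - backend:
          serviceName: paws-public
          servicePort: 8000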

Event Timeline

I've been trying to make sense of the current setup for paws-public, specifically the nbserve part: https://github.com/toolforge/paws/blob/master/images/nbserve/nginx.py

The embedded Lua code creates a URL redirection that maps the User:whatever string to the wiki user id, for example:

But this redirection doesn't always happen; I'm not sure if that is by mistake or on purpose.

See different behavior if visiting:

We are considering running this nbserve server in the PAWS k8s cluster, and it would be nice if this could be handled by nginx-ingress, but I'm not 100% sure the ConfigMap allows embedding Lua code.
Anyway, it would be best if we didn't need the redirection at all.
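
For what it's worth, ingress-nginx does expose snippet hooks (http-snippet/server-snippet in the ConfigMap, and per-ingress annotations such as configuration-snippet) that carry raw nginx directives, and its nginx build bundles lua-nginx-module, so embedding the rewrite there may be possible. A purely hypothetical sketch, not the real PAWS config, just to show the mechanism:

# Hypothetical: inject the User:<name> rewrite into the public ingress via a snippet.
# Assumes configuration-snippet annotations are allowed by the controller and that
# rewrite_by_lua_block is accepted inside them; both are assumptions to verify.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: paws-public
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      rewrite_by_lua_block {
        -- placeholder: look up the wiki user id for a User:<name> path
        -- and ngx.redirect() to the numeric form, as nbserve's nginx.py does today
      }
spec:
  rules:
  - host: public.paws.wmcloud.org
    http:
      paths:
      - backend:
          serviceName: paws-public
          servicePort: 8000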

aborrero triaged this task as Medium priority.Jun 22 2020, 4:59 PM
aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

As for the ingress controller, that is allowed by the code IIRC even though it is an abnormal use case for it. I believe you can use <name>.wmflabs.org even if that is intended for <name>.toolforge.org. I can also safelist namespaces (and have for at least tool-fourohfour and possibly this one...don't recall, but it is in the controller's object definition).

@Chicocvenancio suggested in IRC that it was because of the regex at https://github.com/toolforge/paws/blob/dee5bec8275af402bfe9ecf965361c850b215241/images/nbserve/nginx.py#L119

I think it ought to ignore that trailing / because it would be part of the second capture group in the regex, which would allow it all to work. I suspect the existing redirect in the Toolforge ingress layer is somehow intercepting and causing some weird behavior when there is a trailing /. Overall, people usually reference a file, which is what really matters here, and that works...so I'm not all that worried about the trailing-slash case (though fixing it would be nice).

Adding this all into the ingress would be great in the PAWS cluster, I think, if we could get away with it.
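
If we do push this into the ingress layer, the trailing-slash case can also be absorbed there: nginx-ingress accepts regex paths with capture groups, so a pattern like the one below treats /paws-public/foo and /paws-public/foo/ the same way. Illustrative only; use-regex and rewrite-target are standard nginx-ingress annotations, but the exact object we ended up with is shown later in this task:

# Illustrative: the optional slash is swallowed by the first capture group,
# and only the second group (the rest of the path) is forwarded to the backend.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: paws-public-legacy
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
  - host: paws-public.wmflabs.org
    http:
      paths:
      - path: /paws-public(/|$)(.*)
        backend:
          serviceName: paws-public
          servicePort: 8000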

For browser sandboxing and cookie handling, I don't like using paws.wmcloud.org and <anything>.paws.wmcloud.org at the same time for content. I would suggest something like:

  • paws.wmcloud.org -> 302 redirect to "landing page" which I think today would be https://wikitech.wikimedia.org/wiki/PAWS
  • hub.paws.wmcloud.org -> The actual jupyter hub ingress/pod
  • public.paws.wmcloud.org -> The nbserve ingress/pod

The problem with mixing different levels of hostname is that the same-origin policy allows JavaScript served by A.example.com to say "my effective origin is really example.com" via a document.domain declaration. This then allows AJAX requests from A.example.com to example.com without a CORS check. That is bad in this case because it subverts the sandbox isolation that lets users be authenticated to the Hub without fearing attacks from the public content published by other users.
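
The bare-domain entry in that scheme could be a redirect-only ingress, so nothing is ever served from paws.wmcloud.org itself. A sketch, assuming nginx-ingress; the backend name is a placeholder, since the ingress spec still requires one even when every request is redirected:

# Sketch: paws.wmcloud.org only redirects to the landing page.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: paws-landing
  annotations:
    nginx.ingress.kubernetes.io/temporal-redirect: https://wikitech.wikimedia.org/wiki/PAWS
spec:
  rules:
  - host: paws.wmcloud.org
    http:
      paths:
      - backend:
          serviceName: paws-hub    # placeholder; the redirect fires before any proxying
          servicePort: 8000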

If/when this is done, I'll need another OAuth grant if we go with hub.paws.wmcloud.org. Just a note; definitely doable.

A really awesome additional isolation would be to figure out how to have per-user subdomains for the public content too (for example bd808.public.paws.wmcloud.org), but the mappings needed will be tricky because the users are Wikimedia SUL accounts with names like BDavis (WMF) and not domain-safe strings like the shell names associated with Developer accounts. If we don't mind less user-readable URLs, the per-user hostname scheme could be <gu_id>.public.paws.wmcloud.org (for example 28456187.public.paws.wmcloud.org). I'm not completely up to speed on all the transforms from the Hub URL space to the Public URL space though, so this may be non-trivial to implement in any sense.

Hub to Public URL space is purely a path-based matter on the NFS, with no webserver or Kubernetes component. We share the NFS as a static website as paws-public after feeding it through some renderers for things like IPython notebooks and HTML. This is possible through redirects and ingress scripting, but I'm not sure I want to go near it in terms of ingress complexity when otherwise PAWS is a potentially simple ingress structure. Since we have existing docker images that work, with future work slated for them, and there is enough being changed to wrangle on this round, I don't want to dive very deeply into paws-public at all until later quarters. It's ultimately just a dodge around needing authentication to be able to view the notebooks.

Note that besides TLS and the OAuth grant, if we go with hub.paws.wmcloud.org we need to change the hostname here: https://github.com/crookedstorm/paws/blob/master/paws/values.yaml#L170
It's pretty straightforward.

Mentioned in SAL (#wikimedia-cloud) [2020-06-24T23:18:03Z] <bstorm> added A record for *.paws.wmcloud.org to public and hub to use T211096 T255997 T195217

So I can now spin up a simple ingress object and the deployment for paws-public in the new cluster and it should Just Work ™

An initial rsync of the NFS is over, so it really should even show the content.

This was 15 min of work, so I'm not strongly committed to any of the methods here, but I think I can show proof of concept of the above with https://github.com/crookedstorm/paws/commit/395063ba5410f0623cb7df782c2ad6d79a3cfb30
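
For readers following along, the shape of that proof of concept is roughly "run nbserve against the shared NFS content and put a Service plus an ingress in front of it". A hypothetical sketch of the deployment half; the image reference, mount path, and volume type are placeholders, and the real template is in the commit linked above:

# Hypothetical paws-public deployment: nbserve serving the rendered NFS content.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: paws-public
  namespace: prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: paws-public
  template:
    metadata:
      labels:
        app: paws-public
    spec:
      containers:
      - name: nbserve
        image: quay.io/example/nbserve:latest   # placeholder image reference
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: paws-project
          mountPath: /data/project              # placeholder path
          readOnly: true
      volumes:
      - name: paws-project
        hostPath:
          path: /data/project                   # placeholder; a PVC would work here too
---
apiVersion: v1
kind: Service
metadata:
  name: paws-public
  namespace: prod
spec:
  selector:
    app: paws-public
  ports:
  - port: 8000
    targetPort: 8000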

Mentioned in SAL (#wikimedia-cloud) [2020-06-25T15:44:23Z] <bstorm> deployed a proof-of-concept paws-public setup in the new cluster T255997

Now it works with less repetition:
https://public.paws.wmcloud.org/User:BDavis%20(WMF)/

I've stripped out the paws-public bit of the path.

If you are ok with it, I can cut copies of the images to put in the paws quay.io repo and make those the defaults in our fork.
After that, we could call this piece done.

Ok, it now works with a redirect for paws-public.wmflabs.org/paws-public (or will as soon as the DNS is updated). I proved this using the httpie tool. The working ingress is committed to the fork. It is currently a 302 redirect, but we can change it to a 308 easily if we prefer.

bstormWMF2026:~ $ http --follow  paws.wmcloud.org/paws-public/User\:BDavis_\(WMF\)/ascii-cuthulu.txt  Host:paws-public.wmflabs.org
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Security-Policy-Report-Only: default-src 'self' 'unsafe-eval' 'unsafe-inline' blob: data: filesystem: mediastream: *.toolforge.org wikibooks.org *.wikibooks.org wikidata.org *.wikidata.org wikimedia.org *.wikimedia.org wikinews.org *.wikinews.org wikipedia.org *.wikipedia.org wikiquote.org *.wikiquote.org wikisource.org *.wikisource.org wikiversity.org *.wikiversity.org wikivoyage.org *.wikivoyage.org wiktionary.org *.wiktionary.org *.wmcloud.org *.wmflabs.org wikimediafoundation.org mediawiki.org *.mediawiki.org wss://tools.wmflabs.org; report-uri https://csp-report.toolforge.org/collect;
Content-Type: text/plain; charset=UTF-8
Date: Tue, 30 Jun 2020 22:28:29 GMT
ETag: W/"5809418c-c0"
Last-Modified: Thu, 20 Oct 2016 22:13:32 GMT
Server: nginx/1.13.9
Strict-Transport-Security: max-age=2592000
Transfer-Encoding: chunked
X-Clacks-Overhead: GNU Terry Pratchett

________
/         \
|          \
|           \
|            \
\        __\/_\
 \       \_/\_/\___
  \/  / |  \ | \   \
 _/ _/ _|  | \  \  |_
/  |  / /  / /  |  \ \_
| /  /  | |  \   \  \  \

Mentioned in SAL (#wikimedia-cloud) [2020-06-30T23:00:53Z] <bstorm> added paws-public.wmflabs.org to the alt-names for acme-chief, which broke it until we hand off the zone to the paws project <sorry!> T195217 T255997

The cert is fixed now and includes paws-public.wmflabs.org as an alt-name.
Now cutover will just mean changing the A record after final rsync.

It seems we are finally using public.paws.wmcloud.org. Also, it works with both the user id and the actual name in the URL, which is cool and a sign that the old weird nginx/Lua trick is working fine in the new cluster.
Some random URLs I checked:

The ingress config itself:

root@paws-k8s-control-1:~# kubectl get ingress -n prod paws-public-custom -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    meta.helm.sh/release-name: paws
    meta.helm.sh/release-namespace: prod
  creationTimestamp: "2020-06-25T15:43:09Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    ingress.paws.wmcloud.org: public
    name: paws-public-custom
  name: paws-public-custom
  namespace: prod
  resourceVersion: "7819138"
  selfLink: /apis/extensions/v1beta1/namespaces/prod/ingresses/paws-public-custom
  uid: af8b9403-4403-49a5-a986-7aeac18668e8
spec:
  rules:
  - host: public.paws.wmcloud.org
    http:
      paths:
      - backend:
          serviceName: paws-public
          servicePort: 8000
status:
  loadBalancer: {}

This seems to be stored correctly in the repo: https://github.com/crookedstorm/paws/blob/master/paws/templates/public.yaml
I don't understand the '-custom' suffix in the metadata; it feels a bit misleading at first, but that's a minor thing I guess.

root@paws-k8s-control-1:~# kubectl get ingress -n prod paws-public-legacy -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    meta.helm.sh/release-name: paws
    meta.helm.sh/release-namespace: prod
    nginx.ingress.kubernetes.io/temporal-redirect: $scheme://public.paws.wmcloud.org/$2
  creationTimestamp: "2020-06-30T22:16:16Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    ingress.paws.wmcloud.org: legacy
    name: paws-public-legacy
  name: paws-public-legacy
  namespace: prod
  resourceVersion: "7819103"
  selfLink: /apis/extensions/v1beta1/namespaces/prod/ingresses/paws-public-legacy
  uid: 62eb8d52-ed7d-4789-a71f-b826130accb8
spec:
  rules:
  - host: paws-public.wmflabs.org
    http:
      paths:
      - backend:
          serviceName: paws-public
          servicePort: 8000
        path: /paws-public(/|$)(.*)
status:
  loadBalancer: {}

This seems to already be in https://github.com/crookedstorm/paws/blob/master/paws/templates/legacy.yaml

We managed to integrate paws-public in the general ingress setup:

[Image: image.png, the integrated ingress setup]

So I think the paws-public piece is actually figured out at this point. It looks very elegant overall, but I guess we have some margin for improvement in future iterations, especially the old weird nginx/Lua trick.
I'm closing this task now, feel free to reopen if required.