
Request increased quota for qrank Toolforge tool
Closed, Resolved · Public

Description

Tool Name: qrank
Type of quota increase requested: CPU

Reason: QRank processes pageviews and Wikimedia dumps to compute a ranking of Wikidata entities. For a quick intro, see the README; for details and background, see the Technical Design Document. The build pipeline is written in a compiled language (Go) and has been optimized for multi-core machines. When running on DigitalOcean, the build pipeline finishes within a few hours. On Toolforge/Kubernetes, however, the same task currently takes almost three days. This is partly due to NFS throttling, but according to my profiling, CPU is currently a bigger bottleneck than read throughput.

Amount of quota requested: 8 CPUs would be ideal; only one pod will be needed. If that’s too much, the tool can also live with fewer resources; it will adapt to whatever is available. On DigitalOcean, I’ve used 2 GiB of RAM per CPU core, but the system can also work with less memory if necessary. (The system makes heavy use of external sorting, spilling to temporary files when it runs out of RAM.) In the worst case, I can also live with the current default quota, but then the freshness of the rankings will suffer.
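As an illustration of the external-sorting technique mentioned above (this is not QRank’s actual code): GNU sort(1) uses the same pattern, spilling sorted runs to temporary files when the input exceeds its memory budget and then merging them:

```shell
# Sort a million shuffled numbers with only a 4 MiB memory budget.
# sort(1) spills sorted runs to temp files under -T and merges them,
# which is exactly the external-sort pattern described above.
seq 1000000 | shuf > unsorted.txt
sort -n -S 4M -T /tmp unsorted.txt > sorted.txt
head -n 3 sorted.txt
```

With `-S 4M` the ~7 MB input cannot be sorted entirely in memory, so the temp-file spill path is actually exercised.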

Event Timeline

aborrero claimed this task.
aborrero moved this task from Inbox to Approved on the Toolforge (Quota-requests) board.
aborrero subscribed.

Approved. Let us know if that makes any difference.

aborrero@tools-k8s-control-3:~$ sudo -i kubectl describe ResourceQuotas -n tool-qrank
Name:                   tool-qrank
Namespace:              tool-qrank
Resource                Used  Hard
--------                ----  ----
configmaps              1     10
limits.cpu              0     2 <-----
limits.memory           0     8Gi
persistentvolumeclaims  0     3
pods                    0     4
replicationcontrollers  0     1
requests.cpu            0     2
requests.memory         0     6Gi
secrets                 1     10
services                0     1
services.nodeports      0     0
aborrero@tools-k8s-control-3:~$ sudo -i kubectl edit resourcequota tool-qrank --namespace tool-qrank
resourcequota/tool-qrank edited
aborrero@tools-k8s-control-3:~$ sudo -i kubectl describe ResourceQuotas -n tool-qrank
Name:                   tool-qrank
Namespace:              tool-qrank
Resource                Used  Hard
--------                ----  ----
configmaps              1     10
limits.cpu              0     8  <-----
limits.memory           0     8Gi
persistentvolumeclaims  0     3
pods                    0     4
replicationcontrollers  0     1
requests.cpu            0     2
requests.memory         0     6Gi
secrets                 1     10
services                0     1
services.nodeports      0     0
aborrero added a subscriber: Bstorm.

Hey, I've been talking with the WMCS team about this request. It turns out @Bstorm had some concerns about this quota bump not solving any actual performance problem, because a single container would still use just 1 CPU.

Also, you mention building the Go program on Toolforge. How do you build it? I guess you build it on the Toolforge bastion?

@Sascha:
Looking at https://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources?orgId=1&refresh=5m&var-namespace=tool-qrank you seem to max out your CPU usage per container (the default limit is 1 CPU). So you probably did need a namespace CPU quota bump, but I suspect you really need a limitrange increase as well, so that each container can consume more. It might not be a bad idea to increase it slowly, e.g. starting with 2, since you can only consume 8 total for your entire tool right now (so if you have 2 jobs running in parallel, that’s four).

All that said, before we make any changes:

  1. Thanks for using the guaranteed QoS pattern for your jobs. Requesting the same amount as you expect to use means the scheduler will not conflict with other jobs. Just please make sure you actually need all that CPU.
  2. Can we start with a limitrange of 2 CPU and see how that works? Our exec nodes simply don’t have the kind of memory you are asking for if you had 8 CPUs anyway. Our largest exec nodes have no more than 8 CPUs total, so this is a big ask for us; many nodes only have 4.
  3. This demonstrated to me that the default RBAC we use currently doesn’t give users the ability to run kubectl top pods against their own pods to see what they are actually using (which would be extremely helpful here). I’ll go see if I can fix that. Until then, there is the mostly reliable but slow https://k8s-status.toolforge.org, which shows that information for nodes but doesn’t seem to show the pods’ real usage.
  4. It looks like you are not compiling code; rather, your builder is building data for the server? Is that true? If you are compiling, there’s not much we can do to optimize that, because this isn’t set up as a CI/CD system right now. It’s better to compile elsewhere or live with the slowness.
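For reference, the “guaranteed QoS” pattern mentioned in point 1 means setting requests equal to limits in the container spec. A minimal sketch (the values are illustrative, not the tool’s actual manifest):

```yaml
# Illustrative only: requests == limits gives the pod the
# "Guaranteed" QoS class, so the scheduler reserves exactly
# what the container may consume.
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 4Gi
```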

Change 673080 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] rbac: add the ability for tools to run "kubectl top pods"

https://gerrit.wikimedia.org/r/673080

Change 673080 merged by jenkins-bot:
[labs/tools/maintain-kubeusers@master] rbac: add the ability for tools to run "kubectl top pods"

https://gerrit.wikimedia.org/r/673080

Ok, NOW you can run kubectl top pods to find out how much RAM and CPU your pods are actually using; CPU will typically be expressed in milliCPUs. If you are using the pattern of requesting the same amount as the limit in your resources definition, that might trick it into just saying you are using all of it, so if you want to experiment, you might want to do that part differently.
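From the tool account, the new permission can be exercised like this (a sketch; the pod name is hypothetical, and the figures echo the usage reported later in this thread):

```
$ kubectl top pods
NAME          CPU(cores)   MEMORY(bytes)
qrank-xxxxx   1003m        1117Mi
```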

Thank you! Yes, this is a build pipeline for data, it isn’t compiling code. For background, see the technical design document. (Feedback very welcome!)

The qrank-builder pipeline runs inside a single container, so my quota request is indeed about the CPU limitrange per pod, not total CPU across the cluster. Apologies for not making this clearer in my ticket. The job is designed to keep all available cores busy, so it is actually expected to max out on CPU. However, the job currently gets only a single CPU core to run its worker threads, which makes the pipeline rather slow: on Toolforge, it needs several days to finish, compared to a few hours for the same job on the same data in the DigitalOcean cloud. If the job were given more CPU cores, it could finish faster, and the computed rankings could be kept fresher.

In the grand scheme of things, though, it’s certainly not a problem if some obscure ranking signal is slightly stale. If the Wikimedia Cloud is running out of CPU cores, those precious cores are better spent on other things. (Having worked with other datacenter infrastructure in the past, I had assumed that most jobs in the cluster would be memory-hungry, whereas CPU cores would generally be plentiful. Interesting that the workload on Wikimedia’s cloud is different.)

Thank you for allowing users to run kubectl top pods; this is super helpful. With the current Kubernetes configuration, I’m seeing 1003 mCPU and 1117 MiB. Perhaps you could increase the per-pod CPU quota to 2 or 3 cores? Fractional cores (e.g. 2500 milli-cores) would be fine too. The job will still max out on CPU as intended, but the overall pipeline would then hopefully finish in less than a day. But as said, if you’re running out of cores, the runtime of this pipeline really isn’t all that important.

Also, many thanks for pointing me to the resources dashboard. Out of curiosity, does “Network I/O” include NFS traffic? According to the dashboard, there’s been practically no networking activity for the job; in reality, it reads perhaps 100 GB over the network from the mounted /public/dumps directory (over the lifetime of the job, of course, not per second), and it reads and writes a couple of gigabytes in its home directory. (The job also reads and writes quite a lot in /tmp, but I’m assuming this doesn’t go over the network.)

Also, you mention building the Go program on Toolforge. How do you build it? I guess you build it on the Toolforge bastion?

No, currently I cross-compile the two binaries (builder and webserver) on my laptop and then copy them to the bastion with scp. Of course, this is terrible release-engineering practice. Does Wikimedia’s deployment pipeline also support Toolforge, or only CloudVPS? (Sorry for asking such an off-topic question on this ticket, but I wouldn’t know where else to ask.)
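The cross-compile-and-copy workflow described above looks roughly like this (a sketch: the package paths and output names are illustrative, and Toolforge bastions are assumed to be Linux on amd64):

```
# Go cross-compilation needs no extra toolchain: set GOOS/GOARCH.
GOOS=linux GOARCH=amd64 go build -o qrank-builder ./cmd/qrank-builder
GOOS=linux GOARCH=amd64 go build -o qrank-webserver ./cmd/webserver

# Copy the binaries to the Toolforge bastion.
scp qrank-builder qrank-webserver login.toolforge.org:
```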

Also, you mention building the Go program on Toolforge. How do you build it? I guess you build it on the Toolforge bastion?

No, currently I cross-compile the two binaries (builder and webserver) on my laptop, and then copy them to the bastion with scp.

Ok, thanks for the clarification!

Of course, this is terrible release-engineering practice. Does Wikimedia’s deployment pipeline also support Toolforge, or only CloudVPS? (Sorry for asking such an off-topic question on this ticket, but I wouldn’t know where else to ask.)

No, it doesn’t, to the extent I know. But perhaps only because nobody asked before :-)

Thanks, @aborrero! I filed a separate ticket T277808 about deployment since it’s a bit off-topic from the CPU quota.

@Sascha
Network I/O reported in Grafana is defined in Prometheus as: sum(rate(container_network_receive_bytes_total{container_label_io_kubernetes_pod_namespace="$namespace"}[5m]))*8. That means it’s just traffic over the pod network. Anything happening “inside” the container, like NFS, does not count as traffic. There are values there for web traffic on other tools.

I've kicked your limitrange up to 2500m for CPU (i.e. 2.5 cores). I left the RAM per container at 4Gi.

root@tools-k8s-control-1:~# kubectl get limitrange -n tool-qrank tool-qrank -o yaml
apiVersion: v1
kind: LimitRange
metadata:
  creationTimestamp: "2021-02-17T12:33:34Z"
  name: tool-qrank
  namespace: tool-qrank
  resourceVersion: "314183372"
  selfLink: /api/v1/namespaces/tool-qrank/limitranges/tool-qrank
  uid: 390fcc6a-9aa5-472a-802e-1bf8ea709ffa
spec:
  limits:
  - default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 150m
      memory: 256Mi
    max:
      cpu: 2500m
      memory: 4Gi
    min:
      cpu: 50m
      memory: 100Mi
    type: Container

I'll close this for now. I hope that will get things looking a bit better for you.