
Staging k8s ci namespace limitranges
Closed, Resolved, Public

Description

We run helm test as part of the release pipeline. We deploy using the charts from the https://releases.wikimedia.org/charts repo. When trying to merge blubber updates I keep getting the message:

Error: release blubber-acrkk5tz failed: timed out waiting for the condition

I have tracked this down to an error from the replicaset-controller:

5m            5m              1       replicaset-controller                   Warning         FailedCreate    Error creating: pods "blubberoid-blubber-cyr4hcls-5c589f6f7d-9z4b6" is forbidden: maximum cpu usage per Container is 1, but limit is 1800m.

I also ran into the lower limit: a minimum of 0.1 CPU per container. I bumped the chart to satisfy the lower limit. For the upper limit, though, keeping the chart's value (1800m) as-is seems more important if we want the charts repository to be a way for folks to deploy our internal applications.
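Both errors come from a per-container LimitRange in the namespace. The container's CPU limit must not exceed the range's max, and its CPU request must not fall below the range's min. The sketch below shows that comparison using the values from this task (max 1 core, min 0.1 core = 100m). It is only an illustration: `parse_cpu` and `check_container_cpu` are hypothetical helpers that handle just plain cores and the "m" (millicore) suffix. The messages are modeled on the error quoted above, not copied from Kubernetes source.

```python
from __future__ import annotations

from decimal import Decimal


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "1800m", "1" or "0.1" to cores.

    Hypothetical helper: only plain decimal cores and the "m" (millicore)
    suffix are handled, which covers the values quoted in this task.
    """
    q = quantity.strip()
    if q.endswith("m"):
        return Decimal(q[:-1]) / 1000  # 1000m == 1 core
    return Decimal(q)


def check_container_cpu(request: str, limit: str, min_cpu: str, max_cpu: str) -> list[str]:
    """Return LimitRange-style violations for one container (hypothetical helper).

    The limit is compared against the range's max, and the request against its min.
    """
    errors = []
    if parse_cpu(limit) > parse_cpu(max_cpu):
        errors.append(f"maximum cpu usage per Container is {max_cpu}, but limit is {limit}.")
    if parse_cpu(request) < parse_cpu(min_cpu):
        errors.append(f"minimum cpu usage per Container is {min_cpu}, but request is {request}.")
    return errors


if __name__ == "__main__":
    # The blubber chart's 1800m limit against a namespace max of 1 core.
    print(check_container_cpu(request="100m", limit="1800m", min_cpu="100m", max_cpu="1"))
```

Run as a script, this prints the single "maximum cpu usage" violation. A request of "1m" would instead trip the 100m minimum.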

I'm not sure how granular these limits are: can they be raised or lowered for a specific namespace?

Impact: This is currently preventing me from merging updates to blubber, and others using helm test will likely be affected as well.

Event Timeline

@thcipriani It is granular per namespace. You can submit a CR with changed values at any time. I will bump those values and reference this Phab task so you can see how it is done.
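Per-namespace granularity follows from the fact that a Kubernetes LimitRange is a namespaced object. Each namespace carries its own min/max values. The sketch below builds such an object for the ci namespace as a plain dict and prints it as JSON. At Wikimedia these values are managed through the operations/deployment-charts repository, as the change below shows, so this only illustrates the underlying Kubernetes object. The object name `container-cpu` and the CPU values are hypothetical, not the ones merged in that change.

```python
import json


def limit_range_manifest(namespace: str, min_cpu: str, max_cpu: str) -> dict:
    """Build a LimitRange constraining per-container CPU in a single namespace."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        # Hypothetical object name; the namespace scopes where the limits apply.
        "metadata": {"name": "container-cpu", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",  # constraints apply to each container
                    "min": {"cpu": min_cpu},
                    "max": {"cpu": max_cpu},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(json.dumps(limit_range_manifest("ci", "100m", "2"), indent=2))
```

Raising the max for ci this way would leave the LimitRange objects in other namespaces untouched.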

Change 525789 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/deployment-charts@master] k8s: changing CI limits so actual charts can be tested

https://gerrit.wikimedia.org/r/525789

Change 525789 merged by Fsero:
[operations/deployment-charts@master] k8s: changing CI limits so actual charts can be tested

https://gerrit.wikimedia.org/r/525789

@thcipriani You can launch the pipeline again and it should work. However, a better fix is to change the limits in the blubber chart's default values; 1m is not realistic as a CPU minimum.