See the archived Slack thread at https://wikimedia.slack.com/archives/C05H0JYT85V/p1692975749455829 for discussion, but the TL;DR is that we can't simply add a separate [[runners]] section, since tags are seemingly not configurable per runner (only per registration). We can, however, parameterize workload/requests/limits and perform two separate runner deployments.
Details
Title | Reference | Author | Source Branch | Dest Branch
---|---|---|---|---
Use memory-optimized runners on .gitlab-ci.yml | repos/data-engineering/dumps/mediawiki-content-dump!9 | xcollazo | test-memory-optimized-runners | main
gitlab: Parameterize node toleration and selector | repos/releng/gitlab-cloud-runner!252 | dduvall | review/parameterize-runner-workload | main
gitlab: Revert resource name changes | repos/releng/gitlab-cloud-runner!251 | dduvall | review/revert-gitlab-resource-name-changes | main
gitlab-runner: Create memory optimized runners | repos/releng/gitlab-cloud-runner!250 | dduvall | review/memory-optimized-runner-pool | main
Event Timeline
dduvall opened https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/250
gitlab-runner: Create memory optimized runners
dduvall merged https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/250
gitlab-runner: Create memory optimized runners
dduvall opened https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/251
gitlab: Revert resource name changes
dduvall merged https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/251
gitlab: Revert resource name changes
@xcollazo, the memory-optimized runner pool has been deployed. You can schedule jobs to this pool using the memory-optimized tag, either at the pipeline level under the default keyword or at the job level.
Scratch that. I just realized that I overlooked the node tolerations in the runner config. I'll have to do a follow-up. Sorry.
dduvall opened https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/252
gitlab: Parameterize node toleration and selector
dduvall merged https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/252
gitlab: Parameterize node toleration and selector
@xcollazo, all right, we should be good now. Test out the memory-optimized tag and let us know the results!
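For reference, here is a minimal .gitlab-ci.yml sketch of the two ways to target the memory-optimized tag mentioned above. The job name publish_conda_env comes from this thread; the script line is a hypothetical placeholder, not the project's actual job definition. Note that in GitLab CI a job-level keyword takes precedence over the same keyword under default, so a job's own tags list replaces the inherited one rather than merging with it.

```yaml
# Option A: route every job in the pipeline to the memory-optimized pool.
default:
  tags:
    - memory-optimized

# Option B: route a single job. A job-level `tags` list replaces the one
# inherited from `default`, so in practice pick one approach or the other.
publish_conda_env:
  tags:
    - memory-optimized
  script:
    - echo "hypothetical publish step"  # placeholder, not the real job script
```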
Some testing from my side:
My pytest job ran successfully: https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/jobs/137133
My somewhat heavier publish_conda_env release job seems to have stalled though: https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/jobs/137134
It was already publishing the artifact 11 minutes in, but then it got a warning, presumably from k8s:
WARNING: Getting job pod status pods "runner-emdqpmey-project-1420-concurrent-02lhsm" is forbidden: User "system:serviceaccount:gitlab-runner:gitlab-runner-memopt" cannot get resource "pods" in API group "" in the namespace "gitlab-runner"
And even though these new pods have a 20-minute timeout, the job is still running at 26+ minutes as of this comment and seems stalled. This job typically takes ~10 minutes.
(Manually canceled https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/jobs/137134 after ~54 minutes.)
That's a strange error indeed, and definitely k8s-related. It's very possible I was redeploying GitLab runners during that time to get some important config changes out. Could you re-run the job and see what happens?
OK, it seems we did indeed hit a transient issue when I ran these a couple of days ago.
Both my jobs were successful this time:
https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/jobs/139559
and
https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/jobs/139560
This confirms that the new memory-optimized containers solve my issue.
Also, should we document this at https://wikitech.wikimedia.org/wiki/GitLab/Gitlab_Runner ?
xcollazo updated https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/9
Use memory-optimized runners on .gitlab-ci.yml
xcollazo merged https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/9
Use memory-optimized runners on .gitlab-ci.yml
@xcollazo: Checking in. Are your CI issues resolved with the use of the memory-optimized runner?
I went ahead and added some documentation at https://wikitech.wikimedia.org/wiki/GitLab/Gitlab_Runner#GitLab_Runner_types