Enable runners for projects under gitlab.wikimedia.org security group
Closed, Resolved | Public | 2 Estimated Story Points

Description

Hello Release-Engineering-Team -

It looks like runners have been completely disabled for gitlab.wikimedia.org at this time?

Screen Shot 2021-10-21 at 1.31.41 PM.png (299×406 px, 39 KB)

Screen Shot 2021-10-21 at 1.32.21 PM.png (113×856 px, 20 KB)

I assume this might be related to the ongoing discussion at T291978 etc? Is there a plan to enable any variety of runners soon? If not, I'd like to request that perhaps 2 to 3 runners be enabled for pipelines under the security group within GitLab so that we have an easy means to continue testing and developing our security CI templates this quarter and next. Thanks.

Event Timeline

sbassett renamed this task from Enable runners for Gitlab security space to Enable runners for projects under gitlab.wikimedia.org security group. Oct 21 2021, 6:50 PM
brennen changed the task status from Open to Stalled. Oct 21 2021, 10:19 PM
brennen set the point value for this task to 2.
brennen added a subscriber: dduvall.

This is more about T292094 (Limit GitLab shared runners to trusted contributors), though there's plenty of overlap.

@dduvall is currently rebuilding the runners to provision more space, but we can probably temporarily assign one of that pool to the security project group to unblock you all. On a very slightly longer timeline, I think we're likely to assign those runners all to a top-level group which we'll be moving all "officially hosted" projects to, including security.

This would be great, thanks.

Hey @brennen @dduvall - looks like we might have runners again? This one just passed for me, even though the underlying CI job is just a test echo for now. If this config is stable and we can rely on a couple of runners being available for a while, I guess this task can be resolved.

Not quite. We have the runners reconfigured at the moment, and are trying to work out how best to limit a single one to a group.

Another short-term option would be to create a new VPS project and provision your own dedicated runners for the /security group, but it's a fair amount of hoop-jumping.

Change 734703 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/puppet@production] hiera: Add hostname based lookup to secret hierarchy under labs

https://gerrit.wikimedia.org/r/734703

Hey @brennen @dduvall - we were hoping to get this resolved sooner rather than later, as it's going to impact the work for our current appsec pipeline sprint. If this is stalled or lower priority, would it be possible to set up our own runner on WMCS, at least temporarily? I don't really want to do that, but I suppose we can if that would be the fastest and most feasible path forward. Thanks.

Mentioned in SAL (#wikimedia-releng) [2021-10-29T21:28:48Z] <brennen> manually registering runner-1008-security as a stopgap measure for T294050

brennen changed the task status from Stalled to In Progress. Oct 29 2021, 9:32 PM

As indicated by log entry above, I've manually registered runner-1008-security for the security project group. I don't think this will collide with anything.

Definitely works for us right now. Thanks.

Change 734703 merged by Andrew Bogott:

[operations/puppet@production] hiera: Add hostname/certname based lookup to secret hierarchy under labs

https://gerrit.wikimedia.org/r/734703

Mentioned in SAL (#wikimedia-releng) [2021-11-01T21:47:25Z] <brennen> gitlab runner-1008: re-registering runner for security group using host-specific config (T294050)

brennen assigned this task to dduvall.
brennen updated Other Assignee, added: brennen.
brennen moved this task from Next to Done or Declined on the User-brennen board.
brennen changed the point value for this task from 2 to 4.
brennen changed the point value for this task from 4 to 2.
brennen set Final Story Points to 4.

Hey @brennen et al-

Did this get reverted? gitlab.wikimedia.org is telling me that our projects don't have any active runners.

Screen Shot 2021-11-22 at 3.56.37 PM.png (124×576 px, 27 KB)

What project is this for? Everything under /repos/security should have access to all of the group runners allocated to /repos. We don't currently have shared runners for the instance as a whole.

Ok, this appears to have been a false alarm for now. Sorry about that.

sbassett added a subscriber: Jelto.

Hey @brennen (and maybe @Jelto?) -

This seems to be happening again, sadly. After receiving the previous "This job is stuck..." notice, I looked at the CI config for the Secteam BoilerPlate Fork - PHP JS project here: https://gitlab.wikimedia.org/repos/security/secteam-boilerplate-fork/-/settings/ci_cd#js-runners-settings. It looks like there are available group runners, but that they're all currently offline? Not sure if this is perhaps due to the recent security upgrade T297183 or work related to T295481?

@sbassett It seems all runners were offline. During work on T295481 I produced an invalid GitLab runner configuration, which propagated to the WMCS runners as well. I fixed the configuration and re-registered the runners (forced puppet run). All runners are online again. https://gitlab.wikimedia.org/repos/security/secteam-boilerplate-fork/-/settings/ci_cd#js-runners-settings shows some runners as well.

@Jelto - ah, ok, thanks. Yes, things look good now. I guess it's also good to know that an invalid runner config has the potential to DoS available runners :)
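
For anyone who wants to spot this condition without opening the CI/CD settings page, here is a minimal sketch of a check for offline runners. It assumes a list of runner records shaped like the entries GitLab's runners API returns (each with `description` and `status` fields); the function name and the sample records are hypothetical, and fetching the list from the instance is left out.

```python
from typing import Any


def offline_runners(runners: list[dict[str, Any]]) -> list[str]:
    """Return a label for every runner whose status is not "online".

    Assumes each record carries a "status" string and, usually, a
    "description"; records without a description fall back to their id.
    """
    return [
        r.get("description") or f"runner #{r.get('id')}"
        for r in runners
        if r.get("status") != "online"
    ]


# Hypothetical sample data, not taken from gitlab.wikimedia.org.
sample = [
    {"id": 1, "description": "example-runner-a", "status": "online"},
    {"id": 2, "description": "example-runner-b", "status": "offline"},
    {"id": 3, "description": "", "status": "stale"},
]

if __name__ == "__main__":
    for label in offline_runners(sample):
        print(f"not online: {label}")
```

If every group runner shows up in the result, as happened here, a shared cause such as a bad configuration push is more likely than individual host failures.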