Maniphest T192217

Remove the "check" pipeline and Zuul's user-filter
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	EddieGP
	Apr 14 2018, 8:09 PM

Description

The differentiation in check/test pipeline proves to be another barrier new developers are facing. It is/was necessary because we run/ran tests on permanent instances and didn't want anyone to be able to execute just any code there. This isn't a concern for one-off instances that are destroyed after each test was finished. As I understand it getting rid of it this differentiation has been a goal for quite a while and there's no doubt we want to get there.

This goal was described and tracked in T47499, which got closed because the way chosen to reach this goal was Nodepool. The job of tracking the effort to get everything into one-off instances (Nodepool) was then taken over by Continuous-Integration-Scaling and it's tasks (e.g. T119138, T134381, T119140) . These were declined and the project was archived, because nodepool is now considered legacy and the plan is to have docker for everything. There is T190097: Migrate all CI jobs from Nodepool, deprecate its use, but I fail to find a project or task "Migrate all CI jobs from permanent instances".

I wonder, but cannot easily answer the questions which jobs (or whether any at all) we still run on permanent instances, if the check/test differentiation is still needed, and if it is, if we're seeing progress in a 'permanent instances -> docker' transition, and how long the whitelist will continue to be necessary.

Details

	Subject	Repo	Branch	Lines +/-
	zuul: remove "check" pipeline	integration/config	master	+7 -463

Customize query in gerrit

Related Objects
Search...

		Status	Subtype	Assigned	Task
					Restricted Task
		Resolved		hashar	T192217 Remove the "check" pipeline and Zuul's user-filter

Event Timeline

EddieGP created this task.Apr 14 2018, 8:09 PM

EddieGP updated the task description. (Show Details)

Krinkle subscribed.Apr 14 2018, 9:52 PM

Krinkle updated the task description. (Show Details)Apr 14 2018, 10:08 PM

Krinkle awarded a token.

Peachey88 subscribed.Apr 15 2018, 12:30 AM

MarcoAurelio subscribed.Apr 15 2018, 9:49 AM

EddieGP mentioned this in T192532: Figure out a way to enable volunteers to use the puppet compiler.May 2 2018, 5:49 PM

MGChecker subscribed.Aug 30 2018, 10:24 PM

Krinkle renamed this task from Get rid of the check pipeline to Remove the "check" pipeline and Zuul's user-filter.Aug 30 2018, 11:33 PM

Krinkle added a project: Release-Engineering-Team.

Krenair subscribed.Aug 31 2018, 12:48 AM

hashar mentioned this in T179963: Workspaces for mwgate-php55lint / mwgate-php70lint are getting huge.Sep 24 2018, 5:43 PM

hashar mentioned this in T208841: git-changed-in-head breaks with paths with spaces in.Nov 6 2018, 1:52 PM

hashar mentioned this in T207685: Run lint in check pipeline for Urbanecm's tools.Nov 12 2018, 10:28 AM

The check pipeline is used for php -l, composer validate and the commit message validator. I think we can drop support for them.

a Code-Review +1 by a whitelisted person is enough to run the tests
one less Zuul pipeline to manage, the check one has a list of email that duplicate the one from the test pipeline
let us drop jobs that are still running on permanent slaves

Overall that will reduce the complexity of the CI configuration not much being lost beside the few trivial checks currently done.

From the Grafana board Ihave just created: https://grafana.wikimedia.org/dashboard/db/zuul-pipeline The check pipeline had 55 builds over the last 7 days so that is hardly a loss.

Some notes from IRC (lightly edited):

[12:59:41] <James_F> How far away are we from running the CI jobs without a whitelist?
[13:00:20] <James_F> I noted the nodepool task got resolved, yay. ;-) I vaguely recall moving to docker being a necessary but not sufficient condition/
[13:01:21] <greg-g> nodepool was actually the first attempt at removing the whitelist....
[13:01:28] <thcipriani> James_F: hash<tab> mentioned this yesterday. With nodepool gone we still have to phase out a couple physical hosts that are still lingering, but I don't know the steps past that one.
[13:01:50] <James_F> greg-g: Indeed.
[13:02:02] -*- James_F nods.
[13:02:31] <James_F> Is there a task for "run CI for all"?
[13:03:01] <legoktm> I think someone needs to look at the threat models, and what exactly does "arbitrary code access in a docker instance runing on a cloud vps instance" let you escalate to
[13:03:34] <James_F> Ah, yes, T192217.
[13:03:34] <stashbot> T192217: Remove the "check" pipeline and Zuul's user-filter - https://phabricator.wikimedia.org/T192217
[13:03:37] <legoktm> and then realize that it doesn't matter since we already let basically everyone do so 
[13:04:46] <legoktm> maybe a bit of preventative stuff like closing outbound mail ports and other network firewalling

Jdforrester-WMF subscribed.Nov 16 2018, 9:07 PM

[13:03:01] <legoktm> I think someone needs to look at the threat models, and what exactly does "arbitrary code access in a docker instance runing on a cloud vps instance" let you escalate to

FWIW, I'm not particularly worried about this one - I think we can drop the whitelist easily once everything runs in docker. The threat is with those jobs that don't run in docker (yet).

Re-asking the question from the task description: Which/How many jobs (any at all?) do we still run on permanent instances?

We still have some jobs running on permanent slaves, they will be moved to Docker containers and I have created a subproject for that migration: Continuous-Integration-Infrastructure (Slipway) I wrote a test to report them on https://integration.wikimedia.org/ci/job/integration-config-qa/lastCompletedBuild/testReport/test_integration/IntegrationTests/test_jenkins_jobs_assigned_nodes/

The jobs in check pipeline are mostly for php lint which is superseded by parallel-lint and composer-validate which is already run by the composer test job. The usage per pipeline is reported on https://grafana.wikimedia.org/dashboard/db/zuul-pipeline , over 30 days:

Pipeline	Jobs
check	280
test	12447
gate-and-submit	9700

The whitelist must be kept for now. I am not comfortable in running code in Docker on WMCS, we would have to harness them a little bit more. Specially reduce possible network access.

greg triaged this task as Medium priority.Nov 28 2018, 5:42 AM

greg edited projects, added Release-Engineering-Team (Backlog); removed Release-Engineering-Team.

Change 493188 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] zuul: remove "check" pipeline

https://gerrit.wikimedia.org/r/493188

gerritbot added a project: Patch-For-Review.Feb 27 2019, 10:51 AM

Paladox subscribed.Feb 27 2019, 1:11 PM

hashar mentioned this in T218361: integration-slave-jessie-1004 puppet error.Mar 14 2019, 9:59 PM

Change 493188 merged by jenkins-bot:
[integration/config@master] zuul: remove "check" pipeline

https://gerrit.wikimedia.org/r/493188

Mentioned in SAL (#wikimedia-releng) [2019-03-19T14:01:13Z] <hashar> Removed Zuul "check" pipeline | https://gerrit.wikimedia.org/r/#/c/integration/config/+/493188/ | T192217

The check pipeline is gone!

Nitpicky comment: the description "Jobs restricted to trusted users. Will vote +2." for the test pipeline should be changed.

In T192217#5035989, @Daimona wrote:

Nitpicky comment: the description "Jobs restricted to trusted users. Will vote +2." for the test pipeline should be changed.

I have idea for new description: "Jobs triggered per default after upload of new patch set. Will vote +2."

In T192217#5035980, @hashar wrote:

The check pipeline is gone!

Thanks! It was my mistake to name this task "Get rid of the check pipeline", which was then renamed to the current title. The goal actually is to remove the user filter and let everyone run all tests. I expected both "remove whitelist" and "remove check pipeline" to happen at the same time (when test would've been moved over from permanent slaves to docker), but it turns out these are two distinct things. We should either re-open this task or create another one, as the goal to get completely rid of the whitelist is still relevant (probably "someday", though).

hashar added a parent task: Restricted Task.Mar 19 2019, 7:13 PM

@EddieGP I wanted to remove the check pipeline, notably because most of those jobs were legacy anyway and have since been replaced by better alternative. An example is untrusted developers being able to run the linters/style checker via composer test and/or npm test.

As for running tests for anyone, yes that is planned, there are multiple discussions / project going on inside Release-Engineering-Team . But in short the idea is to overhaul the CI infrastructure, switch to Docker containers running in Kubernetes and have the run isolated from the rest of the infra/CI jobs. Then eventually, maybe we will open CI run to any patch regardless of the author.

So that is definitely in our minds, even though it might not be captured in a proper task :]

It is expected that in the next iteration of the CI infrastructure ("Seakeaper"), that at least by default any Gerrit account will be permitted to have its code execute and tests run against it. More info at T243436 and related tasks and documents.

Maintenance_bot removed a project: Patch-For-Review.Feb 11 2020, 2:10 AM

Remove the "check" pipeline and Zuul's user-filterClosed, ResolvedPublicActions

Description

Details

Related ObjectsSearch...

Event Timeline

Remove the "check" pipeline and Zuul's user-filter
Closed, ResolvedPublic
Actions

Related Objects
Search...