Stand up Zuul 11 experiment environment in zuul3 cloud VPS project
Closed, Resolved, Public

Description

Now that we have a Cloud VPS project for experimenting with new Zuul jobs (T390081: Request creation of zuul3 VPS project), we'll need to stand up the new Zuul stack in the environment to begin testing jobs.

This environment should include Zuul and supporting services:

  • Zuul merger
  • Zuul scheduler
  • Zuul executor
  • Zuul web
  • Supporting services:
    • Nodepool
    • Zookeeper
    • MySQL

Final architecture of how these services are launched is out of scope for this task. This project is solely for experimenting with jobs and the architecture will likely bear no relation to the eventual production architecture—we only need a place where jobs can run.

Event Timeline

dduvall changed the task status from Open to In Progress. Apr 8 2025, 11:15 PM
dduvall claimed this task.
dduvall triaged this task as Medium priority.
dduvall added a subscriber: hashar.

@hashar FYI I've set up Zuul and friends on zuul-1001.zuul3.eqiad1.wikimedia.cloud using https://opendev.org/zuul/zuul/src/branch/master/doc/source/examples/docker-compose.yaml

There's no web proxy at the moment. This is intentional as I wanted to secure the accounts first.

I believe we had talked about hooking Zuul up to prod Gerrit in a read-only way, but looking at the setup required for that, it doesn't seem like a good idea. The connection account would need at least the "Event Stream" permission, possibly more, and exposing a prod Gerrit credential to a WMCS instance seems risky. Instead, I'm looking at importing MediaWiki repos into the Gerrit instance that was spun up in the Docker Compose environment on zuul-1001. My approach will be to copy the projects from my train-dev Gerrit to the instance, as they already have the necessary refs/meta/config files (very basic boilerplate, unlike prod Gerrit).

Having a Gerrit instance also seems helpful in that we can make test commits and exercise the Zuul pipelines more thoroughly without worrying about clobbering things.

@dduvall excellent!

The Event Stream permission is probably fine: it does not grant any specific access besides the ability to receive events, and we have multiple bots on WMCS using that same setup. As long as the user is not granted more permissions, it can't do much. Setting up a Gerrit with repos and config adds a bit of a burden, but if you can reuse an existing setup, that gives us a great playground \o/
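For context on what the Event Stream permission exposes: Gerrit's event stream (the gerrit stream-events SSH command, which Zuul's Gerrit driver consumes) emits one JSON object per line. The following is a minimal Python sketch of a read-only consumer that filters those lines for patchset-created events on a single project. The function name and the trimmed sample payloads are hypothetical; only the type and change.project/branch/number fields are assumed to be present.

```python
import json
from typing import Iterable, Iterator, Tuple


def patchsets_created(lines: Iterable[str], project: str) -> Iterator[Tuple[str, str]]:
    """Yield (branch, change number) for patchset-created events on one project.

    Assumes each line is one JSON object, as emitted by `gerrit stream-events`.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != "patchset-created":
            continue  # skip comment-added, ref-updated, etc.
        change = event.get("change", {})
        if change.get("project") == project:
            # str() because older Gerrit versions send the number as a string
            yield change.get("branch"), str(change.get("number"))


# Hypothetical, heavily trimmed event payloads (real events carry many more fields).
sample = [
    '{"type": "patchset-created", "change": {"project": "mediawiki/core", "branch": "master", "number": 1137061}}',
    '{"type": "comment-added", "change": {"project": "mediawiki/core", "branch": "master", "number": 1137061}}',
    '{"type": "patchset-created", "change": {"project": "test/gerrit-ping", "branch": "master", "number": 1137432}}',
]
print(list(patchsets_created(sample, "mediawiki/core")))  # [('master', '1137061')]
```

Nothing in the sketch writes back to Gerrit, which is the point being made above: an account limited to the event stream can observe changes but not act on them.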

I have a Gerrit running on zuul-1001.zuul3.eqiad1.wikimedia.cloud with all of the MW repos imported. Zuul is currently processing a slew of zuul.Merger jobs for the added repos. I have no idea why it generates so many initial jobs, but I think it might be syncing the state of each branch of each repo into /var/lib/zuul on the executor?

@dduvall mentioned a couple of updates in the Release-Engineering-Team meeting:

  • this new Zuul is now listening to gerrit.wikimedia.org stream events
  • it is now limited to mediawiki/core

Further stuff worth mentioning:

  • There's a new branch with the Zuul config in the integration/config project: zuul3
  • zuul-1001.zuul3.eqiad1.wikimedia.cloud is running the Zuul components via the upstream docker-compose.yaml example setup
  • This is all on the zuul-1001 box in /usr/local/src/zuul/doc/source/examples
  • The tenant config lives at /usr/local/src/zuul/doc/source/examples/etc_zuul/main.yaml (and includes mediawiki/core as an "untrusted-project")
  • There are noop jobs for the check and gate pipelines (projects.yaml)
  • The jobs should trigger on patchset-created (pipelines.yaml)
  • It's not triggering...for some reason right now :)

Not triggering on new patchsets

Running docker compose logs scheduler in /usr/local/src/zuul/doc/source/examples shows the Zuul debug log.

I found a patchset-created event:

2025-04-16 22:20:18,973 DEBUG zuul.Pipeline.wikimedia.check: [e: fd992fbd3a57422ba499cd82a5b7e3a2] Event <GerritTriggerEvent patchset-created gerrit.wikimedia.org/mediawiki/core master 1137061,3> for change <Change 0x7fedec06d190 mediawiki/core 1137061,3> does not match <GerritEventFilter connection: gerrit types: patchset-created ignore_deletes: True> in pipeline <DependentPipelineManager check> because False

soooo because False :)

I note that the GerritEventFilter is not what I'd expect given what's in the zuul3 branch of integration/config.

I wonder if this is because the tenant config calls the source gerrit.wikimedia.org, whereas the trigger in integration/config uses gerrit?
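When digging through docker compose logs scheduler output, it can help to pull the relevant fields out of "does not match" lines like the one above. Here is a rough Python sketch using a regular expression written against that single log line; the pattern and the explain() helper are hypothetical. One assumption worth flagging: the text before the project in the GerritTriggerEvent repr is presumably the connection's canonical hostname, which is not necessarily the same string as the Zuul connection name shown in the GerritEventFilter.

```python
import re
from typing import Optional

# Pattern derived from the one scheduler debug line quoted above; other log
# formats or Zuul versions may not match it.
NO_MATCH = re.compile(
    r"<GerritTriggerEvent (?P<type>\S+) (?P<host>[^/\s]+)/(?P<project>\S+) .*?"
    r"does not match <GerritEventFilter connection: (?P<filter_conn>\S+) "
    r".*?in pipeline <\w+ (?P<pipeline>[^>]+)>"
)


def explain(line: str) -> Optional[dict]:
    """Return the interesting fields of a non-matching event line, or None."""
    m = NO_MATCH.search(line)
    return m.groupdict() if m else None


line = ("2025-04-16 22:20:18,973 DEBUG zuul.Pipeline.wikimedia.check: "
        "[e: fd992fbd3a57422ba499cd82a5b7e3a2] Event <GerritTriggerEvent "
        "patchset-created gerrit.wikimedia.org/mediawiki/core master 1137061,3> "
        "for change <Change 0x7fedec06d190 mediawiki/core 1137061,3> does not "
        "match <GerritEventFilter connection: gerrit types: patchset-created "
        "ignore_deletes: True> in pipeline <DependentPipelineManager check> "
        "because False")
info = explain(line)
print(info["filter_conn"], info["host"])  # gerrit gerrit.wikimedia.org
```

Looping over sys.stdin and calling explain() on each line would let the scheduler log be piped straight in; seeing "gerrit" next to "gerrit.wikimedia.org" side by side makes the naming mismatch easy to spot.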

The "does not match ... because False" log line comes from https://opendev.org/zuul/zuul/src/commit/66e4ae57f84d30f6802eb82ded8d710c2031b7a8/zuul/manager/__init__.py#L219, I think, which in turn leads back to https://opendev.org/zuul/zuul/src/commit/66e4ae57f84d30f6802eb82ded8d710c2031b7a8/zuul/scheduler.py#L2604

The debug log comes from:

zuul/manager/__init__.py
class PipelineManager(metaclass=ABCMeta):
...
    def eventMatches(self, event, change):
        log = get_annotated_logger(self.log, event)
        if event.forced_pipeline:
            if event.forced_pipeline == self.pipeline.name:
                log.debug("Event %s for change %s was directly assigned "
                          "to pipeline %s" % (event, change, self))
                return True
            else:
                return False
        for ef in self.event_filters:
            match_result = ef.matches(event, change)
            if match_result:
                log.debug("Event %s for change %s matched %s "
                          "in pipeline %s" % (event, change, ef, self))
                return True
            else:
                log.debug("Event %s for change %s does not match %s "
                          "in pipeline %s because %s" % (
                              event, change, ef, self, str(match_result)))
        return False

In other words, the received GerritTriggerEvent does not match the pipeline's GerritEventFilter, which is created with the connection name. And then:

zuul/model.py
class EventFilter(BaseFilter):

    def matches(self, event, ref):
        # TODO(jeblair): consider removing ref argument

        # Event came from wrong connection
        if self.connection_name != event.connection_name:
            return False

So yeah, the triggers expect events from the same source, at least as far as connection names are concerned. Good catch!
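The two excerpts above can be boiled down to a standalone model that shows why the event was rejected. This is a simplified Python sketch, not Zuul's code: Event and EventFilter are hypothetical stand-ins that keep only the connection-name check from EventFilter.matches() and a loose type check, and event_matches() mirrors the "any filter matches" loop in PipelineManager.eventMatches(). The connection names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    # Stand-in for GerritTriggerEvent: only the fields used below.
    connection_name: str
    type: str


@dataclass
class EventFilter:
    # Stand-in for GerritEventFilter. Real filters also check branches,
    # approvals, comments, etc.; only the first checks are modeled here.
    connection_name: str
    types: List[str] = field(default_factory=list)

    def matches(self, event: Event) -> bool:
        # Same first check as zuul/model.py: an event from a differently
        # named connection never matches, whatever its type.
        if self.connection_name != event.connection_name:
            return False
        return not self.types or event.type in self.types


def event_matches(filters: List[EventFilter], event: Event) -> bool:
    # Like PipelineManager.eventMatches(): one matching filter is enough.
    return any(f.matches(event) for f in filters)


check = [EventFilter(connection_name="gerrit", types=["patchset-created"])]
renamed = [EventFilter(connection_name="gerrit.wikimedia.org", types=["patchset-created"])]
event = Event(connection_name="gerrit.wikimedia.org", type="patchset-created")
print(event_matches(check, event))    # False
print(event_matches(renamed, event))  # True
```

The type is never even looked at when the connection names differ, which is why the log only reports "because False" instead of naming a field.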

I have renamed the connections in zuul.conf to gerrit-dev and gerrit.wikimedia.org. I have added test/gerrit-ping with a noop job in the check pipeline. When I send a dummy change (https://gerrit.wikimedia.org/r/c/test/gerrit-ping/+/1137432), I get:

scheduler-1  | 2025-04-18T09:28:51.857605199Z 2025-04-18 09:28:51,857 DEBUG zuul.GerritConnection: [e: 602a5fbe4b5e469881df4f24f824a7f5] Scheduling event from gerrit.wikimedia.org: <GerritTriggerEvent comment-added gerrit.wikimedia.org/test/gerrit-ping master 1137432,1 Verified:2, Deploy-to-beta:0, Code-Review:0>

Nothing gets to run since noop does nothing :)
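Since the fix came down to connection names lining up across zuul.conf, the tenant config and pipelines.yaml, a quick pre-flight cross-check could catch this class of mistake early. Below is a hypothetical Python sketch: it reads connection names from zuul.conf sections of the form [connection NAME] (assuming, as Zuul appears to, that the name may optionally be quoted) and reports pipeline trigger keys that name no defined connection. The zuul.conf contents and the trigger dict are made-up placeholders, not the actual zuul-1001 config; the trigger dict stands in for pipelines.yaml after YAML loading, where trigger keys are connection names.

```python
import configparser
import re
from typing import Dict, List, Set

# Hypothetical, trimmed zuul.conf: only connection sections, placeholder values.
ZUUL_CONF = """
[connection "gerrit-dev"]
driver=gerrit
server=gerrit-dev.example
user=zuul

[connection gerrit.wikimedia.org]
driver=gerrit
server=gerrit.wikimedia.org
user=zuul
"""

# Hypothetical pipeline triggers as loaded from pipelines.yaml (keys are connection names).
PIPELINE_TRIGGERS = {
    "check": {"gerrit.wikimedia.org": [{"event": "patchset-created"}]},
    "gate": {"gerrit": [{"event": "comment-added"}]},
}

# Accepts both [connection name] and [connection "name"] section headers.
SECTION = re.compile(r"^connection ([\"']?)(.*)\1$", re.IGNORECASE)


def connection_names(conf_text: str) -> Set[str]:
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    names = set()
    for section in parser.sections():
        m = SECTION.match(section)
        if m:
            names.add(m.group(2))
    return names


def unknown_trigger_connections(triggers: Dict[str, dict], known: Set[str]) -> Dict[str, List[str]]:
    """Return {pipeline: [trigger connection names missing from zuul.conf]}."""
    problems = {}
    for pipeline, trigger in triggers.items():
        missing = sorted(name for name in trigger if name not in known)
        if missing:
            problems[pipeline] = missing
    return problems


known = connection_names(ZUUL_CONF)
print(sorted(known))                                         # ['gerrit-dev', 'gerrit.wikimedia.org']
print(unknown_trigger_connections(PIPELINE_TRIGGERS, known))  # {'gate': ['gerrit']}
```

The same comparison could be extended to the source keys in the tenant config (main.yaml), which is where the original mismatch was first suspected.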