Set up zuul scheduler on zuul1001
Closed, ResolvedPublic

Description

We need to set up the zuul scheduler to start running jobs on our zuul executor.

The zuul scheduler will run as a systemd unit with the scheduler image: docker-registry.wikimedia.org/repos/releng/zuul/zuul/zuul-scheduler:wmf-12.0.0-5

Additionally, per @Corvus in T395938#10929023:

Bind mounts for:

  • /etc/zuul
  • Zookeeper TLS certs (I think that's /etc/cfssl?)
  • /var/lib/zuul (optional—for .ssh/known_hosts)

And:

  • SSH key for zuul – for zuul to listen to the gerrit event stream. Seems like this could go in whatever we mount to /var/lib/zuul or /etc/zuul
  • Runs as the zuul user
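
As a rough illustration of the requirements above, the `docker run` invocation that a systemd unit might wrap could be assembled like this. It is only a sketch: the container name, mounting each host path at the same path inside the container, the read-only flags, and the /etc/cfssl location are all assumptions (the cert path is itself a guess in the description), and it is not what the Puppet patch actually does.

```python
IMAGE = "docker-registry.wikimedia.org/repos/releng/zuul/zuul/zuul-scheduler:wmf-12.0.0-5"

# (host path, read_only); each path is mounted at the same location in the container.
BIND_MOUNTS = [
    ("/etc/zuul", True),
    ("/etc/cfssl", True),      # assumed location of the Zookeeper TLS certs
    ("/var/lib/zuul", False),  # optional: .ssh/known_hosts and possibly the SSH key
]


def docker_run_argv(image=IMAGE, mounts=BIND_MOUNTS, user="zuul", name="zuul-scheduler"):
    """Build the argv list for a `docker run` call with the listed bind mounts."""
    argv = ["docker", "run", "--rm", "--name", name, "--user", user]
    for path, read_only in mounts:
        spec = f"{path}:{path}" + (":ro" if read_only else "")
        argv += ["--volume", spec]
    argv.append(image)
    return argv


if __name__ == "__main__":
    print(" ".join(docker_run_argv()))
```

Passing `--user zuul` by name assumes a zuul user exists inside the image; a numeric UID would be needed otherwise.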

Event Timeline

thcipriani renamed this task from Set up zuul scheduler of zuul1001 to Set up zuul scheduler on zuul1001.
LSobanski triaged this task as High priority.
LSobanski moved this task from Incoming to Work in Progress on the collaboration-services board.
LSobanski moved this task from Work in Progress to Backlog on the collaboration-services board.

Change #1192195 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] zuul: create systemd unit for zuul scheduler

https://gerrit.wikimedia.org/r/1192195

Change #1192195 merged by Dzahn:

[operations/puppet@production] zuul: create systemd unit for zuul scheduler

https://gerrit.wikimedia.org/r/1192195

First attempt to start this new systemd unit. Currently fails with:

Sep 29 20:06:08 zuul1001 docker[314344]:            ^^^^^^^^^^^^^^^^^^^
Sep 29 20:06:08 zuul1001 docker[314344]:   File "/usr/local/lib/python3.11/dist-packages/sqlalchemy/engine/create.py", line 602, in create_engine
Sep 29 20:06:08 zuul1001 docker[314344]:     dbapi = dbapi_meth(**dbapi_args)
Sep 29 20:06:08 zuul1001 docker[314344]:             ^^^^^^^^^^^^^^^^^^^^^^^^
Sep 29 20:06:08 zuul1001 docker[314344]:   File "/usr/local/lib/python3.11/dist-packages/sqlalchemy/dialects/mysql/mariadbconnector.py", line 155, in import_dbapi
Sep 29 20:06:08 zuul1001 docker[314344]:     return __import__("mariadb")
Sep 29 20:06:08 zuul1001 docker[314344]:            ^^^^^^^^^^^^^^^^^^^^^
Sep 29 20:06:08 zuul1001 docker[314344]: ModuleNotFoundError: No module named 'mariadb'
Sep 29 20:06:09 zuul1001 systemd[1]: zuul-scheduler.service: Main process exited, code=exited, status=1/FAILURE

Is this what is missing?

python3-wmfdb - Libraries for interacting with WMF's mariadb deployments

Interesting! This looks like it might be something missing in the executor image. My guess is that we copied upstream, but upstream doesn't use mariadb like we do. @dduvall can you take a look at this?

They do support mariadb but via a different driver.

From https://zuul-ci.org/docs/zuul/latest/configuration.html#attr-database.dburi

Zuul supports PostgreSQL, MySQL, and MariaDB. Supported SQLAlchemy dialects and drivers are: postgresql://, mysql+pymysql://, and mariadb+pymysql://.
If using MariaDB, be sure to use the mariadb dialect.

The database.dburi we currently have defined in our zuul.conf is mariadb+mariadbconnector://zuul:[redacted]@m1-master.eqiad.wmnet/zuul. So the sqlalchemy dialect is mariadb but the driver is mariadbconnector.

I'm not sure what package satisfies that driver. Looking into it...
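
For reference, the traceback shows the mariadbconnector driver importing a module named `mariadb`. A minimal, stdlib-only sketch for checking whether the module behind a given dburi driver is importable in the current environment (for example, inside the container) could look like the following. The helper names and the placeholder password are made up for illustration, and the driver-to-module table only covers the two drivers discussed here.

```python
import importlib.util

# SQLAlchemy driver name -> Python module that the driver imports.
DRIVER_MODULES = {
    "mariadbconnector": "mariadb",
    "pymysql": "pymysql",
}


def split_scheme(dburi):
    """Return (dialect, driver) from 'dialect+driver://...'; driver may be None."""
    scheme = dburi.split("://", 1)[0]
    dialect, _, driver = scheme.partition("+")
    return dialect, driver or None


def driver_available(dburi):
    """True if the module behind the URI's driver can be imported here.

    Returns False for drivers that are not listed in DRIVER_MODULES.
    """
    _, driver = split_scheme(dburi)
    module = DRIVER_MODULES.get(driver)
    return module is not None and importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    for scheme in ("mariadb+mariadbconnector", "mysql+pymysql", "mariadb+pymysql"):
        uri = f"{scheme}://zuul:PASSWORD@m1-master.eqiad.wmnet/zuul"
        print(scheme, driver_available(uri))
```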

It's possible that mysql+pymysql would also just work, but it's not super clear.

"The MariaDB variant of MySQL retains fundamental compatibility with MySQL’s protocols however the development of these two products continues to diverge. Within the realm of SQLAlchemy, the two databases have a small number of syntactical and behavioral differences that SQLAlchemy accommodates automatically. To connect to a MariaDB database, no changes to the database URL are required:"

https://docs.sqlalchemy.org/en/20/dialects/mysql.html

I tried it and I can confirm using mysql+pymysql gets us past the error.

INFO zuul.SQLConnection: Initializing SQL connection database (prefix: )
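
The change tested here only touches the `dialect+driver` prefix of the dburi. A small sketch of that substitution follows; the function name and the placeholder password are hypothetical, and the real value lives in zuul.conf as managed by Puppet.

```python
def with_scheme(dburi, new_scheme):
    """Replace the 'dialect+driver' part of a SQLAlchemy URI, keeping the rest."""
    _old_scheme, sep, rest = dburi.partition("://")
    if not sep:
        raise ValueError(f"not a database URI: {dburi!r}")
    return f"{new_scheme}://{rest}"


if __name__ == "__main__":
    old = "mariadb+mariadbconnector://zuul:PASSWORD@m1-master.eqiad.wmnet/zuul"
    print(with_scheme(old, "mysql+pymysql"))
```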

Change #1192614 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] zuul: use mysql+pymysql instead of mariadb+mariadbconnector in db URI

https://gerrit.wikimedia.org/r/1192614

Change #1192615 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] zuul: let zuul-scheduler also reach zookeeper outside container

https://gerrit.wikimedia.org/r/1192615

Using mysql+pymysql seems like the right solution for now since it's the configuration recommended by upstream docs.

If we run into problems, I can see about installing the mariadb pip package as part of our base image build.

Change #1192614 merged by Dzahn:

[operations/puppet@production] zuul: use mysql+pymysql instead of mariadb+mariadbconnector in db URI

https://gerrit.wikimedia.org/r/1192614

Change #1192615 merged by Dzahn:

[operations/puppet@production] zuul: let zuul-scheduler also reach zookeeper outside container

https://gerrit.wikimedia.org/r/1192615

Change #1193141 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] zuul: adjust zookeeper hosts/port in new zuul config

https://gerrit.wikimedia.org/r/1193141

Change #1193141 merged by Dzahn:

[operations/puppet@production] zuul: adjust zookeeper hosts/port in new zuul config

https://gerrit.wikimedia.org/r/1193141

With the latest firewall changes at T395938#11265924 ff., the scheduler can now connect to zookeeper on the host.

No more errors in the systemd status.
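
A quick way to re-check this kind of reachability from inside the container is a plain TCP connect test, sketched below. The host and port are parameters because the task does not state the Zookeeper address; the check only shows that the firewall lets the connection through and does not validate the TLS certificates.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling `can_connect(zk_host, zk_port)` with the values from the zookeeper section of zuul.conf (names here are hypothetical) returns False when the port is filtered or closed.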