
CiviCRM CI jobs fail when migrating from Stretch to Bullseye
Closed, ResolvedPublic

Description

The wikimedia/fundraising/crm git repository has CI tests running in the image docker-registry.wikimedia.org/releng/civicrm.

https://gerrit.wikimedia.org/r/c/integration/config/+/742796 created the 0.3.0-s1 image to migrate from Stretch to Bullseye and we switched to this image via https://gerrit.wikimedia.org/r/c/integration/config/+/777029/2/jjb/wm-fundraising.yaml

Since MariaDB is a different version, the CiviCRM database provisioning system now fails:

Status: Downloaded newer image for docker-registry.wikimedia.org/releng/civicrm:0.3.0-s1
+ mkdir -p /tmp/mysqld/datadir
+ /usr/bin/mysql_install_db --user=nobody --datadir=/tmp/mysqld/datadir
chown: cannot access '/usr/lib/mysql/plugin/auth_pam_tool_dir/auth_pam_tool': Permission denied
Couldn't set an owner to '/usr/lib/mysql/plugin/auth_pam_tool_dir/auth_pam_tool'.
It must be root, the PAM authentication plugin doesn't work otherwise..

chown: changing ownership of '/usr/lib/mysql/plugin/auth_pam_tool_dir': Operation not permitted
Cannot change ownership of the '/usr/lib/mysql/plugin/auth_pam_tool_dir' directory
to the 'nobody' user. Check that you have the necessary permissions and try again.

Installing MariaDB/MySQL system tables in '/tmp/mysqld/datadir' ...
2022-04-28 23:08:52 0 [Warning] Ignoring user change to 'nobody' because the user was set to 'mysql' earlier on the command line

OK

To start mysqld at boot time you have to copy
support-files/mysql.server to the right place for your system


Two all-privilege accounts were created.
One is root@localhost, it has no password, but you need to
be system 'root' user to connect. Use, for example, sudo mysql
The second is nobody@localhost, it has no password either, but
you need to be the system 'nobody' user to connect.
After connecting you can set the password, if you would need to be
able to connect as any of these users with a password and without sudo

See the MariaDB Knowledgebase at https://mariadb.com/kb or the
MySQL manual for more instructions.

You can start the MariaDB daemon with:
cd '/usr' ; /usr/bin/mysqld_safe --datadir='/tmp/mysqld/datadir'

You can test the MariaDB daemon with mysql-test-run.pl
cd '/usr/mysql-test' ; perl mysql-test-run.pl

Please report any problems at https://mariadb.org/jira

The latest information about MariaDB is available at https://mariadb.org/.
You can find additional information about the MySQL part at:
https://dev.mysql.com
Consider joining MariaDB's strong and vibrant community:
https://mariadb.org/get-involved/

+ MYSQL_SOCKET=/var/run/mysqld/mysqld.sock
+ mysqld='/usr/sbin/mysqld
    --verbose
    --datadir=/tmp/mysqld/datadir
    --log-error=/tmp/mysqld/error.log
    --pid-file=/tmp/mysqld/mysqld.pid
    --socket=/var/run/mysqld/mysqld.sock'

The message says:

Two all-privilege accounts were created.

One is root@localhost, it has no password, but you need to be system root user to connect.

The second is nobody@localhost, it has no password either, but you need to be the system nobody user to connect.

Event Timeline

The CI image entry point starts MySQL then invokes:

/src/wikimedia/fundraising/crm/bin/ci-create-dbs.sh
/src/wikimedia/fundraising/crm/bin/ci-populate-dbs.sh

The scripts are in the wikimedia/fundraising/crm repository and rely on mysql -u root and invoke amp with --mysql_dsn=mysql://root@127.0.0.1:3306. We would need that to be configurable and changed to use nobody when running as the nobody Unix user.
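A minimal sketch of what "configurable" could look like in the CI scripts. The variable name MYSQL_PRIVILEGED_USER matches the one that appears later in bin/ci-create-dbs.sh; the two helper functions are hypothetical and not part of the crm repository. Whether amp can actually authenticate as nobody over 127.0.0.1 is not verified here.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from wikimedia/fundraising/crm.

# Print the MariaDB account to use for privileged operations.
# An explicit MYSQL_PRIVILEGED_USER wins; otherwise fall back to the
# current Unix user (e.g. "nobody" inside the CI container).
resolve_privileged_user() {
    if [ -n "${MYSQL_PRIVILEGED_USER:-}" ]; then
        printf '%s\n' "$MYSQL_PRIVILEGED_USER"
    else
        id -un
    fi
}

# Print the DSN for amp, using the host and port from the existing scripts.
build_amp_dsn() {
    printf 'mysql://%s@127.0.0.1:3306\n' "$1"
}

# Possible usage inside a CI script (assumed, untested against a real server):
#   user="$(resolve_privileged_user)"
#   mysql -u "$user" -e 'SELECT 1'
#   amp ... --mysql_dsn="$(build_amp_dsn "$user")"
```

Keeping the user in one variable means the Stretch job can keep using root while the Bullseye job switches to nobody without forking the scripts.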

Change 787694 had a related patch set uploaded (by Hashar; author: Hashar):

[wikimedia/fundraising/crm@master] ci: use nobody@localhost as privileged MySQL user

https://gerrit.wikimedia.org/r/787694

Change 787724 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: rollback CiviCRM image to Stretch

https://gerrit.wikimedia.org/r/787724

Change 787724 merged by jenkins-bot:

[integration/config@master] jjb: rollback CiviCRM image to Stretch

https://gerrit.wikimedia.org/r/787724

Thanks so much for all this, @hashar! So, looks like for now we have working CI with the rollback to Stretch. :) :)

That is correct!

My idea is to add a second CI job that runs on Bullseye, either in parallel or as an experimental job (so it only triggers when one comments check experimental). Then we will be able to adjust the shell scripts to take into account whatever the new MariaDB version needs.

https://gerrit.wikimedia.org/r/787694 is a first step which deals with the root versus nobody username on Bullseye. Last Friday it then failed on https://integration.wikimedia.org/ci/job/wikimedia-fundraising-civicrm-docker/7931/console (I have marked this job result to be kept) with:

00:00:46.523 + /src/wikimedia/fundraising/crm/bin/ci-create-dbs.sh
00:00:46.572 ERROR 1238 (HY000) at line 1: Variable 'innodb_file_format' is a read only variable

I haven't dug into it any further and rolled the CI job back to Stretch (hence the idea of adding a second, Bullseye-based job). It is hopefully an easy fix ;)

I will try to keep this task in mind. Note that this week I am running the MediaWiki train deployment to production and will thus be a bit busy.

@hashar thanks! Just to note, actually a previously passing change is now failing. Seems likely not to be a CI issue, though, still checking.

https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/crm/+/786444
https://integration.wikimedia.org/ci/job/wikimedia-fundraising-civicrm-docker/7934/console

New unit test failures, from this line onward:

11:24:02 There were 3 failures:

Quick update: @Ejegg confirmed this is indeed unrelated to CI. Thanks again!

Thanks for the confirmation it is not CI related :)

Change 791026 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] [wikimedia/fundraising/crm] add experimental Bullseye job

https://gerrit.wikimedia.org/r/791026

Change 791026 merged by jenkins-bot:

[integration/config@master] [wikimedia/fundraising/crm] add experimental Bullseye job

https://gerrit.wikimedia.org/r/791026

I have deployed an experimental job which uses a Bullseye image and triggered it on https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/crm/+/787694 by commenting check experimental. That leads to:

+ /src/wikimedia/fundraising/crm/bin/ci-create-dbs.sh
ERROR 1238 (HY000) at line 1: Variable 'innodb_file_format' is a read only variable

Which comes from:

bin/ci-create-dbs.sh
mysql -u "$MYSQL_PRIVILEGED_USER" <<EOS
  SET GLOBAL innodb_file_format='Barracuda';
  SET GLOBAL innodb_default_row_format='dynamic';
  SET GLOBAL innodb_file_per_table = 1;
  SET GLOBAL innodb_large_prefix = ON;
EOS

Which leads me to T273704#6838844 and T274438 :

@Dwisehaupt working through this on our CI environment we learnt 2 things

  1. the following global settings are required to support utf8mb4 (some of which are already set) SET GLOBAL innodb_file_format='Barracuda'; SET GLOBAL innodb_default_row_format='dynamic'; SET GLOBAL innodb_file_per_table = 1; SET GLOBAL innodb_large_prefix = ON;

These have also been applied and codified in puppet for my.cnf. Merged and pushed out as stated in T274438. All set from our side for those settings.

Apparently that is needed for the utf8mb4 conversion.

Maybe we will need to introduce a MariaDB configuration file in the docker-registry.wikimedia.org/releng/civicrm image. It currently uses the defaults from the Debian packages. Or the innodb settings should be passed to mysql_install_db which creates the files. From integration/config.git:

dockerfiles/civicrm/run-with-mysqld.sh
mkdir -p /tmp/mysqld/datadir
/usr/bin/mysql_install_db --user=nobody --datadir=/tmp/mysqld/datadir

I would note that at that time utf8mb4 was a conversion from utf8, which was the old default. However, utf8mb4 is pretty much the MariaDB default now, so if we are on a more recent MariaDB than we were then, those lines may no longer be required.

Change 956485 had a related patch set uploaded (by Ejegg; author: Ejegg):

[wikimedia/fundraising/crm@master] Use 'nobody' rather than 'root' for db connection

https://gerrit.wikimedia.org/r/956485

Looks like all of those innodb_ variables being set in ci-create-dbs are being set to the default values as of mariadb 10.5, so I think we can just skip that file entirely!
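If the same script ever has to run on both the Stretch and Bullseye images, one option is to gate the SET GLOBAL statements on the server version. This is only a sketch: the function name is hypothetical, and the 10.5 cutoff is taken from the comment above rather than from the MariaDB release notes, so versions between the Stretch one and 10.5 are conservatively treated as still needing the settings.

```shell
#!/bin/sh
# Hypothetical helper, not part of bin/ci-create-dbs.sh.
# Succeeds (exit 0) when the given server version is older than 10.5,
# i.e. when the legacy innodb_* settings may still need to be applied.
needs_legacy_innodb_settings() {
    version="${1%%-*}"      # drop suffixes such as "-MariaDB-0+deb11u1"
    major="${version%%.*}"  # text before the first dot
    rest="${version#*.}"    # text after the first dot
    minor="${rest%%.*}"     # text before the next dot
    [ "$major" -lt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -lt 5 ]; }
}

# Possible usage (assumed; requires a running server):
#   server_version="$(mysql -u "$MYSQL_PRIVILEGED_USER" -N -e 'SELECT VERSION()')"
#   if needs_legacy_innodb_settings "$server_version"; then
#       mysql -u "$MYSQL_PRIVILEGED_USER" -e "SET GLOBAL innodb_file_per_table = 1"
#   fi
```

Dropping the statements entirely, as suggested above, is simpler once Stretch is no longer a target.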

Change 956485 abandoned by Ejegg:

[wikimedia/fundraising/crm@master] Use 'nobody' rather than 'root' for db connection

Reason:

Piggy-backing on hashar's Ic02a0be2ea541ce2ef19b22f8c9dcd5026055248

https://gerrit.wikimedia.org/r/956485

Rereading the task, one of the errors is potentially addressable:

/usr/bin/mysql_install_db --user=nobody --datadir=/tmp/mysqld/datadir
chown: cannot access '/usr/lib/mysql/plugin/auth_pam_tool_dir/auth_pam_tool': Permission denied
Couldn't set an owner to '/usr/lib/mysql/plugin/auth_pam_tool_dir/auth_pam_tool'.
It must be root, the PAM authentication plugin doesn't work otherwise..

chown: changing ownership of '/usr/lib/mysql/plugin/auth_pam_tool_dir': Operation not permitted
Cannot change ownership of the '/usr/lib/mysql/plugin/auth_pam_tool_dir' directory
to the 'nobody' user. Check that you have the necessary permissions and try again.

In Quibble we use the same command but with an extra parameter:

# Legacy system with a passwordless root user
'--auth-root-authentication-method=normal',

@Ejegg poked me about this task. I will check tomorrow and write up the steps to reproduce the CI build. Then maybe I can attempt to find a way to fix it.

If I get something I will send it as a change to Gerrit which can be verified by commenting check experimental which then triggers the job using the Bullseye based image.

For the repro:

$ git clone https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm
$ docker run --rm -it -v ./crm:/src/wikimedia/fundraising/crm \
   docker-registry.wikimedia.org/releng/civicrm:0.3.0-s1

Following up on yesterday's mention of Quibble passing the option --auth-root-authentication-method=normal to mysql_install_db, I looked at the help of /usr/bin/mysql_install_db, which gives:

--auth-root-authentication-method=normal|socket

Chooses the authentication method for the created initial root user. The
historical behavior is normal to creates a root user that can login without
password, which can be insecure.
The default behavior socket sets an invalid root password but allows the
system root user to login as MariaDB root without a password.

--auth-root-socket-user=user

Used with --auth-root-authentication-method=socket. It specifies the name
of the second MariaDB root account, as well as of the system account allowed
to access it.
Defaults to the value of --user.
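To make the difference between the two modes concrete, here is a small sketch that assembles the mysql_install_db arguments. The flags are the ones quoted in this task; the function name and its argument order are hypothetical, and this is not the actual content of run-with-mysqld.sh.

```shell
#!/bin/sh
# Hypothetical helper: print one mysql_install_db argument per line.
#   $1 = Unix user owning the data directory (e.g. nobody)
#   $2 = data directory
#   $3 = root authentication method, "normal" (default here) or "socket"
install_db_args() {
    db_user="$1"
    datadir="$2"
    auth_method="${3:-normal}"
    case "$auth_method" in
        normal|socket) ;;
        *)
            echo "unsupported auth method: $auth_method" >&2
            return 1
            ;;
    esac
    printf '%s\n' \
        "--user=$db_user" \
        "--datadir=$datadir" \
        "--auth-root-authentication-method=$auth_method"
}

# Possible usage (assumed; the paths are the ones from the CI log):
#   mkdir -p /tmp/mysqld/datadir
#   /usr/bin/mysql_install_db $(install_db_args nobody /tmp/mysqld/datadir normal)
```

With normal, the passwordless root account is usable from any Unix user, which is what the existing mysql -u root calls in the crm scripts expect.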

And on the Quibble change https://gerrit.wikimedia.org/r/c/integration/quibble/+/731438 the commit summary is:

Fix MySQL user creation on Debian Bullseye

Change 957237 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] dockerfiles: civicrm: install db with root auth

https://gerrit.wikimedia.org/r/957237

Change 957237 merged by jenkins-bot:

[integration/config@master] dockerfiles: civicrm: install db with root auth

https://gerrit.wikimedia.org/r/957237

Mentioned in SAL (#wikimedia-releng) [2023-09-13T07:55:29Z] <hashar> Built docker-registry.wikimedia.org/releng/civicrm:0.3.0-s4 image for T307178

Change 957243 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] dockerfiles: civicrm: create db from source repo script

https://gerrit.wikimedia.org/r/957243

Change 957243 merged by jenkins-bot:

[integration/config@master] dockerfiles: civicrm: create db from source repo script

https://gerrit.wikimedia.org/r/957243

Change 957244 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: update image for wikimedia-fundraising-civicrm-bullseye-docker

https://gerrit.wikimedia.org/r/957244

Change 957244 merged by jenkins-bot:

[integration/config@master] jjb: update image for wikimedia-fundraising-civicrm-bullseye-docker

https://gerrit.wikimedia.org/r/957244

Change 957249 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] zuul: move wikimedia/fundraising/crm to Bullseye

https://gerrit.wikimedia.org/r/957249

@Ejegg here is the summary of this morning's hacking session: I got crm to pass with the Bullseye based image!

There were two issues related to the Stretch > Bullseye upgrade:

  • the way authentication is done got changed, which can be set back to the legacy mode of root + no password by using --auth-root-authentication-method=normal
  • bin/ci-create-dbs.sh attempted to change the InnoDB file format by issuing SET GLOBAL statements against the running MariaDB daemon. I am guessing that previously this was simply ignored (I don't think MariaDB was able to change the file format on the fly). I have moved the mysql_install_db invocation from the image to the crm repository

The change to review is https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/crm/+/787694 . I did a check experimental to trigger it and it passed so I am guessing it is fine but I haven't looked further than it being a SUCCESS.

If that looks good I can then switch Zuul / CI to use the Bullseye image by deploying https://gerrit.wikimedia.org/r/c/integration/config/+/957249

Then one can vote Code-Review +2 on crm change 787694, which will then be tested using the Bullseye image and, if it passes, merged, completing the migration.

\o/

Change 956821 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] dockerfiles: civicrm

https://gerrit.wikimedia.org/r/956821

Change 956821 merged by jenkins-bot:

[integration/config@master] dockerfiles: civicrm

https://gerrit.wikimedia.org/r/956821

Change 957291 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: update image for wikimedia-fundraising-civicrm-bullseye-docker

https://gerrit.wikimedia.org/r/957291

Change 957291 merged by jenkins-bot:

[integration/config@master] jjb: update image for wikimedia-fundraising-civicrm-bullseye-docker

https://gerrit.wikimedia.org/r/957291

Change 957249 merged by jenkins-bot:

[integration/config@master] zuul: move wikimedia/fundraising/crm to Bullseye

https://gerrit.wikimedia.org/r/957249

Change 787694 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] CI: stop trying to change innodb settings

https://gerrit.wikimedia.org/r/787694

Ejegg claimed this task.

It's working!