MySQL/MariaDB images for development environments
Open, Medium, Public

Description

The MediaWiki-Docker patch (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/550708) originally used the Bitnami MySQL image because it lets you set up database replication easily through environment variables in the docker-compose file. The version of that patch which is likely to land uses only SQLite for now, but we'd like to provide our own MySQL/MariaDB images in dev-images.

Details

Title | Reference | Author | Source Branch | Dest Branch
Draft: bullseye-mariadb: Initial version | repos/releng/dev-images!38 | kharlan | T238925 | main

Event Timeline

I think this is probably valid in the general case. It'd probably be wise to control all the base images for a basic dev setup.

(This has been lurking in the back of my mind for a while, thanks for filing the task.)

We could also use this with the WMDE Wikibase Docker images, which are currently on GitHub and Docker Hub but will be moving to Gerrit and the build pipeline (tasks to come).
(well, if it were not a dev image)

brennen renamed this task from mysql/mariadb images to MySQL/MariaDB images for development environments. Feb 20 2020, 6:57 PM
brennen triaged this task as Medium priority.
brennen added a project: MediaWiki-Docker.
brennen updated the task description.

It seems that a non-sqlite image is a requirement if you want to do development that touches on Parsoid via Restbase (so, VisualEditor / DiscussionTools stuff mostly), because of clashing database locks.

> It seems that a non-sqlite image is a requirement if you want to do development that touches on Parsoid via Restbase (so, VisualEditor / DiscussionTools stuff mostly), because of clashing database locks.

👍 I think we should switch to MySQL as the default setup. I think it wouldn't be very hard to make our own MySQL image (if we want to do that), or we could use the official mariadb image from Docker Hub. It would be more work to have a simple primary/replica setup that's managed via environment variables like the Bitnami image, and that could be done in a second phase of work.

brennen added a subscriber: jeena.

Yeah, on the whole I think this seems like it would solve more problems than it creates.

cc: @jeena

so, should we just go with https://hub.docker.com/_/mariadb (and if so, which version specifically?) or should we build off our base image in releng/dev-images and install mariadb, do our own setup / config, etc?

>> It seems that a non-sqlite image is a requirement if you want to do development that touches on Parsoid via Restbase (so, VisualEditor / DiscussionTools stuff mostly), because of clashing database locks.

> 👍 I think we should switch to MySQL as the default setup. I think it wouldn't be very hard to make our own MySQL image (if we want to do that), or we could use the official mariadb image from Docker Hub. It would be more work to have a simple primary/replica setup that's managed via environment variables like the Bitnami image, and that could be done in a second phase of work.

Sorry, I take this back. I don't think we should make it the default, because of the way docker-compose files are processed. For example, consider a docker-compose.yml file like this one:

version: '3.7'
services:
  mediawiki:
    user: "${MW_DOCKER_UID}:${MW_DOCKER_GID}"
  mariadb-main:
    image: 'mariadb:latest'

If the developer wants to use SQLite or Postgres as their DB, they'll have a MariaDB container running needlessly, and it cannot be removed from the service definitions via the docker-compose.override.yml file.

So maybe a better way forward is something we've talked about for a while now: some kind of mw docker init command that would prompt you to pick your version of PHP and your database backend (SQLite, MySQL or Postgres). mw docker install would then handle passing the proper arguments to install.php, instead of our existing script in /docker/install.sh, which assumes SQLite will be used.

Having an easy and somewhat-official way to get a mediawiki+mysql instance running is what's needed. Whether it's the default or not doesn't really matter, I think, so long as we're fairly up-front about what the trade-offs are. (E.g. I think that DEVELOPERS.md should have an "only choose sqlite if..." disclaimer.)

I initially didn't want to clutter up the docker-compose file in core, but then thought if most devs need to use MySQL then having it as the default would be good.
So I think either way is fine as long as we make it easy to set up as @DLynch mentioned.

> Having an easy and somewhat-official way to get a mediawiki+mysql instance running is what's needed. Whether it's the default or not doesn't really matter, I think, so long as we're fairly up-front about what the trade-offs are. (E.g. I think that DEVELOPERS.md should have an "only choose sqlite if..." disclaimer.)

We do have https://www.mediawiki.org/wiki/MediaWiki-Docker/Configuration_recipes/Alternative_databases but yeah it would be nice to make this easier and more up front for people.

> I initially didn't want to clutter up the docker-compose file in core, but then thought if most devs need to use MySQL then having it as the default would be good.
> So I think either way is fine as long as we make it easy to set up as @DLynch mentioned.

I think most devs would want MySQL but to keep the core configuration as extensible as possible, I think we should use the override file to set that up, and that's where the mw utility could help IMHO.
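
As a rough illustration of the override-file approach, a docker-compose.override.yml along these lines would add a MariaDB service without touching the file in core. This is only a sketch: the service name, credentials, database name and volume name are made up for the example, and the image tag is simply the one that appears in the CI log elsewhere on this task. The MARIADB_* variables are the ones documented for the official Docker Hub image.

```yaml
# docker-compose.override.yml (illustrative sketch, not the file shipped in core)
version: '3.7'
services:
  mariadb:
    image: 'mariadb:11.0.2'              # tag is an assumption; pin whatever version is agreed on
    environment:
      MARIADB_ROOT_PASSWORD: 'root-secret'   # placeholder credentials
      MARIADB_DATABASE: 'my_wiki'            # hypothetical database name
      MARIADB_USER: 'wikiuser'               # hypothetical user
      MARIADB_PASSWORD: 'wiki-secret'
    volumes:
      # Keep the data directory across container restarts
      - mariadbdata:/var/lib/mysql
volumes:
  mariadbdata:
    driver: local
```

MediaWiki would still need to be installed against this database (host mariadb, with the user and password above) rather than through the SQLite-only install script.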

> but yeah it would be nice to make this easier and more up front for people.

I'm somewhat motivated by my team affiliation to argue for the configurations where it's possible to use VisualEditor being the defaults, admittedly. :D

> If the developer wants to use SQLite or Postgres as their DB, they'll have a MariaDB container running needlessly, and it cannot be removed from the service definitions via the docker-compose.override.yml file.

This will be solved by the mwdd (mediawiki-docker-dev) style approach that can be done with mw-cli

So, just noting what the current status quo is in mwcli:
mysql (mariadb or mysql), postgres & sqlite are all possible (sqlite has some bugs filed for it right now though).
There is currently no default; the user must choose one. Perhaps the default could be mysql/maria?

Having said that, WMF-owned / controlled images for MySQL, SQLite and Postgres would still be beneficial, because users would no longer need to pull from Docker Hub. That does come with the added cost of maintaining them, though.

Visx updated the task description.

I'm back to this task due to my work on T339352: Create MySQL container in CI for integration tests. I can't specify mariadb from hub.docker.com as a service in the GitLab CI YAML (logs):

Using Docker executor with image docker-registry.wikimedia.org/repos/releng/kokkuri:v1.6.0 ...
ERROR: The "mariadb:11.0.2" image is not present on list of allowed services:
- docker-registry.wikimedia.org/**/*
- docker-registry.discovery.wmnet/**/*
Please check runner's allowed_images configuration: https://docs.gitlab.com/runner/configuration/advanced-configuration.html

so I think this means I need to build a mariadb image in the dev-images project.
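
For reference, the kind of service declaration that triggers this error looks roughly like the following. This is a generic GitLab CI sketch, not the actual job from T339352: the job name, alias, credentials and script line are placeholders.

```yaml
# .gitlab-ci.yml fragment (illustrative only)
integration-tests:
  services:
    # Rejected by the runner unless the image matches its allowed list
    - name: mariadb:11.0.2
      alias: database            # hypothetical hostname the job would use
  variables:
    # Job variables are also passed to service containers
    MARIADB_ROOT_PASSWORD: 'root-secret'   # placeholder
    MARIADB_DATABASE: 'wiki'               # placeholder
  script:
    - echo "run tests against host 'database' on port 3306"
```

The runner only starts service images that match its allowed list, so either the image has to live under docker-registry.wikimedia.org or the list has to be extended.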

Apparently, one does not simply create a MariaDB docker image based on Debian. It's been frustrating so far, and looking at the large-ish amount of code in https://github.com/MariaDB/mariadb-docker/tree/master, I'm not sure that rolling our own is a great idea.

Does Release-Engineering-Team have suggestions on how to proceed? I think for dev images, we pretty much want https://github.com/MariaDB/mariadb-docker, but we can't pull the image directly. We can't really fork that repo into the releng/dev-images one.

Should we have a GitLab and Docker Registry namespace that supports building or mirroring (easier?) a limited set of "official" Docker images from Docker Hub?

> Does Release-Engineering-Team have suggestions on how to proceed? I think for dev images, we pretty much want https://github.com/MariaDB/mariadb-docker, but we can't pull the image directly. We can't really fork that repo into the releng/dev-images one.

The use case in T339352 seems a little different than the one on this task. Is your use-case testing inside GitLab? In that case it seems fine to pull the upstream image.

I have done some work on spawning a MySQL database for Quibble and thus MediaWiki. Maybe that can be reused to craft an image? The material is at https://gerrit.wikimedia.org/g/integration/config/+/refs/heads/master/dockerfiles/quibble-buster/

Add the Debian package mariadb-server

Create the database files (taken from Quibble https://gerrit.wikimedia.org/r/plugins/gitiles/integration/quibble/+/refs/heads/master/quibble/backend.py#261 ) which would probably be either:

USER mysql
# Create DB file with a passwordless root user
RUN mysql_install_db --auth-root-authentication-method=normal

Or alternatively, if creating the files as mysql does not work:

USER root
RUN mysql_install_db --auth-root-authentication-method=normal --user=mysql

Some config tweaking for MediaWiki:

/etc/mysql/mariadb.conf.d/80-mediawiki.cnf
[client]
# Debian defaults to utf8mb4. T193222
default-character-set = binary
[mysqld]
# Debian defaults to utf8mb4. T193222
character_set_server     = binary
character_set_filesystem = binary
collation_server         = binary
# Stricter mode T119371
# Note: should also be set in MediaWiki via $wgSQLMode
sql_mode = 'TRADITIONAL'

And I guess run mariadb as an entrypoint as the mysql user?

The systemd unit in /usr/lib/systemd/system/mariadb.service has a few tweaks:

Additional capabilities that should be granted when spawning the container:

# CAP_IPC_LOCK To allow memlock to be used as non-root user
# CAP_DAC_OVERRIDE To allow auth_pam_tool (which is SUID root) to read /etc/shadow when it's chmod 0
#   does nothing for non-root, not needed if /etc/shadow is u+r
# CAP_AUDIT_WRITE auth_pam_tool needs it on Debian for whatever reason
CapabilityBoundingSet=CAP_IPC_LOCK CAP_DAC_OVERRIDE CAP_AUDIT_WRITE

Create /var/run/mysqld:

ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld

Raise some limits to be passed when spawning the container:

# Number of files limit. previously [mysqld_safe] open-files-limit
LimitNOFILE=32768
# For liburing and io_uring_setup()
LimitMEMLOCK=524288
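
If the container were started through docker-compose, the capabilities and limits above could be requested roughly as follows. A sketch only: the service and image names are placeholders, and whether all three capabilities are really needed inside a container has not been checked here.

```yaml
# docker-compose fragment mirroring the systemd unit settings (illustrative)
version: '3.7'
services:
  mariadb:
    image: 'example/bullseye-mariadb:latest'   # placeholder image name
    cap_add:
      - IPC_LOCK        # memlock as a non-root user
      - DAC_OVERRIDE    # auth_pam_tool reading /etc/shadow
      - AUDIT_WRITE     # auth_pam_tool on Debian
    ulimits:
      nofile:           # LimitNOFILE=32768
        soft: 32768
        hard: 32768
      memlock: 524288   # LimitMEMLOCK=524288 (bytes)
```

As far as I know, Docker's default capability set already includes DAC_OVERRIDE and AUDIT_WRITE, so IPC_LOCK is the one that would actually have to be added.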

For the permission grants I am not sure that can be done directly to the flat files, so most probably the entrypoint would need to be a wrapper which creates the db on the fly using mysql --user=root (which is passwordless per above):

CREATE DATABASE wiki;
-- wiki.* covers every table in the wiki database (a bare "wiki" would name a table)
GRANT ALL ON wiki.* TO 'wikiadmin'@'localhost'
IDENTIFIED BY 'somesecret';

Else go with the official MariaDB image. I have lost track but I am pretty sure we can allow third party images on Gitlab CI.

> Else go with the official MariaDB image. I have lost track but I am pretty sure we can allow third party images on Gitlab CI.

Yes, the mariadb image from Docker Hub is now safelisted (see patches to operations/puppet in T339352).

I think we can close this task, and suggest that people use mariadb from Docker Hub. Note that we continue to promote bitnami/mariadb in documentation for MediaWiki-Docker replica DB setup.

>> Else go with the official MariaDB image. I have lost track but I am pretty sure we can allow third party images on Gitlab CI.

> Yes, the mariadb image from Docker Hub is now safelisted (see patches to operations/puppet in T339352).

> I think we can close this task, and suggest that people use mariadb from Docker Hub.

> Note that we continue to promote bitnami/mariadb in documentation for MediaWiki-Docker replica DB setup.

That comes from your July 2020 edit describing how to add replication. Maybe that can be adjusted to use the information for the official MariaDB image?

>>> Else go with the official MariaDB image. I have lost track but I am pretty sure we can allow third party images on Gitlab CI.

>> Yes, the mariadb image from Docker Hub is now safelisted (see patches to operations/puppet in T339352).

>> I think we can close this task, and suggest that people use mariadb from Docker Hub.

>> Note that we continue to promote bitnami/mariadb in documentation for MediaWiki-Docker replica DB setup.

> That comes from your July 2020 edit describing how to add replication. Maybe that can be adjusted to use the information for the official MariaDB image?

That's because the Bitnami image has some scripts (AIUI) that facilitate setting up the replication environment using environment variables. If the official mariadb image supports setting up replication easily, we could point people towards using that for replica setups.
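
To make the comparison concrete, this is roughly what the environment-variable-driven replication looks like with the Bitnami image. It is a sketch from memory of the Bitnami README, so the variable names should be checked against the image documentation; the service names, user names and passwords are placeholders.

```yaml
# Primary/replica pair with bitnami/mariadb (illustrative; verify variable names upstream)
version: '3.7'
services:
  mariadb-main:
    image: 'bitnami/mariadb:latest'
    environment:
      - MARIADB_REPLICATION_MODE=master
      - MARIADB_REPLICATION_USER=repl_user          # placeholder
      - MARIADB_REPLICATION_PASSWORD=repl_password  # placeholder
      - MARIADB_ROOT_PASSWORD=main_root_password    # placeholder
  mariadb-replica:
    image: 'bitnami/mariadb:latest'
    depends_on:
      - mariadb-main
    environment:
      - MARIADB_REPLICATION_MODE=slave
      - MARIADB_REPLICATION_USER=repl_user
      - MARIADB_REPLICATION_PASSWORD=repl_password
      - MARIADB_MASTER_HOST=mariadb-main
      - MARIADB_MASTER_PORT_NUMBER=3306
      - MARIADB_MASTER_ROOT_PASSWORD=main_root_password
```

With the official image, the equivalent setup would presumably need extra configuration or init scripts on both containers, which is the part the Bitnami scripts take care of.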