
Job runner in single-MySQL environment fails to obtain a DB connection on `docker-compose up`
Open, Needs Triage, Public, BUG REPORT

Description

Until today, I typically ran a single MySQL instance in my MW-Docker dev environment. For approximately the past month, whenever I restart a previously created MediaWiki-Docker development environment, the job runner MediaWiki instance is persistently unable to obtain a DB connection.

mediawiki-jobrunner_1  | JobQueueConnectionError from line 768 of /var/www/html/w/includes/jobqueue/JobQueueDB.php: DBConnectionError:Cannot access the database: Unknown error (database)
mediawiki-jobrunner_1  | #0 /var/www/html/w/includes/jobqueue/JobQueueDB.php(617): JobQueueDB->getReplicaDB()
mediawiki-jobrunner_1  | #1 /var/www/html/w/includes/jobqueue/JobQueue.php(697): JobQueueDB->doGetSiblingQueuesWithJobs(Array)
mediawiki-jobrunner_1  | #2 /var/www/html/w/includes/jobqueue/JobQueueGroup.php(374): JobQueue->getSiblingQueuesWithJobs(Array)
mediawiki-jobrunner_1  | #3 /var/www/html/w/includes/jobqueue/JobQueueGroup.php(256): JobQueueGroup->getQueuesWithJobs()
mediawiki-jobrunner_1  | #4 /var/www/html/w/includes/jobqueue/JobRunner.php(227): JobQueueGroup->pop(1, 1, Array)
mediawiki-jobrunner_1  | #5 /var/www/html/w/maintenance/runJobs.php(93): JobRunner->run(Array)
mediawiki-jobrunner_1  | #6 /var/www/html/w/maintenance/doMaintenance.php(112): RunJobs->execute()
mediawiki-jobrunner_1  | #7 /var/www/html/w/maintenance/runJobs.php(130): require_once('/var/www/html/w...')
mediawiki-jobrunner_1  | #8 {main}

I have not yet found a way of remedying this short of renaming LocalSettings.php and reinstalling MediaWiki.

Environment:

docker-compose.override.yaml (in relevant part):

version: '3.7'
services:
  database:
    image: mariadb
    environment:
      MYSQL_ALLOW_EMPTY_PASSWORD: 1
    volumes:
      - /dbdata:/var/lib/mysql
volumes:
  dbdata:
    driver: local

.env

MW_SCRIPT_PATH=/w
MW_SERVER=http://localhost:8080
MW_DOCKER_PORT=8080
MEDIAWIKI_USER=Admin
MEDIAWIKI_PASSWORD=dockerpass
XDEBUG_CONFIG="client_host=host.docker.internal client_port=9003 start_with_request=yes"
XDEBUG_ENABLE=true
XHPROF_ENABLE=true
MW_DOCKER_UID=1000
MW_DOCKER_GID=1000

macOS 11.2.2 Big Sur
Docker version 20.10.5, build 55c4c88
docker-compose version 1.28.5, build c4eb3a1f

Event Timeline

Caveat: I'm not positive that my wiki was configured correctly when I tried setting up replication earlier. I'll try setting up a fresh installation using DB replication from the start and see if I have better luck. In any case this is a bug for installations using a single MySQL instance.

OK, I've started from a fresh slate with DB replication and things are looking better. When I take the environment down and spin it back up, there are a few of the same errors from the job runner container, but it eventually connects after the replica DB finishes starting up. With the single-MySQL installation, the job runner just kept spewing the same error indefinitely. Maybe for some reason it's expecting a replica DB that doesn't exist?
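One way to tell "the database is still starting up" apart from "the job runner is pointed at a host that never answers" is to probe the DB port from inside the job runner container. The following is a minimal sketch, not something MediaWiki-Docker ships: `wait_for_port` is a hypothetical helper, and port 3306 is assumed as the MySQL/MariaDB default. The host name `database` is the service name from the override file in the description.

```python
import socket
import time


def wait_for_port(host, port, attempts=10, delay=1.0, timeout=2.0):
    """Return True once a TCP connection to host:port succeeds, else False.

    Tries up to `attempts` times, sleeping `delay` seconds between tries.
    Hypothetical diagnostic helper; not part of MediaWiki or MediaWiki-Docker.
    """
    for attempt in range(1, attempts + 1):
        try:
            # A successful connect only shows that the port is open. It does
            # not prove the server is ready to authenticate clients.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Connection refused, timed out, or the host name did not resolve.
            if attempt < attempts:
                time.sleep(delay)
    return False
```

Calling `wait_for_port("database", 3306)` from the job runner container would be expected to return True within a few attempts if the DB is merely slow to start. A persistent False would suggest the configured host is unreachable or does not exist, which would fit the guess above about a missing replica.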

Mholloway renamed this task from Job runner frequently fails to obtain a replica DB connection to MySQL on `docker-compose up` to Job runner in single-MySQL environment fails to obtain a DB connection on `docker-compose up`. Mar 9 2021, 6:08 PM
Mholloway updated the task description.

Thanks for filing this. I've noticed it as well, on both single and replica DB setups, but I haven't investigated further.

To add a data point, my new environment is still working well starting from a cold boot this morning.

Another data point: I'm seeing the same thing on a fresh environment created by following the recipe at Alternative_databases#MySQL_(database_replication).

JobQueueConnectionError from line 782 of /var/www/html/w/includes/jobqueue/JobQueueDB.php: DBConnectionError:Cannot access the database: MySQL server has gone away (mariadb-replica) (mariadb-replica)
#0 /var/www/html/w/includes/jobqueue/JobQueueDB.php(631): JobQueueDB->getReplicaDB()
#1 /var/www/html/w/includes/jobqueue/JobQueue.php(681): JobQueueDB->doGetSiblingQueuesWithJobs(Array)
#2 /var/www/html/w/includes/jobqueue/JobQueueGroup.php(376): JobQueue->getSiblingQueuesWithJobs(Array)
#3 /var/www/html/w/includes/jobqueue/JobQueueGroup.php(262): JobQueueGroup->getQueuesWithJobs()
#4 /var/www/html/w/includes/jobqueue/JobRunner.php(227): JobQueueGroup->pop(1, 1, Array)
#5 /var/www/html/w/maintenance/runJobs.php(97): JobRunner->run(Array)
#6 /var/www/html/w/maintenance/doMaintenance.php(108): RunJobs->execute()
#7 /var/www/html/w/maintenance/runJobs.php(134): require_once('/var/www/html/w...')
#8 {main}

And on the replica logs:

9:09:09 9 [Warning] Aborted connection 9 to db: 'unconnected' user: 'unauthenticated' host: '172.18.0.3' (This connection closed normally without authentication)

Unsure if related.

The wiki itself works fine (I can log in, edit, etc.), but I suspect the jobs aren't running.