
mysql does not start when Trusty instances spawn
Closed, Declined (Public)

Description

Whenever a Trusty instance is rebooted, mysql does not start, which causes any job relying on it to fail.

The issue is most probably related to the way the service is defined in puppet. The service needs to start only after the tmpfs is available and the mysql datadir has been created in it:

modules/role/manifests/ci/slave/labs.pp
service { 'mysql':
    ensure   => running,
    enable   => true,
    # With the service being debian, which are the generic init independent
    # service wrappers create by debian and thus also work on ubuntu with
    # upstart, it won't instruct upstart to enable the service, thus not
    # reverting setting it to manual.
    provider => 'debian',
    require  => Exec['create-mysql-datadir'],
}

That has been introduced by https://gerrit.wikimedia.org/r/204528 to support T96230: Switch MySQL storage to tmpfs.

Maybe Exec['create-mysql-datadir'] should notify the service so that it gets started.
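
A minimal sketch of that idea, assuming the exec is declared alongside the service. The exec's command and path are hypothetical placeholders, since its actual definition is not quoted in this task; only the resource titles and the service attributes come from the snippet above.

```puppet
# Sketch only: 'command' and 'creates' use a hypothetical placeholder path.
exec { 'create-mysql-datadir':
    command => '/bin/mkdir -p /path/to/tmpfs/mysql',
    creates => '/path/to/tmpfs/mysql',
    # Send a refresh event to the service whenever the datadir is (re)created.
    notify  => Service['mysql'],
}

service { 'mysql':
    ensure   => running,
    enable   => true,
    provider => 'debian',
    # Redundant with the notify above (notify already implies ordering),
    # kept to mirror the existing definition.
    require  => Exec['create-mysql-datadir'],
}
```

This would only take effect during a Puppet agent run. It would not by itself make mysql start at boot before Puppet has run.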

Steps to reproduce

mysql is not running. I had that instance rebooted on Jul 27 12:36 UTC.


*OR*

It is the Debian package upgrade that fails. syslog shows:

/var/log/syslog
Jul 27 16:12:44 integration-slave-trusty-1011 /etc/mysql/debian-start[29210]: Upgrading MySQL tables if necessary.
Jul 27 16:12:44 integration-slave-trusty-1011 /etc/mysql/debian-start[29213]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Jul 27 16:12:44 integration-slave-trusty-1011 /etc/mysql/debian-start[29213]: Looking for 'mysql' as: /usr/bin/mysql
Jul 27 16:12:44 integration-slave-trusty-1011 /etc/mysql/debian-start[29213]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Jul 27 16:12:44 integration-slave-trusty-1011 /etc/mysql/debian-start[29213]: Error: Failed while fetching Server version! Could be due to unauthorized access.
Jul 27 16:12:44 integration-slave-trusty-1011 /etc/mysql/debian-start[29213]: FATAL ERROR: Upgrade failed
Jul 27 16:12:44 integration-slave-trusty-1011 /etc/mysql/debian-start[29229]: Checking for insecure root accounts.

And:

/var/log/upstart/mysql
160727 16:12:42 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
160727 16:12:42 [Note] /usr/sbin/mysqld (mysqld 5.5.50-0ubuntu0.14.04.1) starting as process 29170 ...
^G/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
^G/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
^G/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Access denied for user 'debian-sys-maint'@'localhost' (using password: YES)'
Checking for tables which need an upgrade, are corrupt or were 
not closed cleanly.

Event Timeline

hashar triaged this task as High priority. Nov 18 2016, 3:12 PM

Change 328051 had a related patch set (by Paladox) published:
Contint: Make sure /mnt/home/jenkins-deploy/tmpfs is mounted before starting MySQL

https://gerrit.wikimedia.org/r/328051

The patch set above is now called "Contint: Notify Service mysql to restart".

But since that also wasn't really what it does, I amended it to "Contint: notify service mysql on creation of mysql dir".

Change 328051 abandoned by Paladox:
Contint: notify service mysql on creation of mysql dir

https://gerrit.wikimedia.org/r/328051

Mentioned in SAL (#wikimedia-releng) [2017-01-20T09:05:21Z] <hashar> integration restarted mysql on trusty permanent slaves T141450 T155815 salt -v '*trusty*' cmd.run 'service mysql start'

hashar lowered the priority of this task from High to Low. Mar 31 2017, 8:48 AM

This task is for the permanent slaves. The remaining jobs running on Trusty are:

analytics-refinery-release
analytics-refinery-update-jars
analytics-wikistats
composer-package-validate
composer-validate
jsduck
jshint
jsonlint
mediawiki-core-jsduck
mediawiki-vendor-composer-security
mwext-CirrusSearch-whitespaces
mwext-VisualEditor-jsduck
operations-dns-tabs
operations-mw-config-typos
performance-webpagetest-wmf
performance-webpagetest-wpt-org
php-compile-hhvm
php-compile-hhvm-test
php-compile-php55
wikimedia-fundraising-civicrm
wikimedia-fundraising-crm-jsonlint

I don't think any of them rely on MySQL besides the civicrm one. They can probably be run using Zend 5.6 on Jessie; otherwise we can wait for Zend 5.5 to be made available (T144959).

So there is not much left to do here, and this can definitely be marked as resolved once Trusty is phased out.

Almost all jobs now run on Nodepool instances, which do not suffer from this problem.

Additionally, we no longer have Trusty instances in CI.