https://integration.wikimedia.org/ci/job/mwext-qunit-composer/3218/console
/srv/deployment/integration/slave-scripts/bin/mw-install-mysql.sh ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
Mentioned in SAL [2016-05-23T14:32:18Z] <jzerebecki> offlined integration-slave-trusty-1004 because it can't connect to mysql T135997
@thcipriani mentioned that slaves are sometimes, somehow, missing mysql :(
I rebooted that host earlier today, so maybe our puppet manifest or the service itself does not start mysql on boot.
Sounds bad:
integration-slave-trusty-1017.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1004.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1006.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1003.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1025.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1023.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1014.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1001.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1018.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1012.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1015.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1024.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1016.integration.eqiad.wmflabs: mysql stop/waiting
integration-slave-trusty-1013.integration.eqiad.wmflabs: mysql start/running, process 26269
integration-slave-trusty-1011.integration.eqiad.wmflabs: mysql stop/waiting
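With fifteen hosts it is easy to miss the odd one out. A minimal sketch of a filter that prints only the hosts whose mysql job is not start/running, assuming the status output arrives as one "host: mysql goal/state" line per minion (for example via salt's txt outputter, --out=txt; that choice is an assumption). The function name stopped_hosts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "host: mysql <goal>/<state>[, process N]" lines on
# stdin and print the hosts whose mysql upstart job is not start/running.
stopped_hosts() {
    # $1 is "host:", $2 is "mysql", $3 is "<goal>/<state>" (maybe with a trailing comma).
    awk '$2 == "mysql" && $3 !~ /^start\/running/ { sub(/:$/, "", $1); print $1 }'
}
```

It would be fed by piping the salt status output into stopped_hosts; only the hostnames needing attention are printed.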
The mysql service is managed by puppet. Due to T96230 / T126699 we have a custom patch to handle mysql https://gerrit.wikimedia.org/r/#/c/204528/19/modules/role/manifests/ci/slave/labs.pp,cm
It is apparently entirely broken for some reason, most probably because of a recent-ish patch that landed in the puppet.git production branch.
These machines seem to have mysql enabled at boot:
thcipriani@integration-saltmaster:~$ sudo salt -G 'oscodename:trusty' cmd.run 'ls /etc/rc2.d | grep mysql'
integration-slave-trusty-1017.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1023.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1006.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1001.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1011.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1024.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1004.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1025.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1014.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1015.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1012.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1013.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1003.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1016.integration.eqiad.wmflabs:
S20mysql
integration-slave-trusty-1018.integration.eqiad.wmflabs:
S20mysql
Puppet logs don't show anything unusual.
The puppet service resource uses provider => debian, and the puppet agent eventually runs:
/etc/init.d/mysql status; echo $?
mysql stop/waiting
0
The init script eventually checks whether there is an upstart job and thus invokes initctl status mysql, which does not know about mysql... Note that the exit code is 0 even though the job is stop/waiting, so puppet presumably considers the service running and never starts it.
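Since the exit code of /etc/init.d/mysql status was 0 for "mysql stop/waiting", the exit code alone cannot tell a stopped job from a running one. A minimal sketch, assuming upstart's "goal/state" wording as seen above, that decides from the status text instead; the function name mysql_status_is_running is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: judge the mysql upstart job from the *text* of its
# status output, because the exit code was 0 even for "stop/waiting".
mysql_status_is_running() {
    # $1 is the status text, e.g. "mysql start/running, process 26269".
    case "$1" in
        *start/running*) return 0 ;;  # goal "start", state "running"
        *) return 1 ;;                # stop/waiting, start/spawned, empty, ...
    esac
}
```

On a host it would be called as mysql_status_is_running "$(/etc/init.d/mysql status)". Wrapping such a check in a script and pointing the puppet service resource's status parameter at it is one conceivable workaround; that is an untested assumption, not what the existing patch does.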
Restarting an instance:
Notice: /Stage[main]/Role::Ci::Slave::Labs/File[/var/lib/mysql]/owner: owner changed 'root' to 'mysql'
Notice: /Stage[main]/Role::Ci::Slave::Labs/File[/var/lib/mysql]/group: group changed 'root' to 'mysql'
Notice: /Stage[main]/Role::Ci::Slave::Labs/File[/var/lib/mysql]/mode: mode changed '1777' to '0775'
There is no mysql running ...
On some instances we have two processes:
/usr/sbin/mysqld
/bin/sh /usr/bin/mysqld_safe
 \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --
That last one is wrong :)
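To spot the unwanted second instance across hosts, a minimal sketch that prints the PIDs of mysqld processes whose parent is mysqld_safe, following the observation that the mysqld_safe tree is the wrong one. It assumes input in the "PID PPID COMMAND ARGS..." layout produced by ps -eo pid=,ppid=,args=; the function name mysqld_safe_children is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "PID PPID COMMAND ARGS..." lines on stdin and print
# the PIDs of /usr/sbin/mysqld processes spawned by mysqld_safe.
mysqld_safe_children() {
    awk '
        { pid[NR] = $1; ppid[NR] = $2; cmd[$1] = $3; arg1[$1] = $4 }
        END {
            for (i = 1; i <= NR; i++) {
                if (cmd[pid[i]] != "/usr/sbin/mysqld") continue
                # mysqld_safe is a shell script, so ps shows "/bin/sh /usr/bin/mysqld_safe".
                if (arg1[ppid[i]] == "/usr/bin/mysqld_safe") print pid[i]
            }
        }'
}
```

Typical use would be ps -eo pid=,ppid=,args= | mysqld_safe_children; a non-empty result means the extra mysqld_safe-managed server is present on that host.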
I ended up running killall mysqld, and upstart restarted it:
# salt -v '*trusty*' cmd.run '/etc/init.d/mysql status'
Executing job with jid 20160523193513679149
-------------------------------------------
integration-slave-trusty-1023.integration.eqiad.wmflabs: mysql start/running, process 12035
integration-slave-trusty-1017.integration.eqiad.wmflabs: mysql start/running, process 20129
integration-slave-trusty-1014.integration.eqiad.wmflabs: mysql start/running, process 8892
integration-slave-trusty-1004.integration.eqiad.wmflabs: mysql start/running, process 15530
integration-slave-trusty-1015.integration.eqiad.wmflabs: mysql start/running, process 15508
integration-slave-trusty-1024.integration.eqiad.wmflabs: mysql start/running, process 5512
integration-slave-trusty-1025.integration.eqiad.wmflabs: mysql start/running, process 27209
integration-slave-trusty-1018.integration.eqiad.wmflabs: mysql start/running, process 15626
integration-slave-trusty-1012.integration.eqiad.wmflabs: mysql start/running, process 12085
integration-slave-trusty-1003.integration.eqiad.wmflabs: mysql start/running, process 2599
integration-slave-trusty-1011.integration.eqiad.wmflabs: mysql start/running, process 14315
integration-slave-trusty-1006.integration.eqiad.wmflabs: mysql start/running, process 6281
integration-slave-trusty-1001.integration.eqiad.wmflabs: mysql start/running, process 14202
integration-slave-trusty-1016.integration.eqiad.wmflabs: mysql start/running, process 13213
integration-slave-trusty-1013.integration.eqiad.wmflabs: mysql start/running, process 1276