
/var/lib/mysql/ filling up on old Precise slaves due to mysql usage
Closed, Declined · Public

Description

I have marked integration-slave1001 as offline in jenkins for now due to this.

MySQL's default data dir is /var/lib/mysql, which on Precise instances lives on the 2 GB /var partition. It is full on slave1001:

root@integration-slave1001:~# df -h /var/lib/mysql/
Filesystem          Size  Used Avail Use% Mounted on
/dev/mapper/vd-var  2.0G  1.9G  740K 100% /var

The /var/ is filled up because /var/tmp/core has a ton of files named core.integration-slave1001.dvipng.<PID>.<EPOCH>
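(For the record, a quick way to check where the kernel is writing core files and how much of /var they take; the core.<host>.<exe>.<pid>.<time> naming comes from the configurable kernel.core_pattern sysctl, the commands below are just a sketch.)

# Where the kernel writes core files on these instances
cat /proc/sys/kernel/core_pattern
# How much of /var the dumps are actually eating
sudo du -sh /var/tmp/core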

We need to:

  • Consider pointing MySQL to the extended disk space on /mnt (64GB)
  • Stop generating core dump files on CI. Antoine thinks core dumps were originally enabled for hhvm
  • They have been enabled for T64623 in bin/mw-set-env.sh. We can just drop the ulimit -c (see the sketch after this list)
  • Make sure InnoDB reclaims/reuses the allocated table space when we drop a database
  • Investigate dvipng throwing a bunch of core dumps (T94273)
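A rough sketch of what the ulimit and MySQL-related items could look like; the drop-in config path and the exact line in bin/mw-set-env.sh are assumptions, not the real repository contents:

# In bin/mw-set-env.sh, drop the "ulimit -c" added for T64623, or force it off:
ulimit -c 0    # CI processes stop writing core files

# Point the MySQL data directory at the big /mnt disk via an assumed drop-in file,
# /etc/mysql/conf.d/datadir.cnf:
#   [mysqld]
#   datadir = /mnt/mysql
#   innodb_file_per_table = 1    # so dropping a database actually frees the space
sudo service mysql stop
sudo mkdir -p /mnt/mysql
sudo rsync -a /var/lib/mysql/ /mnt/mysql/
# On Ubuntu the AppArmor profile for mysqld would also need to allow /mnt/mysql/
sudo service mysql start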

Event Timeline

Legoktm raised the priority of this task from to Needs Triage.
Legoktm updated the task description. (Show Details)
Legoktm added subscribers: Legoktm, Krinkle, hashar.
hashar set Security to None.

Edited the task with some details. In short, /var/ went full because of core files. I have added the instance back.

Legoktm triaged this task as High priority. Mar 27 2015, 5:48 PM

/var just filled up on integration-slave1002.

Legoktm renamed this task from Error: 3 Error writing file './jenkins_u4_mw/#sql-3f4_cff.frm' (Errcode: 28) (localhost) on integration-slave1001 in mediawiki-extensions-zend job to Small /var partition is filling up due to mysql usage on labs slaves. Mar 27 2015, 5:49 PM
Krinkle renamed this task from Small /var partition is filling up due to mysql usage on labs slaves to /var/lib/mysql/ filling up due to mysql usage on labs slaves. Mar 29 2015, 7:51 AM
Krinkle renamed this task from /var/lib/mysql/ filling up due to mysql usage on labs slaves to /var/lib/mysql/ filling up on old Precise slaves due to mysql usage.

Note that newer slaves have a larger /var. Re-creating the Precise slaves so that they also get a large /var (T91524) failed due to T91526.

Exactly what files are staying behind there? The databases are dropped after each build.

> In short /var/ went full because of core files.

Hm.. can we prevent core dumps?

> Hm.. can we prevent core dumps?

Yes, core dumps. The MathSearch extension has tests using dvips which segfault for some reason. There are no debugging symbols, so the stack trace is not that useful :( I have filed T94273 about it.
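For reference, a backtrace can still be pulled from one of those cores with gdb; a minimal sketch, assuming the dumps come from /usr/bin/dvipng and keeping the placeholder file name from the description (without debug symbols most frames will just show up as ??):

# Non-interactive backtrace from a core file
gdb -batch -ex "bt" /usr/bin/dvipng \
    /var/tmp/core/core.integration-slave1001.dvipng.<PID>.<EPOCH>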

Exactly what in /var/lib/mysql/ was so big? I assume MathSearch/dvips does not put its core dumps in there.

Right now it all seems quite moderately sized:

$ dsh-ci-slaves 'sudo du -sh /var/lib/mysql/'
integration-slave-precise-1011.eqiad.wmflabs: 54M	/var/lib/mysql/
integration-slave-precise-1012.eqiad.wmflabs: 62M	/var/lib/mysql/
integration-slave-precise-1013.eqiad.wmflabs: 70M	/var/lib/mysql/
integration-slave-precise-1014.eqiad.wmflabs: 72M	/var/lib/mysql/
integration-slave-trusty-1011.eqiad.wmflabs: 87M	/var/lib/mysql/
integration-slave-trusty-1012.eqiad.wmflabs: 86M	/var/lib/mysql/
integration-slave-trusty-1013.eqiad.wmflabs: 78M	/var/lib/mysql/
integration-slave-trusty-1014.eqiad.wmflabs: 87M	/var/lib/mysql/
integration-slave-trusty-1015.eqiad.wmflabs: 78M	/var/lib/mysql/
integration-slave-trusty-1016.eqiad.wmflabs: 102M	/var/lib/mysql/

Also, because we re-created the instances, we no longer have the old setup with a separate 2 GB /var disk. Instead, /var is now part of the main / disk, which is 18 GB.

/var/lib/mysql itself wasn't big; it's just that core dumps were filling up /var, so MySQL would run out of room.
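A minimal cleanup sketch for the immediate symptom, assuming the accumulated cores are no longer needed once T94273 has whatever it needs from them:

# Drop the piled-up dvipng cores and confirm MySQL has room on /var again
sudo find /var/tmp/core -name 'core.*' -delete
df -h /var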

Krinkle claimed this task.
Krinkle moved this task from Next to Done on the Continuous-Integration-Infrastructure board.