deployment-ores-redis /srv/ redis is too small (500MBytes)
Closed, Resolved, Public


deployment-ores-redis.deployment-prep.eqiad.wmflabs causes disk alarms:

Free space - all mounts on deployment-ores-redis is CRITICAL: CRITICAL: deployment-prep.deployment-ores-redis.diskspace._srv.byte_percentfree (<44.44%)

It has redis installed on the instance's extended disk, mounted on /srv. However, the instance is an m1.small with a 20 GByte disk, and / already takes up nearly all of it, leaving only 484 MBytes for /srv (438 MBytes of which are in use):

$ df -h -t ext4
Filesystem                          Size  Used Avail Use% Mounted on
/dev/vda3                            19G  2.2G   16G  13% /
/dev/mapper/vd-second--local--disk  484M  438M   17M  97% /srv
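
The free-space percentage for /srv can be reproduced from the df figures above. This is a minimal sketch, assuming the check compares available space against filesystem size; the function names `percent_free` and `mount_percent_free` are illustrative and are not taken from the actual monitoring check.

```python
import shutil


def percent_free(total: float, free: float) -> float:
    """Return free space as a percentage of the total size."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * free / total


def mount_percent_free(path: str) -> float:
    """Percent free for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return percent_free(usage.total, usage.free)


# Figures from the df output above, in MBytes: /srv is 484M with 17M available.
srv_free = percent_free(484, 17)
print(f"/srv: {srv_free:.1f}% free")
```

On the instance itself, `mount_percent_free("/srv")` would read the live values instead of the hard-coded ones.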

There are two ways to solve it:

  • Remove the extended disk mount and store the redis data directly on / . This involves a bit of puppet work to remove the Mount['/srv']; then one will have to copy the data aside, unmount /srv and move the data to / .
  • Migrate to a new instance using the flavor c1.m2.s80, which comes with 80 GBytes of disk. This probably involves more configuration work to update the IP wherever it is used.
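
Before taking the first option, it is worth confirming that the data on /srv fits on / . A rough sketch, assuming a safety margin is wanted on the target; the helper names `fits_on_target` and `can_move`, and the 20% headroom, are hypothetical choices made for illustration.

```python
import shutil

MB = 1024 ** 2
GB = 1024 ** 3


def fits_on_target(data_bytes: int, target_free_bytes: int,
                   headroom: float = 0.2) -> bool:
    """True if the data fits while leaving `headroom` of the free space unused."""
    usable = target_free_bytes * (1.0 - headroom)
    return data_bytes <= usable


def can_move(src_mount: str, dst_mount: str) -> bool:
    """Compare live usage of `src_mount` with free space on `dst_mount`."""
    used = shutil.disk_usage(src_mount).used
    free = shutil.disk_usage(dst_mount).free
    return fits_on_target(used, free)


# Figures from the df output above: 438M used on /srv, 16G available on / .
print(fits_on_target(438 * MB, 16 * GB))
```

On the instance, `can_move("/srv", "/")` would run the same comparison against the live filesystems.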

Event Timeline

Mentioned in SAL (#wikimedia-releng) [2017-03-21T16:07:14Z] <Amir1> ladsgroup@deployment-ores-redis:~$ redis-cli -h deployment-ores-redis.deployment-prep.eqiad.wmflabs -p 6380 -a areallysecretpassword flushall (T160762)

This should make it less severe until I migrate it to a bigger instance

Mentioned in SAL (#wikimedia-releng) [2017-03-21T16:47:37Z] <halfak> halfak@deployment-ores-redis:~$ redis-cli -h deployment-ores-redis.deployment-prep.eqiad.wmflabs -p 6380 -a areallysecretpassword flushall (T160762)

Halfak triaged this task as High priority. Mar 24 2017, 9:21 PM

I'll get this fixed by migrating to a new instance ASAP.

(Let us know if you need any assistance.)

Mentioned in SAL (#wikimedia-releng) [2017-03-24T21:34:35Z] <Amir1> launching deployment-ores-redis-02 (T160762)

Mentioned in SAL (#wikimedia-releng) [2017-03-25T10:39:55Z] <Amir1> changing ores redis address to deployment-ores-redis-01 (T160762)

Mentioned in SAL (#wikimedia-releng) [2017-03-25T10:46:17Z] <Amir1> deleting deployment-ores-redis (T160762)

Okay. I migrated the redis server from deployment-ores-redis to deployment-ores-redis-01, which is a medium-size instance and should not run into space issues any time soon (not for years, at least). Because of T148929 (New instances attached to a role::puppetmaster::standalone Puppetmaster need manual changes after switching from the default Puppetmaster), it took far longer than I expected, but I got it done and made some notes in that task. This is done now, and ores in beta works as expected.

Ladsgroup moved this task from Parked to Completed on the Machine-Learning-Team (Active Tasks) board.
Ladsgroup moved this task from In progress to Done on the User-Ladsgroup board.