
Puppet failure on integration-agent-qemu*
Closed, Resolved, Public

Description

Puppet is having issues on the "integration-agent-qemu-1003.integration.eqiad1.wikimedia.cloud (172.16.2.155)" instance in project
integration in Wikimedia Cloud VPS.

Puppet is having issues on the "integration-agent-qemu-1001.integration.eqiad.wmflabs (172.16.1.187)" instance in project
integration in Wikimedia Cloud VPS.


---- Failed resources if any:

  * Exec[available-space-docker]

Event Timeline

The instances are using:

Instance                     Flavor
integration-agent-qemu-1001  g2.cores8.ram24.disk80
integration-agent-qemu-1003  g3.cores8.ram24.disk20.ephemeral40.4xiops

All of the disk space has been allocated to the second-local-disk logical volume, which is mounted at /srv, leaving no free space in the volume group.

Last week I made two changes related to partitioning. The first, https://gerrit.wikimedia.org/r/c/operations/puppet/+/755713, makes the Docker logical volume use a fixed 24G instead of 70%FREE:

 class profile::ci::dockervolume {
     labs_lvm::volume { 'docker':
-        size      => '70%FREE',
+        size      => '24G',
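The two values behave differently. `70%FREE` is relative, so the Docker volume's size depends on how much of the volume group is still unallocated when Puppet runs. `24G` is fixed. The sketch below is a rough illustration that ignores rounding to physical extents. The helper name `docker_lv_gib` is hypothetical and is not part of the Puppet module.

```shell
#!/bin/sh
# Hypothetical helper: docker_lv_gib POLICY VG_FREE_GIB
# POLICY is either a percentage of free space ("70%FREE") or a fixed
# size ("24G"). Prints the resulting LV size in GiB, ignoring extent rounding.
docker_lv_gib() {
    policy=$1 free=$2
    case "$policy" in
        *%FREE)
            pct=${policy%\%FREE}   # strip the literal "%FREE" suffix
            awk -v p="$pct" -v f="$free" 'BEGIN { printf "%.1f\n", f * p / 100 }' ;;
        *G)
            echo "${policy%G}" ;;  # a fixed size does not depend on free space
        *)
            echo "unsupported policy: $policy" >&2
            return 1 ;;
    esac
}

docker_lv_gib 70%FREE 61   # relative: grows or shrinks with free space
docker_lv_gib 24G 61       # fixed: always 24
```

With a relative size, an instance whose volume group is already fully allocated ends up with almost nothing for Docker. A fixed size instead makes the shortage fail loudly.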

The second change, https://gerrit.wikimedia.org/r/c/operations/puppet/+/755948/2/modules/role/manifests/ci/slave/labs/docker.pp, always creates that Docker volume. I dropped the feature flag role::ci::slave::labs::docker::docker_lvm_volume because I thought all instances had it turned on, but the qemu instances do not have that setting!

The puppet failure:

(/Stage[main]/Profile::Ci::Dockervolume/Labs_lvm::Volume[docker]/Exec[available-space-docker]/returns) Less than 1.5G is available for partitioning.
'/usr/local/sbin/pv-free' returned 1 instead of one of [0]
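The failure comes from a pre-flight check that refuses to create the volume when the volume group has less than 1.5G free. The real logic lives in /usr/local/sbin/pv-free, whose source is not shown here. The snippet below is only a hypothetical stand-in for an equivalent guard. `enough_free_space` is an invented name. It takes the free space in GiB, which could come for example from `vgs --noheadings --units g --nosuffix -o vg_free vd`, and compares it with the 1.5G threshold from the error message.

```shell
#!/bin/sh
# Hypothetical stand-in for the pv-free check; NOT the actual script.
# Usage: enough_free_space FREE_GIB [MIN_GIB]
# Returns 0 when FREE_GIB >= MIN_GIB (default 1.5), 1 otherwise.
enough_free_space() {
    free_gib=$1
    min_gib=${2:-1.5}
    # awk handles the fractional comparison that sh arithmetic cannot.
    awk -v f="$free_gib" -v m="$min_gib" 'BEGIN { exit !(f >= m) }'
}

# On the qemu instances /srv held the whole volume group, so free space was 0:
if enough_free_space 0; then
    echo "ok"
else
    echo "Less than 1.5G is available for partitioning." >&2
fi
```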

qemu-1001 has 80G of disk, with 60G allocated to /srv, of which 14G is used.
qemu-1003 has 40G of extra disk space, all of it allocated to /srv, with 1G used.
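Making room for the 24G Docker volume means shrinking /srv. The largest /srv that still fits is the volume group size minus 24G, and it must stay above what is already used. The sketch below assumes the 61G volume group that vgs reports on qemu-1001, and a 40G volume group on qemu-1003. `srv_max_gib` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: srv_max_gib VG_SIZE_GIB DOCKER_GIB USED_GIB
# Prints the largest /srv size (GiB) that leaves room for the Docker
# volume, or fails if the data already on /srv would not fit.
srv_max_gib() {
    vg=$1 docker=$2 used=$3
    max=$((vg - docker))
    if [ "$max" -lt "$used" ]; then
        echo "cannot fit: ${used}G used on /srv, only ${max}G left" >&2
        return 1
    fi
    echo "$max"
}

srv_max_gib 61 24 14   # qemu-1001: prints 37
srv_max_gib 40 24 1    # qemu-1003 (assumed 40G VG): prints 16
```

The 37G result for qemu-1001 matches the second-local-disk size that lvs reports after the fix.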

Mentioned in SAL (#wikimedia-releng) [2022-01-25T09:51:00Z] <hashar> integration-agent-qemu-1003: nuked /dev/vd/second-local-disk and /srv to make room for a docker logical volume. That has fixed puppet T299996

Mentioned in SAL (#wikimedia-releng) [2022-01-25T09:59:18Z] <hashar> integration-agent-qemu-1001: resizing /dev/mapper/vd-second--local--disk (/srv) to 20G : resize2fs -p /dev/mapper/vd-second--local--disk 20G # T299996

Mentioned in SAL (#wikimedia-releng) [2022-01-25T09:59:22Z] <hashar> integration-agent-qemu-1001: resized /srv to 100% disk free: lvextend -r -l +100%FREE /dev/mapper/vd-second--local--disk # T299996
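The two SAL entries for qemu-1001 shrink /srv to 20G and then grow it back into whatever space remains. The sketch below prints the full sequence rather than executing it, because `DRY_RUN` defaults to 1. It makes several assumptions:

- ext4 can only be shrunk while unmounted, so /srv is unmounted and checked with `e2fsck -f` first.
- /srv is listed in /etc/fstab.
- The 24G Docker volume is created by a Puppet run between the shrink and the final `lvextend`. This is consistent with the final lvs output but is not stated in the log.

The `run` and `shrink_srv_for_docker` names are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch of the /srv shrink on integration-agent-qemu-1001.
# With DRY_RUN=1 (the default) each command is only printed.
DRY_RUN=${DRY_RUN:-1}
DEV=/dev/mapper/vd-second--local--disk

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

shrink_srv_for_docker() {
    run umount /srv                       # ext4 cannot be shrunk while mounted
    run e2fsck -f "$DEV"                  # resize2fs wants a freshly checked fs
    run resize2fs -p "$DEV" 20G           # shrink the filesystem first...
    run lvreduce -L 20G "$DEV"            # ...then the LV, never below the fs size
    run mount /srv                        # assumes an /etc/fstab entry for /srv
    run puppet agent -t                   # assumed: Puppet creates the 24G docker LV
    run lvextend -r -l +100%FREE "$DEV"   # give /srv the rest; -r grows the fs too
}

shrink_srv_for_docker
```

Shrinking the filesystem before the logical volume is the important ordering. Reversing it would cut off the end of a live filesystem.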

hashar changed the task status from Duplicate to Resolved on Jan 25 2022, 9:59 AM.

Solved on integration-agent-qemu-1001 as well:

root@integration-agent-qemu-1001:~# vgs
  VG #PV #LV #SN Attr   VSize  VFree
  vd   1   2   0 wz--n- 61.00g    0 
root@integration-agent-qemu-1001:~# lvs
  LV                VG Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker            vd -wi-ao---- 24.00g                                                    
  second-local-disk vd -wi-ao---- 37.00g                                                    
root@integration-agent-qemu-1001:~# lsblk 
NAME                       MAJ:MIN RM    SIZE RO TYPE MOUNTPOINT
vda                        254:0    0     80G  0 disk 
├─vda1                     254:1    0 1007.5K  0 part 
├─vda2                     254:2    0     19G  0 part /
└─vda3                     254:3    0     61G  0 part 
  ├─vd-second--local--disk 253:0    0     37G  0 lvm  /srv
  └─vd-docker              253:1    0     24G  0 lvm  /var/lib/docker
root@integration-agent-qemu-1001:~#