toolsbeta-sgeexec-1001/2: buster sgeexec apt fails to write to /tmp
Closed, Resolved, Public

Description

Running apt-get update on any of the toolsbeta buster sgeexec nodes fails because apt cannot write to /tmp:

W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://apt.wikimedia.org/wikimedia buster-wikimedia InRelease: Couldn't create temporary file /tmp/apt.conf.LCSA4F for passing config to apt-key

/tmp is mounted from a separate drive:

root@toolsbeta-sgeexec-1002:/tmp# df -h|grep /tmp
/dev/sdc                                                               20G  205M   20G   1% /tmp

Running ls on the directory stalls, so it seems like something is wrong with the mount itself?

Trying to manually unmount and re-mount the drive fails:

root@toolsbeta-sgeexec-1002:~# umount /tmp
umount: /tmp: target is busy.

Event Timeline

dcaro renamed this task from toolsbeta: buster sgeexec apt fails to write to /tmp to toolsbeta-sgeexec-1002: buster sgeexec apt fails to write to /tmp. Jul 29 2021, 10:54 AM
dcaro renamed this task from toolsbeta-sgeexec-1002: buster sgeexec apt fails to write to /tmp to toolsbeta-sgeexec-1001/2: buster sgeexec apt fails to write to /tmp.
dcaro claimed this task.

The libvirt config for that drive:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <source protocol='rbd' name='eqiad1-compute/82bcfa30-4c58-4589-a26e-2a0e14cf6931_disk.swap' index='1'>
    <host name='10.64.20.69' port='6789'/>
    <host name='10.64.20.68' port='6789'/>
    <host name='10.64.20.67' port='6789'/>
  </source>
  <target dev='sdc' bus='scsi'/>
  <iotune>
    <total_bytes_sec>200000000</total_bytes_sec>
    <read_iops_sec>5000</read_iops_sec>
    <write_iops_sec>500</write_iops_sec>
  </iotune>
  <alias name='scsi0-0-0-2'/>
  <address type='drive' controller='0' bus='0' target='0' unit='2'/>
</disk>

Remounted the dir:

mount -o remount /tmp

And tried to ls it:

root@toolsbeta-sgeexec-1002:~# ls /tmp | wc
  13029   13029  312879

It seems to contain many apt-dpkg-install-* directories; looking into it.
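
To see which name patterns dominate a listing of that size, the entries can be grouped by the part of the name before the random suffix. This is only a minimal sketch: count_by_prefix is a hypothetical helper, and it assumes mktemp-style names where the suffix follows the last hyphen or dot.

```shell
#!/bin/sh
# count_by_prefix is a hypothetical helper, not part of any existing tooling.
# It lists the entries of a directory, strips the last "-suffix" or ".suffix"
# from each name, and prints how many entries share each remaining prefix,
# most frequent first.
count_by_prefix() {
    dir=${1:-/tmp}
    # ls -A includes dotfiles but skips "." and "..".
    # Names containing newlines would be miscounted; fine for a quick look.
    ls -A "$dir" \
        | sed 's/\(.\)[-.][^-.]*$/\1/' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Calling `count_by_prefix /tmp | head` would show the most common prefixes first, so a pile of apt-dpkg-install-* directories stands out immediately.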

The filesystem is vfat, which supports only limited permission options. When running apt update, apt drops to the _apt user for the apt-key handling, and that user fails to create its temporary files in /tmp.
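
For checking other hosts for the same problem, the filesystem type of a mount point can be read from the mounts table. The following is a sketch under assumptions: fstype_of is a hypothetical helper, and it reads a file in /proc/mounts format (device, mount point, filesystem type, options, ...).

```shell
#!/bin/sh
# fstype_of is a hypothetical helper, not part of any existing tooling.
# Usage: fstype_of MOUNTPOINT [MOUNTS_FILE]
# Prints the filesystem type of MOUNTPOINT and returns 0, or prints nothing
# and returns 1 if MOUNTPOINT is not listed in the mounts table.
fstype_of() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    # Keep the last matching entry, since a later mount shadows earlier
    # ones on the same mount point. Mount points containing spaces appear
    # escaped (\040) in /proc/mounts and are not handled here.
    awk -v mp="$mountpoint" '
        $2 == mp { fs = $3 }
        END { if (fs == "") exit 1; print fs }
    ' "$mounts_file"
}
```

On an affected node, `fstype_of /tmp` would be expected to print vfat. To reproduce the write failure directly, something like `runuser -u _apt -- mktemp /tmp/apt.conf.XXXXXX` run as root should fail in the same way apt does, assuming runuser from util-linux is available on the host.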

I suggest reformatting the disks to ext4 (or xfs/btrfs).

Unless they must be vfat for some reason? @Andrew, do you have any idea? :)

Mentioned in SAL (#wikimedia-cloud) [2021-07-29T13:06:23Z] <majavah> rebuild toolsbeta-sgeexec-1001 as -1003 T287666

Mentioned in SAL (#wikimedia-cloud) [2021-07-30T08:01:45Z] <majavah> replace toolsbeta-sgeexec-1002 with -1004 for T287666

Replaced both hosts with new ones that have an ext4 /tmp.

Do we have any idea how it ended up being vfat? I have a hunch that it's a bug in cinderutils::ensure on buster, or something similar.

I assume those VMs were created between the merges of https://github.com/wikimedia/puppet/commit/340aa63d8d6ce5dfa5bae20ff331457f1505cd97 and https://github.com/wikimedia/puppet/commit/4a2bc37873a6132fbddea633c958b6a6e4e5ff62. I'm not sure, though.