debian-glue tries to fetch obsolete package
Closed, Resolved (Public)

Description

https://gerrit.wikimedia.org/r/#/c/268563/ caused debian-glue to fail with (https://integration.wikimedia.org/ci/job/debian-glue/82/console):

[…]
16:24:16 Writing extended state information...
16:24:16 Err http://mirrors.wikimedia.org/debian/ sid/main libpython3.4-minimal amd64 3.4.3-8
16:24:16   404  Not Found [IP: 208.80.154.10 80]
16:24:16 Err http://mirrors.wikimedia.org/debian/ sid/main libpython3.4-stdlib amd64 3.4.3-8
16:24:16   404  Not Found [IP: 208.80.154.10 80]
16:24:16 Err http://mirrors.wikimedia.org/debian/ sid/main python3.4-minimal amd64 3.4.3-8
16:24:16   404  Not Found [IP: 208.80.154.10 80]
16:24:16 Err http://mirrors.wikimedia.org/debian/ sid/main python3-minimal amd64 3.4.3-4
16:24:16   404  Not Found [IP: 208.80.154.10 80]
16:24:16 Err http://mirrors.wikimedia.org/debian/ sid/main python3.4 amd64 3.4.3-8
16:24:16   404  Not Found [IP: 208.80.154.10 80]
16:24:16 Err http://mirrors.wikimedia.org/debian/ sid/main libpython3-stdlib amd64 3.4.3-4
16:24:16   404  Not Found [IP: 208.80.154.10 80]
16:24:16 Err http://mirrors.wikimedia.org/debian/ sid/main dh-python all 2.20150826
16:24:16   404  Not Found [IP: 208.80.154.10 80]
16:24:16 Err http://mirrors.wikimedia.org/debian/ sid/main python3 amd64 3.4.3-4
16:24:16   404  Not Found [IP: 208.80.154.10 80]
16:24:16 Err http://mirrors.wikimedia.org/debian/ sid/main distro-info-data all 0.27
16:24:16   404  Not Found [IP: 208.80.154.10 80]
16:24:16 Err http://mirrors.wikimedia.org/debian/ sid/main lsb-release all 9.20150917
16:24:16   404  Not Found [IP: 208.80.154.10 80]
16:24:17 E: Failed to fetch http://mirrors.wikimedia.org/debian/pool/main/p/python3.4/libpython3.4-minimal_3.4.3-8_amd64.deb: 404  Not Found [IP: 208.80.154.10 80]
16:24:17 E: Unable to correct for unavailable packages
[…]

This is because mirrors.wikimedia.org sid/main provides the package libpython3.4-minimal_3.4.4-2_amd64.deb, not version 3.4.3-8. Apparently apt-get update has not been run on the hosts trying to build the package, so obsolete package information has been kept around.

I first stumbled over this about 15 hours ago, and cron should have run apt-get update since then as part of /usr/local/sbin/puppet-run, so I assume the hosts running the package builds are configured differently.

Event Timeline

scfc raised the priority of this task from to Needs Triage.
scfc updated the task description.
scfc moved this task to Untriaged on the Continuous-Integration-Infrastructure board.
scfc subscribed.
Restricted Application added subscribers: StudiesWorld, Aklapper.

The jenkins-debian-glue scripts invoke cowbuilder --update, which refreshes the image (times in UTC):

16:23:21 I: upgrading packages
16:23:21 Get:1 http://mirrors.wikimedia.org sid InRelease [278 kB]
16:23:21 Get:2 http://mirrors.wikimedia.org sid/main amd64 Packages/DiffIndex [27.9 kB]
16:23:21 Get:3 http://mirrors.wikimedia.org sid/main Translation-en/DiffIndex [27.9 kB]
16:23:22 Get:4 http://mirrors.wikimedia.org sid/main Translation-en [5288 kB]
16:23:22 Get:5 http://mirrors.wikimedia.org sid/main amd64 Packages [8098 kB]
16:23:24 Fetched 13.7 MB in 2s (5209 kB/s)
16:23:25 Reading package lists...

Could it be that mirrors.wikimedia.org is not in sync? I.e. Packages still references 3.4.3 while the rest of the files have been deleted/updated properly.

The python3.4 source package in unstable is 3.4.4, and the 3.4.3 files are gone from http://mirrors.wikimedia.org/debian/pool/main/p/python3.4/

I confirmed http://mirrors.wikimedia.org/debian/dists/sid/main/binary-amd64/Packages.gz properly refers to 3.4.4.
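A check like the one above can be scripted. The following is only a minimal sketch: the helper name package_version and the sample index are made up for illustration, and the only version taken from this task is 3.4.4-2 for libpython3.4-minimal.

```shell
# Print the Version field of one package from a Debian "Packages" index on stdin.
# Stanzas are separated by blank lines and carry "Package:" and "Version:" fields.
package_version() {
    awk -v pkg="$1" '
        $1 == "Package:" { current = $2 }
        $1 == "Version:" && current == pkg { print $2; exit }
        /^$/ { current = "" }
    '
}

# Hypothetical sample index; "example-pkg" does not exist on the mirror.
sample_index='Package: example-pkg
Version: 1.0-1
Architecture: all

Package: libpython3.4-minimal
Version: 3.4.4-2
Architecture: amd64
'

printf '%s' "$sample_index" | package_version libpython3.4-minimal
```

Against the real mirror the same helper could presumably be fed with the decompressed Packages.gz, for instance by piping the output of zcat into package_version.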

Looks like the cowbuilder --update command might somehow fail to update the image :(

AFAIUI debian-glue sits on top of package_builder, so adding @akosiaris.

On closer look, I noted another error more or less beneath the package updates:

[…]
16:24:11 Package 'ccache' is not installed, so not removed
16:24:11 aptitude is already the newest version (0.7.5-3).
16:24:11 build-essential is already the newest version (12.2).
16:24:11 dpkg-dev is already the newest version (1.18.4).
16:24:11 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
16:24:11 I: Copying back the cached apt archive contents
16:24:11 W: no hooks of type E found -- ignoring
16:24:11 E: Logic failure in hook handling. Directory /mnt/pbuilder/build/cow.21918/tmp/hooks should exist but it does not.
16:24:11 I: unmounting dev/pts filesystem
16:24:11 I: unmounting run/shm filesystem
16:24:11 I: unmounting proc filesystem
16:24:11 rename: Not a directory
16:24:11  -> removing cowbuilder working copy
16:24:11  -> Moving work directory [/var/cache/pbuilder/build//cow.21918] to final location [/var/cache/pbuilder/base-unstable-amd64.cow/] and cleaning up old copy
16:24:11 + '[' 0 -eq 0 ']'
16:24:11 + '[' -n /tmp/tmp.1G3E6SKe7Q ']'
[…]

Is this error about the missing /mnt/pbuilder/build/cow.21918/tmp/hooks triggering something, a result of something else, or not important?

The puppet class package_builder initializes various cowbuilder images. It provides a few hooks used to inject the Wikimedia custom packages whenever WIKIMEDIA is set or the target version is suffixed with -wikimedia.

Hooks dir:

integration-slave-jessie-1001:~$ tree /var/cache/pbuilder/hooks
/var/cache/pbuilder/hooks
├── jessie
│   ├── D01apt.wikimedia.org
│   └── D05localsources
├── precise
│   ├── D01apt.wikimedia.org
│   └── D05localsources
└── trusty
    ├── D01apt.wikimedia.org
    └── D05localsources

In the console output WIKIMEDIA=no. The hooks are injected via the global configuration /etc/pbuilderrc. Let's dig further:

One of the commands does:
cp -al /var/cache/pbuilder/base-unstable-amd64.cow/ /var/cache/pbuilder/build//cow.21918

Then invokes pbuilder chrooted in that copy:

pbuilder update --configfile /tmp/tmp.1G3E6SKe7Q --buildplace /var/cache/pbuilder/build//cow.21918 --no-targz --internal-chrootexec chroot /var/cache/pbuilder/build//cow.21918 cow-shell
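To illustrate what the cp -al step produces, here is a small sketch on throwaway directories (the names base.cow and cow.copy are placeholders, not the real image paths). It assumes a cp that supports -l, such as GNU cp.

```shell
# Throwaway directories standing in for the base image and the working copy.
work="$(mktemp -d)"
mkdir "$work/base.cow"
echo "old" > "$work/base.cow/file"

# -a preserves attributes and recurses, -l creates hard links instead of copying data.
cp -al "$work/base.cow" "$work/cow.copy"

# Both names refer to the same inode, so the copy is cheap in time and disk space.
base_inode="$(ls -i "$work/base.cow/file" | awk '{print $1}')"
copy_inode="$(ls -i "$work/cow.copy/file" | awk '{print $1}')"
echo "base inode: $base_inode, copy inode: $copy_inode"
```

The update then runs inside the hard-linked copy, and only when it finishes is the copy moved over the base image, which is the step that logs "Moving work directory ... to final location" in the console output.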

When logging in to the unstable cow image with cowbuilder --login --basepath /var/cache/pbuilder/base-unstable-amd64.cow/, the hooks are injected in /tmp/hooks.

Interestingly the temporary files are still available on the instance!

find /mnt/pbuilder/build/cow.21918/ -name hooks
/mnt/pbuilder/build/cow.21918/usr/share/initramfs-tools/hooks

The thing is that our /etc/pbuilderrc has:

HOOKDIR=/var/cache/pbuilder/hooks/$DIST

And we do not have hooks for unstable, hence the directory does not exist. I guess our pbuilderrc should check whether the directory exists before setting HOOKDIR.

Note that a previous successful build passed just fine despite the logic failure. But I guess it prevents the cowbuilder image from properly updating.

TODO only set HOOKDIR when target directory actually exists.
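A minimal sketch of that TODO, written as it might appear in a shell-sourced pbuilderrc. The helper name set_hookdir_if_present is hypothetical and this is not the content of any change attached to this task; it only shows the directory check.

```shell
# Hypothetical helper: set HOOKDIR only when the candidate directory exists,
# otherwise leave HOOKDIR untouched.
set_hookdir_if_present() {
    candidate="$1"
    if [ -d "$candidate" ]; then
        HOOKDIR="$candidate"
    fi
}

# DIST is provided by the caller; "unstable" is used here only as a fallback for the sketch.
set_hookdir_if_present "/var/cache/pbuilder/hooks/${DIST:-unstable}"
```

With this guard, a distribution without a hooks directory would simply keep whatever HOOKDIR was set before.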

I have created /var/cache/pbuilder/hooks/unstable and retriggered a build of https://gerrit.wikimedia.org/r/#/c/268563/

Build fails https://integration.wikimedia.org/ci/job/debian-glue/83/ but apparently the image got updated:

Moving work directory
  [/var/cache/pbuilder/build//cow.27178]
to final location
  [/var/cache/pbuilder/base-unstable-amd64.cow/]
and cleaning up old copy

Still fails though :( Too lazy this weekend to review the diff between build 83 and 84 :-(

The triggering package has now moved to groff-base and others (https://integration.wikimedia.org/ci/job/debian-glue/89/console):

[…]
The following NEW packages will be installed:
  autoconf{a} automake{a} autopoint{a} autotools-dev{a} bsdmainutils{a} 
  debhelper{a} dh-autoreconf{a} file{a} gettext{a} gettext-base{a} 
  groff-base{a} intltool-debian{a} libcroco3{a} libffi6{a} libglib2.0-0{a} 
  libicu55{a} libio-pty-perl{a} libipc-run-perl{a} libmagic1{a} 
  libpipeline1{a} libsigsegv2{a} libstring-shellquote-perl{a} libtool{a} 
  libunistring0{a} libxml2{a} m4{a} man-db{a} po-debconf{a} 
0 packages upgraded, 28 newly installed, 0 to remove and 0 not upgraded.
Need to get 19.1 MB of archives. After unpacking 65.6 MB will be used.
Writing extended state information...
Err http://mirrors.wikimedia.org/debian/ sid/main groff-base amd64 1.22.3-1
  404  Not Found [IP: 208.80.154.10 80]
Get: 1 http://mirrors.wikimedia.org/debian/ sid/main bsdmainutils amd64 9.0.6 [183 kB]
Err http://mirrors.wikimedia.org/debian/ sid/main libpipeline1 amd64 1.4.1-1
  404  Not Found [IP: 208.80.154.10 80]
Err http://mirrors.wikimedia.org/debian/ sid/main man-db amd64 2.7.3-1
  404  Not Found [IP: 208.80.154.10 80]
Err http://mirrors.wikimedia.org/debian/ sid/main libffi6 amd64 3.2.1-3
  404  Not Found [IP: 208.80.154.10 80]
Err http://mirrors.wikimedia.org/debian/ sid/main libglib2.0-0 amd64 2.44.1-1.1
  404  Not Found [IP: 208.80.154.10 80]
Err http://mirrors.wikimedia.org/debian/ sid/main libicu55 amd64 55.1-5
  404  Not Found [IP: 208.80.154.10 80]
Err http://mirrors.wikimedia.org/debian/ sid/main libxml2 amd64 2.9.2+really2.9.1+dfsg1-0.2
  404  Not Found [IP: 208.80.154.10 80]
Get: 2 http://mirrors.wikimedia.org/debian/ sid/main libcroco3 amd64 0.6.8-3+b1 [135 kB]
Get: 3 http://mirrors.wikimedia.org/debian/ sid/main libsigsegv2 amd64 2.10-4+b1 [29.2 kB]
Get: 4 http://mirrors.wikimedia.org/debian/ sid/main libunistring0 amd64 0.9.3-5.2+b1 [288 kB]
Err http://mirrors.wikimedia.org/debian/ sid/main libmagic1 amd64 1:5.25-1
  404  Not Found [IP: 208.80.154.10 80]
Err http://mirrors.wikimedia.org/debian/ sid/main file amd64 1:5.25-1
  404  Not Found [IP: 208.80.154.10 80]
Err http://mirrors.wikimedia.org/debian/ sid/main gettext-base amd64 0.19.6-1
  404  Not Found [IP: 208.80.154.10 80]
Get: 5 http://mirrors.wikimedia.org/debian/ sid/main m4 amd64 1.4.17-4 [254 kB]
Get: 6 http://mirrors.wikimedia.org/debian/ sid/main autoconf all 2.69-9 [338 kB]
Get: 7 http://mirrors.wikimedia.org/debian/ sid/main autotools-dev all 20150820.1 [71.7 kB]
Get: 8 http://mirrors.wikimedia.org/debian/ sid/main automake all 1:1.15-3 [735 kB]
Err http://mirrors.wikimedia.org/debian/ sid/main autopoint all 0.19.6-1
  404  Not Found [IP: 208.80.154.10 80]
Err http://mirrors.wikimedia.org/debian/ sid/main gettext amd64 0.19.6-1
  404  Not Found [IP: 208.80.154.10 80]
Get: 9 http://mirrors.wikimedia.org/debian/ sid/main intltool-debian all 0.35.0+20060710.4 [26.3 kB]
Err http://mirrors.wikimedia.org/debian/ sid/main po-debconf all 1.0.18
  404  Not Found [IP: 208.80.154.10 80]
Err http://mirrors.wikimedia.org/debian/ sid/main debhelper all 9.20150811
  404  Not Found [IP: 208.80.154.10 80]
Get: 10 http://mirrors.wikimedia.org/debian/ sid/main libtool all 2.4.2-1.11 [190 kB]
Get: 11 http://mirrors.wikimedia.org/debian/ sid/main dh-autoreconf all 10 [15.2 kB]
Get: 12 http://mirrors.wikimedia.org/debian/ sid/main libio-pty-perl amd64 1:1.08-1+b4 [35.8 kB]
Get: 13 http://mirrors.wikimedia.org/debian/ sid/main libipc-run-perl all 0.94-1 [102 kB]
Get: 14 http://mirrors.wikimedia.org/debian/ sid/main libstring-shellquote-perl all 1.03-1.1 [13.4 kB]
Fetched 2417 kB in 0s (5279 kB/s)
E: Failed to fetch http://mirrors.wikimedia.org/debian/pool/main/g/groff/groff-base_1.22.3-1_amd64.deb: 404  Not Found [IP: 208.80.154.10 80]
E: Unable to correct for unavailable packages
[…]

Change 269095 had a related patch set uploaded (by Hashar):
package_builder: set HOOKDIR only when it exists

https://gerrit.wikimedia.org/r/269095

I am trying to update the cow image manually with:

jenkins-deploy@integration-slave-jessie-1001:~$
sudo DIST=unstable ARCH=amd64 cowbuilder --update --basepath /var/cache/pbuilder/base-unstable-amd64.cow/

It upgrades a bunch of packages and ends up with:

I: Copying back the cached apt archive contents
I: unmounting dev/pts filesystem
I: unmounting run/shm filesystem
I: unmounting proc filesystem
 -> removing cowbuilder working copy
 -> Moving work directory [/var/cache/pbuilder/build//cow.12232] to final location [/mnt/pbuilder/base-sid-amd64.cow] and cleaning up old copy
  forking: rm -rf /var/cache/pbuilder/build//cow.12232-12232-tmp

Note that there is no "rename: Not a directory" message, and the final location starts with /mnt/pbuilder/, whereas in the Jenkins console it is /var/cache/pbuilder (which is a symlink).

Looking at the image it seems to have been updated properly:

 ls -l /mnt/pbuilder/base-sid-amd64.cow/var/log/apt/history.log
-rw-r--r-- 1 root root 12481 Feb  8 10:42 /mnt/pbuilder/base-sid-amd64.cow/var/log/apt/history.log

(Previously that file was from September 20th 2015)
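The timestamp check above could be automated along these lines. This is a sketch only: is_stale is a hypothetical helper, the 7-day threshold is arbitrary, and the demo file is a stand-in for the image's history.log.

```shell
# Succeed when the file's age, counted in whole 24-hour periods, is greater than max_days.
is_stale() {
    file="$1"
    max_days="$2"
    [ -n "$(find "$file" -mtime +"$max_days" 2>/dev/null)" ]
}

# Stand-in file with its modification time set to 20 September 2015.
demo_dir="$(mktemp -d)"
touch -t 201509200000 "$demo_dir/history.log"

if is_stale "$demo_dir/history.log" 7; then
    echo "image looks stale"
fi
```

Pointing is_stale at var/log/apt/history.log inside a cow image would give a quick hint about whether the scheduled updates are actually landing.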

So potentially cowdancer/cowbuilder is confused by the symlink. Looks like we would need to set some env variable.


I rechecked the last merged change https://gerrit.wikimedia.org/r/#/c/267632/ but it still fails https://integration.wikimedia.org/ci/job/debian-glue/92/consoleFull :(

Change 269103 had a related patch set uploaded (by Hashar):
contint: set pbuilder basepath to actual directory

https://gerrit.wikimedia.org/r/269103

I tried tweaking the $basepath in puppet, but that is not actually the issue, though we should still land the change.

I believe the root cause would be that the unstable cow image is a symlink to the sid one:

/mnt/pbuilder/base-unstable-amd64.cow -> /mnt/pbuilder/base-sid-amd64.cow

So when cowbuilder --update attempts to refresh the unstable image it bails out because it is a symbolic link ...

Something I don't quite understand yet: base-unstable-amd64.cow is a symlink to base-sid-amd64.cow, and that image is updated via a cronjob:

# Puppet Name: cowbuilder_update_sid-amd64
34 7 * * * /usr/sbin/cowbuilder --update                     --basepath "/mnt/pbuilder/base-sid-amd64.cow"                     >/dev/null 2>&1

But maybe that only updates the sid one, and when the image is used with DIST=unstable the components are still obsolete/stale (since only sid is updated).

scfc claimed this task.

Now the package installation works (https://integration.wikimedia.org/ci/job/debian-glue/97/console; the remaining error is not related to this task). So maybe just a fluke?! I'll reopen this task if the issue reoccurs.

scfc removed scfc as the assignee of this task. Feb 9 2016, 12:08 PM
scfc set Security to None.

I have manually updated the image somehow.

I will have to ask around, but we most probably want separate sid and unstable images. That would get rid of the symlink, which prevents the updated image from replacing it, since rename refuses to rename a directory over a symlink.

The package_builder manifest creates a sid one, but most packages seem to use unstable in their changelog ...

Taking a look at this, I would say it has nothing to do with the symlink mentioned above, but rather with the cronjob being unable to work as is. Here is the cron output:
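The rename behaviour can be reproduced on throwaway directories. The sketch below assumes GNU coreutils (mv -T) and uses placeholder names rather than the real image paths; the exact error text differs between tools, so only success or failure is recorded.

```shell
# A directory, a symlink pointing at it, and a fresh directory meant to replace the symlink.
work="$(mktemp -d)"
mkdir "$work/base-sid.cow" "$work/cow.new"
ln -s "$work/base-sid.cow" "$work/base-unstable.cow"

# -T treats the destination as the final name instead of moving the source
# into the directory the symlink points at.
if mv -T "$work/cow.new" "$work/base-unstable.cow" 2>/dev/null; then
    rename_result="replaced"
else
    rename_result="refused"
fi
echo "rename over symlink: $rename_result"
```

Whether this is what actually broke the Jenkins builds is a separate question; the sketch only shows that a directory cannot be renamed onto an existing symlink.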

 -> Copying COW directory
  forking: rm -rf /var/cache/pbuilder/build//cow.13519
  forking: cp -al /var/cache/pbuilder/base-trusty-amd64.cow /var/cache/pbuilder/build//cow.13519
I: removed stale ilistfile /var/cache/pbuilder/build//cow.13519/.ilist
 -> Invoking pbuilder
  forking: pbuilder update --buildplace /var/cache/pbuilder/build//cow.13519 --no-targz --internal-chrootexec chroot /var/cache/pbuilder/build//cow.13519 cow-shell
execvp: No such file or directory
Could not execute pbuilder
pbuilder update failed
E: could not update with cowdancer, try --no-cowdancer-update option
  forking: rm -rf /var/cache/pbuilder/build//cow.13519

Funnily enough, when run from a terminal the command works just fine. Also, --no-cowdancer-update does NOT work despite being proposed as an alternative in the above message.

I am investigating this more. It seems to bite us at the production level as well.

Change 269441 had a related patch set uploaded (by Alexandros Kosiaris):
package_builder: Set PATH for cron updates

https://gerrit.wikimedia.org/r/269441

Change 269441 merged by Alexandros Kosiaris:
package_builder: Set PATH for cron updates

https://gerrit.wikimedia.org/r/269441

Tested, and the above patch definitely solves the production problem with outdated build environments. It turns out the cron approach never really worked until now. After cron runs and those environments get updated, can we retry this build?
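The difference between the cron run and the terminal run is consistent with a PATH lookup failure: cowbuilder is started by absolute path, but it then has to find pbuilder through PATH. The sketch below reproduces that with a stand-in executable; fake-pbuilder, resolves_under and the directory layout are all invented for the illustration, and it assumes cron runs jobs with a minimal PATH that lacks the sbin directories.

```shell
# Throwaway layout with a stand-in for pbuilder living under an sbin directory.
root="$(mktemp -d)"
mkdir "$root/bin" "$root/sbin"
printf '#!/bin/sh\necho "fake pbuilder ran"\n' > "$root/sbin/fake-pbuilder"
chmod +x "$root/sbin/fake-pbuilder"

# Report whether a command name resolves under the given PATH.
# The subshell keeps the PATH change from leaking into the caller.
resolves_under() {
    ( PATH="$1"; command -v "$2" >/dev/null 2>&1 )
}

if resolves_under "$root/bin" fake-pbuilder; then
    echo "minimal PATH: found"
else
    echo "minimal PATH: not found"
fi

if resolves_under "$root/bin:$root/sbin" fake-pbuilder; then
    echo "PATH with sbin: found"
else
    echo "PATH with sbin: not found"
fi
```

Setting PATH explicitly for the cron job, as the merged change title describes, addresses exactly this kind of lookup failure.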

Change 269095 abandoned by Hashar:
package_builder: set HOOKDIR only when it exists

Reason:
Skipping HOOKDIR was merely to discard a potential cause for T125999.

Fully agree about DRY, and pbuilder definitely skips missing hooks.

https://gerrit.wikimedia.org/r/269095

hashar claimed this task.

Yup looks fine now.

Change 269103 merged by Alexandros Kosiaris:
contint: set pbuilder basepath to actual directory

https://gerrit.wikimedia.org/r/269103