
Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch
Closed, Resolved (Public)

Description

Possibly related to jessie vs. stretch
Mostly it's just the normal package not found error for the nutcracker exporter, but take a look at this one from -redis06 trying to get the redis one:

Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install prometheus-redis-exporter' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  apt-file insserv libexporter-tiny-perl liblist-moreutils-perl libpcsclite1
  libregexp-assemble-perl python-gdbm startpar sysv-rc
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
  prometheus-redis-exporter
0 upgraded, 1 newly installed, 0 to remove and 50 not upgraded.
Need to get 1359 kB of archives.
After this operation, 4849 kB of additional disk space will be used.
Err:1 http://apt.wikimedia.org/wikimedia stretch-wikimedia/main amd64 prometheus-redis-exporter amd64 0.12.2-1
  404  Not Found [IP: 208.80.154.22 80]
E: Failed to fetch http://apt.wikimedia.org/wikimedia/pool/main/p/prometheus-redis-exporter/prometheus-redis-exporter_0.12.2-1_amd64.deb  404  Not Found [IP: 208.80.154.22 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

Is something wrong on the apt.wikimedia.org side there?

Event Timeline

Krenair triaged this task as Medium priority. Jan 5 2018, 2:12 AM
Krenair created this task.

Maybe stretch is pointing to an old version whereas jessie is pointing to the new one?

Nope, it just plain doesn't exist:

alex@alex-laptop:~$ ssh deployment-mediawiki07
Linux deployment-mediawiki07 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64
Debian GNU/Linux 9.2 (stretch)
deployment-mediawiki07 is mediawiki::appserver
The last Puppet run was at Fri Jan  5 20:32:39 UTC 2018 (13 minutes ago). 
Last login: Fri Jan  5 01:50:07 2018 from 10.68.18.65
krenair@deployment-mediawiki07:~$ apt-cache policy prometheus-nutcracker-exporter
N: Unable to locate package prometheus-nutcracker-exporter

Whereas:

alex@alex-laptop:~$ ssh deployment-mediawiki06
Linux deployment-mediawiki06 4.9.0-0.bpo.3-amd64 #1 SMP Debian 4.9.25-1~bpo8+3 (2017-06-15) x86_64
Debian GNU/Linux 8.8 (jessie)
deployment-mediawiki06 is mediawiki::appserver
The last Puppet run was at Fri Jan  5 20:17:19 UTC 2018 (28 minutes ago). 
Last login: Fri Jan  5 20:45:58 2018 from bastion-02.bastion.eqiad.wmflabs
krenair@deployment-mediawiki06:~$ apt-cache policy prometheus-nutcracker-exporter
prometheus-nutcracker-exporter:
  Installed: 0.2
  Candidate: 0.2
  Version table:
 *** 0.2 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/main amd64 Packages
        100 /var/lib/dpkg/status
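
For checking several hosts or packages at once, the comparison above can be scripted. This is only a rough sketch: `candidate_version` is a hypothetical helper, not part of any Wikimedia tooling, and it assumes the `apt-cache policy` output has already been captured as text (the two samples are copied from the transcripts above).

```python
def candidate_version(policy_output):
    """Return the Candidate version from `apt-cache policy` text, or None.

    None covers both cases seen in practice: apt does not know the package
    at all (no "Candidate:" line), or it reports "Candidate: (none)".
    """
    for line in policy_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Candidate:"):
            # Split on the first colon only, so versions with an epoch survive.
            version = stripped.split(":", 1)[1].strip()
            return None if version == "(none)" else version
    return None


# Sample output copied from the stretch host (deployment-mediawiki07).
STRETCH_OUTPUT = "N: Unable to locate package prometheus-nutcracker-exporter"

# Sample output copied from the jessie host (deployment-mediawiki06).
JESSIE_OUTPUT = """prometheus-nutcracker-exporter:
  Installed: 0.2
  Candidate: 0.2
  Version table:
 *** 0.2 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/main amd64 Packages
        100 /var/lib/dpkg/status
"""

print(candidate_version(STRETCH_OUTPUT))
print(candidate_version(JESSIE_OUTPUT))
```

A missing candidate only says that the host's package lists do not contain the package. It does not say whether the repository lacks it or whether the lists are simply stale.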

Dug into this a bit more with some help from paladox and mutante. It seems the problem is apt-get update failing because of errors from a stretch-wikimedia experimental sources entry, which seems to be broken on the apt.wikimedia.org end.

It's due to the experimental component missing from the InRelease file for stretch. It seems the folder exists for it, but it was never added to that file. It does exist in the jessie one though.

jessie:

Components: main backports thirdparty experimental thirdparty/cloudera component/ci thirdparty/ci component/elastic55 thirdparty/elastic55 component/icu57 component/git

stretch:

Components: main thirdparty/cloudera thirdparty/ci thirdparty/confluent thirdparty/hwraid thirdparty/k8s component/elastic55 thirdparty/elastic55
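
The same comparison can be done mechanically. A minimal sketch, assuming the `Components:` line has already been copied out of each InRelease file; `parse_components` and `missing_components` are hypothetical helper names, and the two strings are the lines quoted above.

```python
def parse_components(release_text):
    """Return the set of components named on the 'Components:' line."""
    for line in release_text.splitlines():
        if line.startswith("Components:"):
            return set(line.split(":", 1)[1].split())
    return set()


def missing_components(wanted, release_text):
    """List the components a sources entry asks for that the release lacks."""
    return sorted(set(wanted) - parse_components(release_text))


# Components lines as quoted in this task.
JESSIE = (
    "Components: main backports thirdparty experimental thirdparty/cloudera "
    "component/ci thirdparty/ci component/elastic55 thirdparty/elastic55 "
    "component/icu57 component/git"
)
STRETCH = (
    "Components: main thirdparty/cloudera thirdparty/ci thirdparty/confluent "
    "thirdparty/hwraid thirdparty/k8s component/elastic55 thirdparty/elastic55"
)

# A sources entry asking for main and experimental, as on the broken hosts.
wanted = ["main", "experimental"]
print(missing_components(wanted, JESSIE))   # []
print(missing_components(wanted, STRETCH))  # ['experimental']
```

Any component reported as missing here is one that a sources entry should not request for that distribution, which matches the stretch failure described above.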

Change 402431 had a related patch set (by Paladox) published:
[operations/puppet@production] aptrepo: Add experimental to stretch (distributions-wikimedia)

https://gerrit.wikimedia.org/r/402431

Change 402431 abandoned by Paladox:
aptrepo: Add experimental to stretch (distributions-wikimedia)

https://gerrit.wikimedia.org/r/402431

Change 402432 had a related patch set uploaded (by Paladox; owner: Paladox):
[operations/puppet@production] apt: Do not use experimental on stretch

https://gerrit.wikimedia.org/r/402432

The patch handles these errors, but some more errors showed up on some of the hosts

Actually the remaining ones appear to be T153468

Change 402432 merged by ArielGlenn:
[operations/puppet@production] apt: Do not use experimental on stretch

https://gerrit.wikimedia.org/r/402432

Solved by removing 'experimental' from the stretch apt configuration: https://gerrit.wikimedia.org/r/402432

Can you also remove apt::use_experimental from the Hiera settings for deployment-prep? There's no point for deployment-prep to use "experimental" at this point.

Welp, it's merged, and the hosts in question no longer have errors since the cherry-pick earlier. Can we close?