CI: operations-dns-lint broken due to missing Maxmind DB file
Closed, Resolved, Public

Description

The operations-dns-lint jenkins-bot check has been broken recently, so changes in the DNS repo get downvoted due to:

Failed to open GeoIP2 database '/usr/share/GeoIP/GeoIP2-City.mmdb': Error opening the specified MaxMind DB file

The puppet class "authdns::lint" is supposed to be applied on the slaves; it installs gdnsd and the MaxMind DBs.

Related to the recent replacement of trusty?

example: https://integration.wikimedia.org/ci/job/operations-dns-lint/4457/console

Event Timeline

Restricted Application added a subscriber: Aklapper.
Dzahn updated the task description.
Dzahn added a subscriber: hashar.

That is related. As I migrated some jobs from Trusty to Jessie, I added a couple of Jessie instances. That file is not provisioned by puppet and is thus missing.

Good old slaves are 1001 and 1002:

integration-slave-jessie-1001.integration.eqiad.wmflabs
integration-slave-jessie-1002.integration.eqiad.wmflabs

Bad ones are 1003 and 1004:

integration-slave-jessie-1003.integration.eqiad.wmflabs
integration-slave-jessie-1004.integration.eqiad.wmflabs
$ ls -l /usr/share/GeoIP
total 62804
lrwxrwxrwx 1 root root       18 Apr  9  2015 GeoIP.dat -> GeoLiteCountry.dat
lrwxrwxrwx 1 root root       18 Apr  9  2015 GeoIP2-City.mmdb -> GeoLite2-City.mmdb
lrwxrwxrwx 1 root root       21 Apr  9  2015 GeoIP2-Country.mmdb -> GeoLite2-Country.mmdb
lrwxrwxrwx 1 root root       15 Apr  9  2015 GeoIPCity.dat -> GeoLiteCity.dat
-rw-r--r-- 1 root root 38297006 Nov  8  2015 GeoLite2-City.mmdb
-rw-r--r-- 1 root root  2079637 Nov  8  2015 GeoLite2-Country.mmdb
-rw-r--r-- 1 root root  3993422 Nov  8  2015 GeoLiteASNum.dat
-rw-r--r-- 1 root root 19126858 Nov  8  2015 GeoLiteCity.dat
-rw-r--r-- 1 root root   810094 Nov  8  2015 GeoLiteCountry.dat
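
A quick way to tell a good slave from a bad one is to check that the database path from the error message exists and that its symlink resolves. The following is only a minimal sketch: the function name `check_geoip_dir` and the `REQUIRED` list are hypothetical, and the only file known to be needed is the one named in the gdnsd error above.

```python
import os

# Assumption: only GeoIP2-City.mmdb is confirmed as required (it is the path
# in the gdnsd error). Extend this list if the lint job needs other files.
REQUIRED = ["GeoIP2-City.mmdb"]


def check_geoip_dir(directory, required=REQUIRED):
    """Return a list of problems found; an empty list means all files are usable."""
    problems = []
    for name in required:
        path = os.path.join(directory, name)
        if os.path.islink(path) and not os.path.exists(path):
            # The symlink itself is present but its target is gone.
            problems.append(f"{path}: dangling symlink -> {os.readlink(path)}")
        elif not os.path.exists(path):
            problems.append(f"{path}: missing")
        elif os.path.getsize(path) == 0:
            problems.append(f"{path}: empty file")
    return problems


if __name__ == "__main__":
    for problem in check_geoip_dir("/usr/share/GeoIP"):
        print(problem)
```

Run on each slave, this prints nothing when the files are in place and one line per problem otherwise.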

I am trying to add the GeoIP files on the CI puppet master. I first have to fix some puppet madness with an undefined variable (P6006); see https://gerrit.wikimedia.org/r/377986 (puppetmaster: test for puppetmaster::geoip).

Change 377986 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] puppetmaster: pass volatile_dir to geoip class

https://gerrit.wikimedia.org/r/377986

Mentioned in SAL (#wikimedia-releng) [2017-09-14T10:35:42Z] <hashar> CI puppet master: added class geoip::data::package and parameters: puppetmaster::geoip::fetch_private: false puppetmaster::geoip::use_proxy: false - T175864

I have rebuilt the Jenkins job and it passed on slave 1003 ( https://integration.wikimedia.org/ci/job/operations-dns-lint/4463/console ).

Essentially solved. Just have to review the above puppet patch :]

greg removed hashar as the assignee of this task. (Sep 25 2017, 4:48 PM)
greg subscribed.

Just waiting on puppet merge, unassigning Antoine.

I would have merged https://gerrit.wikimedia.org/r/#/c/377986/ but it was blocked by a dependency on https://gerrit.wikimedia.org/r/#/c/377980/1 and _that_ one I can't review.

Change 377986 abandoned by Hashar:
puppetmaster: pass volatile_dir to geoip class

Reason:
Alexandros checked on the CI puppet master and I double checked: I can no longer reproduce the issue I had.

I guess puppet was not willing to cooperate at some point.

Thank you for the double check!!!

https://gerrit.wikimedia.org/r/377986

hashar claimed this task.

Apparently that was transient or puppet was not willing to cooperate. Faidon / Alexandros verified my proposed patch and none of us could reproduce the issue. It is all fine now.