Based in Nantes, France (CET/CEST, UTC+1/UTC+2)
Main IRC channel is #wikimedia-releng
Apparently that was transient or puppet was not willing to cooperate. Faidon / Alexandros verified my proposed patch and none of us could reproduce the issue. It is all fine now.
So BACKPORTS was swallowed by sudo. I have adjusted the sudo policy in Horizon to env_keep =BACKPORTS. As a result the D02backports hook manages to inject the backports repository and runs apt-get update as expected.
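For reference, in plain sudoers syntax the change amounts to something like the following (a sketch only; the actual policy is defined through the Horizon sudo policy form):

  # Sketch of the equivalent sudoers directive; the real policy lives in Horizon.
  Defaults env_keep += "BACKPORTS"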
I am not sure what is going on since the hook on the machine looks like:
I'm wondering if this is related:
I think @hashar issued a command that tried to purge all Nodepool-managed instances simultaneously, as noted in https://phabricator.wikimedia.org/T170492#3579592. At least rabbitmq choked, but nova being under siege was possibly the root cause of that.
@greg from a previous comment:
Note the public key is used on labs; IIRC access to production requires a different SSH key.
The instances provision fine using apt.wikimedia.org. The last two patches would let us resolve this task for good.
The community metrics are provided with the help of Bitergia ( https://bitergia.com/ ).
I have filed this task for what it is: replace salt on integration and beta. There is no evilness intended!
Update: php-defaults / php-redis got built with PHP 5.5 and uploaded to component/ci. Luasandbox is next.
So the slaves have docker-engine
$ apt-cache policy docker-engine
docker-engine:
  Installed: 1.12.6-0~debian-jessie
  Candidate: 1.12.6-0~debian-jessie
  Version table:
 *** 1.12.6-0~debian-jessie 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/thirdparty amd64 Packages
        100 /var/lib/dpkg/status

contint1001:~$ apt-cache policy docker-ce
docker-ce:
  Installed: 17.06.2~ce-0~debian
  Candidate: 17.06.2~ce-0~debian
  Version table:
 *** 17.06.2~ce-0~debian 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/thirdparty/ci amd64 Packages
        100 /var/lib/dpkg/status

contint1001:~$ docker --version
Docker version 17.06.2-ce, build cec0b72
It was merely a copy-paste from an Etherpad.
The instances are:
@hashar I think this is unrelated to this task. My understanding is that those two WMCS projects have their own salt master internal to the project that they self-administer. Nothing forbids keeping it that way, and replacing the global WMCS salt master with cumin doesn't affect those in any way AFAIK.
I agree that it would be nice to have a simple way to install a cumin master inside a labs project; I can look into it once the goal-related work is completed.
Feel free to open a dedicated task for it.
The instances are:
What does integration (CI) use salt for?
After running namespaceDupes:
id=2276 ns=0 dbk=Вікіслоўнік:Партал_супольнасці
*** dest title exists and --add-prefix not specified
id=1671 ns=0 dbk=Вікіслоўнік:Стварэнне_артыкулаў
*** dest title exists and --add-prefix not specified
id=3973 ns=0 dbk=Вікіслоўнік:Шаблон
*** dest title exists and --add-prefix not specified
3 pages to fix, 0 were resolvable.
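Those three pages will need either a manual rename or a rerun of the script with a prefix. A possible follow-up invocation would look roughly like the following (sketch only: the dbname and the prefix are placeholders, not what was actually run):

  # Illustration: <dbname> and the 'Duplicate:' prefix are placeholders.
  # --fix applies the renames; --add-prefix prepends the given text when the destination title already exists.
  mwscript namespaceDupes.php --wiki=<dbname> --fix --add-prefix='Duplicate:'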
On behalf of @Ladsgroup , I have created https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/DataTypes owned by group wikidata.
contint-admins cannot interact with the zuul service anymore since that now requires sudo/root.
The wikimedia-jessie base image is generated via bootstrap-vz with:
plugins:
  minimize_size:
    apt:
      autoclean: true
      languages: [none]
      gzip_indexes: true
      autoremove_suggests: true
    dpkg:
      locales:
      exclude_docs: true
I have rerun the Jenkins build and it passed on slave 1003 ( https://integration.wikimedia.org/ci/job/operations-dns-lint/4463/console ).
I am trying to add the GeoIP files on the CI puppet master. I still have to fix some puppet madness with an undefined variable (P6006) and https://gerrit.wikimedia.org/r/377986 (puppetmaster: test for puppetmaster::geoip).
That is related. As I migrated some jobs from Trusty to Jessie, I added a couple of Jessie instances. That file is not provisioned by puppet and is thus missing.
Moritz is rebuilding the Debian packages and will publish them on apt.wikimedia.org. I am keeping this open until the upload has been completed; then I can phase out the transient aptly repo :]
Should be good now:
Looks like it has been manually installed on one of the slaves, which is the one I used to verify whether jsduck is provisioned :(
Puppet cleaning is done via:
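(The actual commands are not shown in this extract; on a self-hosted puppetmaster the cleanup typically looks along these lines, assuming the Puppet 3/4 CLI and with the instance FQDN as a placeholder:)

  # Illustrative only; <instance-fqdn> is a placeholder.
  sudo puppet node clean <instance-fqdn>       # remove the signed certificate and stored node data
  sudo puppet node deactivate <instance-fqdn>  # deactivate the node in PuppetDB, if one is in use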
Almost every job is now running on Nodepool instances, which do not suffer from this trouble.
It is not running on Trusty.
php-compile-php55 is the last job still on Trusty.
https://gerrit.wikimedia.org/r/#/c/377469/2/modules/aptrepo/files/distributions-wikimedia adds the jessie-wikimedia component/ci :]
And I have deleted some leftover jobs that had the ci-trusty-wikimedia label although they are not defined in JJB:
I still have to switch the jobs using the phpflavor-php55 label.
We no longer have any jobs on Nodepool Trusty instances (label: ci-trusty-wikimedia) \o/
And I have packaged php-luasandbox. The bulk of the work is done; what is left is maybe to polish up the packages and then add them to apt.wikimedia.org. Meanwhile they are all published on a transient apt repo.
We have a bunch of credentials in https://integration.wikimedia.org/ci/credentials/ ; they can then be exported as environment variables on a per-job basis with https://docs.openstack.org/infra/jenkins-job-builder/wrappers.html#wrappers.credentials-binding
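A minimal sketch of what such a binding looks like in a JJB job definition (the job name, credential id, and variable name below are placeholders):

  # Sketch only: names and ids are placeholders.
  - job:
      name: example-job-with-secret
      wrappers:
        - credentials-binding:
            - text:
                credential-id: example-credential-id
                variable: EXAMPLE_SECRET
      builders:
        # the credential is exposed to the build as $EXAMPLE_SECRET
        - shell: 'echo "the secret is available as the EXAMPLE_SECRET environment variable"'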
In apt.wikimedia.org we have:
From this weekend's logs, nova.network.manager had the same chain of log messages:
2017-09-09 21:33:07.090 2408 WARNING nova.network.manager [req-2106f08d-04ea-4cfc-a344-d0f7abf1072c nodepoolmanager contintcloud - - -] Error cleaning up fixed ip allocation. Manual cleanup may be required. ValueError: Circular reference detected
2017-09-09 21:33:07.269 2408 ERROR oslo_messaging.rpc.dispatcher [req-2106f08d-04ea-4cfc-a344-d0f7abf1072c nodepoolmanager contintcloud - - -] Exception during message handling: Timed out waiting for a reply to message ID 6e59f41d21354eccbc77bfe0fcc1c5c0
Though the message apparently got yielded some 7 seconds later:
2017-09-09 21:33:14.487 2408 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : 6e59f41d21354eccbc77bfe0fcc1c5c0