
Instances created by Nodepool can't run Puppet due to missing certificate
Closed, Resolved, Public

Description

When Nodepool spawns a Trusty image, the instance stalls on boot during the Puppet certificate exchange:

Info: Creating a new SSL key for i-00000b80.eqiad.wmflabs
Info: Caching certificate for ca
Info: csr_attributes file loading from /etc/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for i-00000b80.eqiad.wmflabs
Info: Certificate Request fingerprint (SHA256): 24:DD:BF:BC:FA:B4:42:BC:E5:3D:58:F5:15:9F:51:D1:DE:7B:11:E1:F1:99:6A:D8:3A:CB:6F:B2:D4:E0:F6:BB
Info: Caching certificate for ca
Notice: Did not receive certificate
Notice: Did not receive certificate
Notice: Did not receive certificate

On Wikitech, the Nova_Resource page shows that the Nodepool instance is missing a few fields injected by OpenStackManager. Compared to a manually created instance, the instance spawned by Nodepool is missing the following fields (values shown are from the manually created instance):

Resource type
    instance
Image Id
    ubuntu-14.04-trusty
FQDN
    i-00000b7b.eqiad.wmflabs
Puppet Class
    base, role::labs::instance
Puppet Var
    realm=labs, use_dnsmasq=true, instanceproject=contintcloud, instancename=trusty-manual

Looking at firstboot.sh, the script does an LDAP search to grab the Puppet vars:

$ ldapsearch -x -D 'cn=proxyagent,ou=profile,dc=wikimedia,dc=org' -w '###########' -b 'ou=hosts,dc=wikimedia,dc=org' 'dc=i-00000b7b.eqiad.wmflabs'|grep puppetVar
puppetVar: realm=labs
puppetVar: use_dnsmasq=true
puppetVar: instanceproject=contintcloud
puppetVar: instancename=trusty-manual

The same query yields nothing for the Nodepool instance, since those values are injected by OpenStackManager, which is bypassed when an instance is created directly through the OpenStack API.
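A minimal sketch of how firstboot.sh could read a single puppetVar from the ldapsearch output and fail explicitly when the record is absent, as happens for the Nodepool instance. The function name puppetvar_get is hypothetical; it only assumes the "puppetVar: key=value" line format shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one puppetVar from ldapsearch
# output read on stdin. Returns 1 when the key is absent, which is the
# case for instances created without OpenStackManager.
puppetvar_get() {
    local key="$1" line
    # "|| [ -n ... ]" also handles a final line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            "puppetVar: ${key}="*)
                # Strip the literal "puppetVar: key=" prefix, keep the value
                printf '%s\n' "${line#"puppetVar: ${key}="}"
                return 0
                ;;
        esac
    done
    return 1
}

# Example usage inside firstboot.sh (not executed here):
# project=$(ldapsearch -x -D "$binddn" -w "$bindpw" -b "$hostsou" \
#     "dc=${idfqdn}" puppetvar | puppetvar_get instanceproject) \
#     || echo "no puppetVar for ${idfqdn}" >&2
```

Matching on "key=" rather than a bare substring avoids the case where one variable name is a prefix of another (for example instance vs. instanceproject).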

Event Timeline

hashar raised the priority of this task to Medium.
hashar updated the task description.
hashar subscribed.

+ @Andrew

Nodepool supports storing key/value pairs in the OpenStack metadata service (see http://ci.openstack.org/nodepool/configuration.html?highlight=region#images):

images:
  - name: precise
    base-image: 'Precise'
    min-ram: 8192
    name-filter: 'm1.medium'
    setup: prepare_node.sh
    reset: reset_node.sh
    username: jenkins
    private-key: /var/lib/jenkins/.ssh/id_rsa
    # key/value pairs to publish via the OpenStack metadata service:
    meta:
        key: value
        key2: value

I gave it a try, but the metadata does not appear to be published, or at least it does not show up in the Horizon dashboard.

Anyway, the puppetVar and Puppet class values are populated by OpenStackManager, and I don't think it is a good idea to duplicate them in the Nodepool configuration file. Instead, I am wondering whether we could have OpenStackManager publish them as metadata and adjust firstboot.sh to fetch them via the http://169.254.169.254/openstack/latest/meta_data.json interface. Though it yields JSON :(
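If OpenStackManager (or Nodepool's meta: section) did publish these values, the JSON would not be a big obstacle: server metadata appears under the "meta" object of meta_data.json. A minimal sketch, assuming python3 is present on the image to parse the JSON; the helper name meta_get and the instanceproject metadata key are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print one key from the "meta" object of the OpenStack
# metadata service's meta_data.json, read on stdin. Assumes python3 is
# installed on the image. Returns 1 if the key is not present.
meta_get() {
    python3 -c '
import json, sys
key = sys.argv[1]
meta = json.load(sys.stdin).get("meta", {})
if key not in meta:
    sys.exit(1)
print(meta[key])
' "$1"
}

# Example usage from inside an instance (not executed here); the
# "instanceproject" key is an assumption about what would be published:
# curl -sf http://169.254.169.254/openstack/latest/meta_data.json \
#     | meta_get instanceproject
```

Running the curl command by hand on a Nodepool instance would also show directly whether the meta: entries from the Nodepool configuration reached the metadata service, independently of what Horizon displays.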

From our wikitech.php we have:

$wgOpenStackManagerPuppetOptions = array(
    'enabled' => true,
    'defaultclasses' => array( 'base', 'role::labs::instance' ),
    'defaultvariables' => array( 'realm' => 'labs', 'use_dnsmasq' => 'true' ),
);

I'm in the process of moving the LDAP record creation out of OpenStackManager and into a Designate callback. Once that's done, instances created on the command line will work the same as those created via the GUI. Many of the pieces are already in place; I'll try to pick up the pace a bit.

For now, only the CI isolation project is going to spawn instances directly via the OpenStack API. In firstboot.sh, the only line relying on puppetVar is:

project=`ldapsearch -x -D ${binddn} -w ${bindpw} -b ${hostsou} "dc=${idfqdn}" puppetvar | grep 'instanceproject' | sed 's/.*=//'`

It merely fetches the puppetVar fields from LDAP to get the Labs project name. For CI isolation we are going to use custom images, so we could teach firstboot.sh to source a file that contains environment variables and hardcode the project to contintcloud. Something like:

firstboot.env

WMFLABS_INSTANCE_PROJECT="contintcloud"

firstboot.sh

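# Source optional overrides shipped in custom images
[ -f ./firstboot.env ] && . ./firstboot.env

project_from_ldap=`ldapsearch -x -D ${binddn} -w ${bindpw} -b ${hostsou} "dc=${idfqdn}" puppetvar | grep 'instanceproject' | sed 's/.*=//'`
# Prefer the injected project name, fall back to the LDAP value
project=${WMFLABS_INSTANCE_PROJECT:-$project_from_ldap}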

This way the CI custom images can override the project name returned by ldapsearch.
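A variant of the same idea, sketched under assumptions: rather than always running ldapsearch and then discarding its result, the LDAP query is skipped entirely when the override is set, so CI instances would not need LDAP at all to determine their project. The function names lookup_project_from_ldap and resolve_project, and the default ./firstboot.env location, are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper around the existing firstboot.sh pipeline; relies on
# binddn, bindpw, hostsou and idfqdn being set by firstboot.sh.
lookup_project_from_ldap() {
    ldapsearch -x -D "${binddn}" -w "${bindpw}" -b "${hostsou}" "dc=${idfqdn}" puppetvar \
        | grep 'instanceproject' | sed 's/.*=//'
}

# Print the Labs project name. A custom image may ship an env file that sets
# WMFLABS_INSTANCE_PROJECT; only when it is unset or empty is LDAP queried.
resolve_project() {
    local env_file="${1:-./firstboot.env}"
    if [ -f "$env_file" ]; then
        . "$env_file"
    fi
    if [ -n "${WMFLABS_INSTANCE_PROJECT:-}" ]; then
        printf '%s\n' "$WMFLABS_INSTANCE_PROJECT"
    else
        lookup_project_from_ldap
    fi
}

# Example usage in firstboot.sh (not executed here):
# project=$(resolve_project)
```

Calling resolve_project through command substitution runs it in a subshell, so whatever the env file defines does not leak into the rest of firstboot.sh.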

Change 210032 had a related patch set uploaded (by Hashar):
labs: support injecting tenant in firstboot.sh

https://gerrit.wikimedia.org/r/210032

This task is the reason instances booted on Labs by Nodepool fail to complete their first boot.

Change 210032 abandoned by Hashar:
labs: support injecting tenant in firstboot.sh

Reason:
Probably not needed anymore

https://gerrit.wikimedia.org/r/210032

hashar claimed this task.

@Andrew fixed labs so we can boot instances from the OpenStack API and I confirmed it works fine. We can also spawn them from the Horizon dashboard!