
Instances created by Nodepool can't run puppet due to missing certificate
Closed, ResolvedPublic


When Nodepool spawns a Trusty image, the instance stalls on boot while doing the puppet certificate work:

Info: Creating a new SSL key for i-00000b80.eqiad.wmflabs
Info: Caching certificate for ca
Info: csr_attributes file loading from /etc/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for i-00000b80.eqiad.wmflabs
Info: Certificate Request fingerprint (SHA256): 24:DD:BF:BC:FA:B4:42:BC:E5:3D:58:F5:15:9F:51:D1:DE:7B:11:E1:F1:99:6A:D8:3A:CB:6F:B2:D4:E0:F6:BB
Info: Caching certificate for ca
Notice: Did not receive certificate
Notice: Did not receive certificate
Notice: Did not receive certificate

On Wikitech, the Nova_Resource page shows that the instance is missing a few fields injected by OpenStackManager. Compared to a manually created instance, the instance spawned by Nodepool is missing the following fields:

Resource type
Image Id
Puppet Class
    base, role::labs::instance
Puppet Var
    realm=labs, use_dnsmasq=true, instanceproject=contintcloud, instancename=trusty-manual

Looking at the script, it does an LDAP search to grab the puppet vars:

$ ldapsearch -x -D 'cn=proxyagent,ou=profile,dc=wikimedia,dc=org' -w '###########' -b 'ou=hosts,dc=wikimedia,dc=org' 'dc=i-00000b7b.eqiad.wmflabs'|grep puppetVar
puppetVar: realm=labs
puppetVar: use_dnsmasq=true
puppetVar: instanceproject=contintcloud
puppetVar: instancename=trusty-manual

The same query yields nothing for the Nodepool instance, since those values are injected by OpenStackManager.

Event Timeline

hashar raised the priority of this task from to Medium.
hashar updated the task description. (Show Details)
hashar added a subscriber: hashar.

+ @Andrew

Nodepool has support for storing key/value pairs in the OpenStack metadata service from its configuration file:

  - name: precise
    base-image: 'Precise'
    min-ram: 8192
    name-filter: 'm1.medium'
    username: jenkins
    private-key: /var/lib/jenkins/.ssh/id_rsa
    meta:
        key: value
        key2: value

I gave it a try but apparently the meta information is not published, or at least it does not show up in the Horizon dashboard.

Anyway, the puppetVar and puppetClass values are populated by OpenStackManager and I don't think it is a good idea to duplicate them in the Nodepool configuration file. Instead I am wondering whether we could have OpenStackManager publish the metadata information and adjust the script to fetch it via that interface. Though it yields JSON :(
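As a rough sketch of what fetching a value from the metadata service could look like (the inline sample JSON and the "meta" key names are assumptions, not the actual payload):

```shell
# Sketch only: on a real instance the JSON would come from the
# link-local metadata endpoint, e.g.
#   curl -s http://169.254.169.254/openstack/latest/meta_data.json
# Here an inline sample stands in for it.
meta_json='{"meta": {"instanceproject": "contintcloud", "realm": "labs"}}'
project=$(echo "$meta_json" | sed 's/.*"instanceproject": "\([^"]*\)".*/\1/')
echo "$project"   # prints: contintcloud
```

A real implementation should use a proper JSON parser rather than sed; this just shows where the value would come from.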

From our wikitech.php we have:

$wgOpenStackManagerPuppetOptions = array(
    'enabled' => true,
    'defaultclasses' => array( 'base', 'role::labs::instance' ),
    'defaultvariables' => array( 'realm' => 'labs', 'use_dnsmasq' => 'true' ),
);

I'm in the process of moving the LDAP record creation out of OpenStackManager and into a Designate callback; after that's done, instances created on the command line will work the same as those created by the GUI. Many of the pieces of that are already in place; I'll try to pick up the pace a bit.

For now, only the CI isolation project is going to spawn instances directly via the OpenStack API. In the script, the only line relying on puppetVar is:

project=`ldapsearch -x -D ${binddn} -w ${bindpw} -b ${hostsou} "dc=${idfqdn}" puppetvar | grep 'instanceproject' | sed 's/.*=//'`
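The grep/sed extraction above can be demonstrated stand-alone on sample ldapsearch output (the attribute lines mirror the query shown earlier):

```shell
# Stand-alone demo of the pipeline above on ldapsearch-style output:
# keep the line carrying instanceproject and strip everything up to '='.
sample='puppetVar: realm=labs
puppetVar: instanceproject=contintcloud'
project=$(echo "$sample" | grep 'instanceproject' | sed 's/.*=//')
echo "$project"   # prints: contintcloud
```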

That line merely fetches the puppetVar fields from LDAP to get the labs project name. For CI isolation we are going to use custom images. We could teach the script to source a file that contains environment variables and hardcode the project to contintcloud. Something like:



# Source optional overrides baked into the custom image
[ -f firstboot.env ] && . firstboot.env

project_from_ldap=`ldapsearch -x -D ${binddn} -w ${bindpw} -b ${hostsou} "dc=${idfqdn}" puppetvar | grep 'instanceproject' | sed 's/.*=//'`
# Fall back to the LDAP value when firstboot.env did not set $project
project=${project:-$project_from_ldap}

This way we can override the ldapsearch in the CI custom images.
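A hypothetical firstboot.env baked into the CI custom image could be as small as this (the variable name "project" is an assumption about what the first-boot script would consume):

```shell
# firstboot.env (hypothetical): shipped inside the CI custom image so the
# first-boot script can skip the LDAP lookup entirely.
project=contintcloud
```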

Change 210032 had a related patch set uploaded (by Hashar):
labs: support injecting tenant in

That task is the reason instances booted on labs by Nodepool fail to complete their first boot.

Change 210032 abandoned by Hashar:
labs: support injecting tenant in

Probably not needed anymore

hashar claimed this task.

@Andrew fixed labs so we can boot instances from the OpenStack API and I confirmed it works fine. We can also spawn them from the Horizon dashboard!