Instances created by Nodepool cant run puppet due to missing certificate
Closed, Resolved (Public)

Authored By

	hashar
	Apr 21 2015, 11:48 AM

Description

When Nodepool spawns a Trusty image, the instance stalls on boot while doing the puppet certificate work:

Info: Creating a new SSL key for i-00000b80.eqiad.wmflabs
Info: Caching certificate for ca
Info: csr_attributes file loading from /etc/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for i-00000b80.eqiad.wmflabs
Info: Certificate Request fingerprint (SHA256): 24:DD:BF:BC:FA:B4:42:BC:E5:3D:58:F5:15:9F:51:D1:DE:7B:11:E1:F1:99:6A:D8:3A:CB:6F:B2:D4:E0:F6:BB
Info: Caching certificate for ca
Notice: Did not receive certificate
Notice: Did not receive certificate
Notice: Did not receive certificate

On Wikitech, the Nova_Resource page shows that the instance is missing a few fields injected by OpenStackManager. Compared to a manually created instance, the one spawned by Nodepool is missing these fields:

Resource type
    instance
Image Id
    ubuntu-14.04-trusty
FQDN
    i-00000b7b.eqiad.wmflabs
Puppet Class
    base, role::labs::instance
Puppet Var
    realm=labs, use_dnsmasq=true, instanceproject=contintcloud, instancename=trusty-manual

Looking at firstboot.sh, the script does an LDAP search to grab the puppet vars:

$ ldapsearch -x -D 'cn=proxyagent,ou=profile,dc=wikimedia,dc=org' -w '###########' -b 'ou=hosts,dc=wikimedia,dc=org' 'dc=i-00000b7b.eqiad.wmflabs'|grep puppetVar
puppetVar: realm=labs
puppetVar: use_dnsmasq=true
puppetVar: instanceproject=contintcloud
puppetVar: instancename=trusty-manual

The same query yields nothing for the Nodepool instance, since those puppetVar entries are injected by OpenStackManager.
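
To make explicit what firstboot.sh needs from that output, here is a minimal sketch that pulls a single variable out of puppetVar lines. The helper name puppet_var is hypothetical and not part of firstboot.sh; the sample input is copied from the manually created instance above, so the sketch runs without an LDAP server.

```shell
#!/bin/sh
# Hypothetical helper (not in firstboot.sh): read ldapsearch output on stdin
# and print the value of the puppetVar named in $1, or nothing if absent.
puppet_var() {
    sed -n "s/^puppetVar: $1=//p" | head -n 1
}

# Sample input copied from the manually created instance above.
sample='puppetVar: realm=labs
puppetVar: use_dnsmasq=true
puppetVar: instanceproject=contintcloud
puppetVar: instancename=trusty-manual'

project=$(printf '%s\n' "$sample" | puppet_var instanceproject)
echo "project=${project:-<none>}"
```

For a Nodepool instance the input would be empty, so the helper prints nothing and the project ends up unset, which matches the failure described above.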

Details

	Subject	Repo	Branch	Lines +/-
	labs: support injecting tenant in firstboot.sh	operations/puppet	production	+28 -2

Related Objects

Status	Assigned	Task
Resolved	Andrew	T42525 Cant add a security group to an existing instance
Resolved	Andrew	T87279 Make OpenStack Horizon useful for production labs
Resolved	hashar	T96670 Instances created by Nodepool cant run puppet due to missing certificate
Resolved	Andrew	T96677 Move ldap host-record creation out of OpenStackManager and into sink
Resolved	Andrew	T97163 Support instance creation/deletion via nova commandline
Resolved	Andrew	T91987 Nova Instance creation hook for ldap
Duplicate	Andrew	T95910 Abolish use of ec2 ids
Resolved	Andrew	T95911 Remove puppet and salt keys on instance deletion
Resolved	Andrew	T95480 Abolish use of ec2id in cert names
Resolved	ArielGlenn	T95481 Fix monitor_labs_salt_keys.py to handle the new labs naming scheme
Resolved	Andrew	T95519 Automatically clean salt and puppet certs on instance deletion
Resolved	Andrew	T97170 Switch all of labs to the pdns server, remove the use_dnsmasq flag.
Resolved	coren	T95288 Designate should support split horizon resolution to yield private IP of instances behind a public DNS entry
Resolved	Andrew	T99133 New server for labs dns recursor
Resolved	Andrew	T101871 remove per-host sudo control from OSM
Resolved	Andrew	T102047 Rename all labs ldap host entries to use fqdn
Resolved	Andrew	T102834 instance status pages broken
Resolved	Andrew	T102839 Build new images that don't do an ec2 id lookup in ldap

Event Timeline

hashar created this task. Apr 21 2015, 11:48 AM

hashar raised the priority of this task to Medium.

hashar updated the task description. (Show Details)

hashar added a project: Continuous-Integration-Scaling.

hashar subscribed.

Restricted Application added a subscriber: Aklapper. Apr 21 2015, 11:48 AM

+ @Andrew

Nodepool can store key/value pairs in the OpenStack metadata service; see http://ci.openstack.org/nodepool/configuration.html?highlight=region#images

images:
  - name: precise
    base-image: 'Precise'
    min-ram: 8192
    name-filter: 'm1.medium'
    setup: prepare_node.sh
    reset: reset_node.sh
    username: jenkins
    private-key: /var/lib/jenkins/.ssh/id_rsa
    # vvvvvvvvvvvvvvv
    meta:
        key: value
        key2: value
    # ^^^^^^^^^^^^^^^

I gave it a try, but apparently the meta information is not published, or at least it does not show up in the Horizon dashboard.

Anyway, the PuppetVar and PuppetClasses are populated by OpenStackManager, and I don't think it is a good idea to duplicate them in the Nodepool configuration file. Instead, I am wondering whether we could have OpenStackManager publish the metadata and adjust firstboot.sh to fetch it via the http://169.254.169.254/openstack/latest/meta_data.json interface. Though it yields JSON :(

From our wikitech.php we have:

$wgOpenStackManagerPuppetOptions = array(
    'enabled' => true,
    'defaultclasses' => array( 'base', 'role::labs::instance' ),
    'defaultvariables' => array( 'realm' => 'labs', 'use_dnsmasq' => 'true' ),
);
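
As a rough illustration of the meta_data.json idea above, firstboot.sh could read a value from the metadata service without a JSON library. This is only a sketch under assumptions: the helper names meta_value and fetch_meta_value are hypothetical, the sample document is made up for the demonstration, and it presumes OpenStackManager (or Nodepool) would publish instanceproject as a flat string under "meta", which is not the case today.

```shell
#!/bin/sh
# Endpoint taken from the comment above.
METADATA_URL="http://169.254.169.254/openstack/latest/meta_data.json"

# Hypothetical helper: read a meta_data.json document on stdin and print the
# string value stored under key $1. Simplification: it matches the key
# anywhere in the document, not only inside the "meta" object, and it only
# handles flat string values.
meta_value() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical wrapper for use on a real instance (not called here).
fetch_meta_value() {
    curl --silent --fail --max-time 5 "$METADATA_URL" | meta_value "$1"
}

# Offline demonstration with a made-up document; the values are illustrative.
sample='{"uuid": "0000", "meta": {"instanceproject": "contintcloud", "realm": "labs"}}'
printf '%s\n' "$sample" | meta_value instanceproject
```

On an instance, project=$(fetch_meta_value instanceproject) would stand in for the ldapsearch call. A real implementation would probably want a proper JSON parser rather than sed.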

I'm in the process of moving the ldap record creation out of openstackmanager and into a designate callback -- after that's done instances created on the commandline will work the same as those created by the gui. Many of the pieces of that are already in place; I'll try to pick up the pace a bit.

Andrew added a subtask: T96677: Move ldap host-record creation out of OpenStackManager and into sink. Apr 21 2015, 1:45 PM

hashar added a subtask: T97163: Support instance creation/deletion via nova commandline. Apr 24 2015, 8:22 PM

hashar moved this task from Backlog to In-progress on the Continuous-Integration-Scaling board. Apr 27 2015, 8:01 AM

For now, only the CI isolation project is going to spawn instances directly via the OpenStack API. In firstboot.sh, the only line relying on puppetVar is:

project=`ldapsearch -x -D ${binddn} -w ${bindpw} -b ${hostsou} "dc=${idfqdn}" puppetvar | grep 'instanceproject' | sed 's/.*=//'`

This merely fetches the puppetvar fields from LDAP to get the labs project name. For CI isolation we are going to use custom images. We could teach firstboot.sh to source a file that contains environment variables and hardcode the project to contintcloud. Something like:

firstboot.env

WMFLABS_INSTANCE_PROJECT="contintcloud"

firstboot.sh

[ -f firstboot.env ] && . firstboot.env

project_from_ldap=`ldapsearch -x -D ${binddn} -w ${bindpw} -b ${hostsou} "dc=${idfqdn}" puppetvar | grep 'instanceproject' | sed 's/.*=//'`
project=${WMFLABS_INSTANCE_PROJECT:-$project_from_ldap}

This way we can override the ldapsearch in the CI custom images.
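
A slightly different way to arrange the same override is to skip the LDAP query entirely when the variable is set, so custom images never touch LDAP. The sketch below is illustrative only: resolve_project and lookup_project_from_ldap are hypothetical names, and the lookup is a stand-in that echoes a fixed string instead of running ldapsearch.

```shell
#!/bin/sh
# Stand-in for the ldapsearch pipeline in firstboot.sh, so the sketch runs
# without an LDAP server. The returned string is a placeholder.
lookup_project_from_ldap() {
    echo "project-from-ldap"
}

# Hypothetical helper: prefer WMFLABS_INSTANCE_PROJECT when it is set and
# non-empty, and only fall back to the LDAP lookup otherwise.
resolve_project() {
    if [ -n "${WMFLABS_INSTANCE_PROJECT:-}" ]; then
        printf '%s\n' "$WMFLABS_INSTANCE_PROJECT"
    else
        lookup_project_from_ldap
    fi
}

unset WMFLABS_INSTANCE_PROJECT
echo "without override: $(resolve_project)"

WMFLABS_INSTANCE_PROJECT="contintcloud"
echo "with override: $(resolve_project)"
```

The ${VAR:-default} form proposed above behaves the same for the final value; the difference is only that it still runs the ldapsearch before discarding the result.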

Change 210032 had a related patch set uploaded (by Hashar):
labs: support injecting tenant in firstboot.sh

https://gerrit.wikimedia.org/r/210032

gerritbot added a project: Patch-For-Review. Jun 11 2015, 3:27 PM

This task is the reason instances booted on labs by Nodepool fail on first boot.

Andrew closed subtask T97163: Support instance creation/deletion via nova commandline as Resolved. Jun 18 2015, 2:36 PM

Change 210032 abandoned by Hashar:
labs: support injecting tenant in firstboot.sh

Reason:
Probably not needed anymore

https://gerrit.wikimedia.org/r/210032

@Andrew fixed labs so we can boot instances from the OpenStack API and I confirmed it works fine. We can also spawn them from the Horizon dashboard!

hashar moved this task from In-progress to Done on the Continuous-Integration-Scaling board. Sep 11 2015, 9:10 PM

greg added a project: Essential-Work. Sep 21 2015, 8:53 PM

Andrew closed subtask T96677: Move ldap host-record creation out of OpenStackManager and into sink as Resolved. Dec 3 2015, 5:48 AM
