
Cumin leading zeros in host grouping alter hostname
Open, Normal, Public

Description

deployment-logstash2 is possibly the oldest instance in the deployment-prep labs project, and it's named a little differently. In T218729 we're replacing it with one named with 03 instead of 2. In the process I seem to have stumbled into a Cumin bug.

krenair@deployment-cumin02:~$ sudo cumin 'name:logstash2' id
1 hosts will be targeted:
deployment-logstash2.deployment-prep.eqiad.wmflabs
Confirm to continue [y/n]? n
Execution interrupted by Ctrl+c/SIGINT/Aborted
krenair@deployment-cumin02:~$ sudo cumin 'name:logstash03' id
1 hosts will be targeted:
deployment-logstash03.deployment-prep.eqiad.wmflabs
Confirm to continue [y/n]? n
Execution interrupted by Ctrl+c/SIGINT/Aborted

Looking good so far.
But then if you try both:

krenair@deployment-cumin02:~$ sudo cumin 'name:logstash' id
2 hosts will be targeted:
deployment-logstash[02-03].deployment-prep.eqiad.wmflabs

Uh oh, why does it think there's a 0 in deployment-logstash2's name? It fails predictably:

Confirm to continue [y/n]? y
===== NODE GROUP =====
(1) deployment-logstash03.deployment-prep.eqiad.wmflabs
----- OUTPUT of 'id' -----
uid=0(root) gid=0(root) groups=0(root)
===== NODE GROUP =====
(1) deployment-logstash02.deployment-prep.eqiad.wmflabs
----- OUTPUT of 'id' -----
ssh: Could not resolve hostname deployment-logstash02.deployment-prep.eqiad.wmflabs: Name or service not known
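The folding behaviour can be reproduced without Cumin at all. Below is a simplified, illustrative sketch of range folding that picks a single padding width per range (this is not ClusterShell's actual algorithm, and the helper names are made up): once "2" and "03" are collapsed into one `[02-03]` range, expanding the range back no longer yields the original hostname.

```python
import re

def fold(hosts):
    """Naive range folding: group hosts by their non-numeric prefix and
    collapse the numeric suffixes into one [lo-hi] range. Like pre-1.9
    ClusterShell, it keeps only one padding width per range."""
    groups = {}
    for h in hosts:
        m = re.fullmatch(r"(.*?)(\d+)", h)
        prefix, num = m.group(1), m.group(2)
        groups.setdefault(prefix, []).append(num)
    out = []
    for prefix, nums in groups.items():
        width = max(len(n) for n in nums)  # the widest padding wins
        vals = sorted(int(n) for n in nums)
        out.append(f"{prefix}[{vals[0]:0{width}d}-{vals[-1]:0{width}d}]")
    return out

def expand(folded):
    """Expand folded [lo-hi] ranges back into individual hostnames."""
    hosts = []
    for item in folded:
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\]", item)
        prefix, lo, hi = m.group(1), m.group(2), m.group(3)
        width = len(lo)
        hosts += [f"{prefix}{v:0{width}d}" for v in range(int(lo), int(hi) + 1)]
    return hosts

folded = fold(["deployment-logstash2", "deployment-logstash03"])
print(folded)          # one padding width for the whole range
print(expand(folded))  # "deployment-logstash2" comes back as "...logstash02"
```

The information loss happens at fold time: the range notation has nowhere to record that one member used a different padding width.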

It does the same thing when run against * instead of name:logstash, and also with the PuppetDB backend instead of the OpenStack one:

krenair@deployment-cumin02:~$ sudo cumin 'P{*logstash*}' id
/usr/lib/python3/dist-packages/urllib3/connection.py:337: SubjectAltNameWarning: Certificate for deployment-puppetdb02.deployment-prep.eqiad.wmflabs has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
2 hosts will be targeted:
deployment-logstash[02-03].deployment-prep.eqiad.wmflabs
Confirm to continue [y/n]? n
Execution interrupted by Ctrl+c/SIGINT/Aborted

Event Timeline

Krenair created this task.May 3 2019, 8:44 PM
Restricted Application added a subscriber: Aklapper. · May 3 2019, 8:44 PM
Volans triaged this task as Normal priority.May 3 2019, 11:21 PM

Indeed, that's weird and not expected. It seems to be a bug in ClusterShell's NodeSet as far as I can tell; I've opened a bug upstream: https://github.com/cea-hpc/clustershell/issues/404

It's actually currently documented behaviour, see https://clustershell.readthedocs.io/en/latest/tools/nodeset.html#zero-padding. At this time it's aimed to be fixed for the 1.9 release, but I wouldn't be too surprised if it doesn't make it into 1.9.
See the pre-existing discussion in https://github.com/cea-hpc/clustershell/issues/293.

I'll see if we can add some check on Cumin's side for now without too much overhead.

Volans added a comment.May 4 2019, 1:39 PM

I've looked into it a bit, but I don't think we can hook something into Cumin for the general case.
The most obvious validation we could add is to check that the list of hosts that comes from PuppetDB or OpenStack is unchanged after the NodeSet conversion. While this seems an easy check to add to those two grammars, it immediately fails when considering the global grammar and the direct and knownhosts backends, which use NodeSet intrinsically and would not have a list of hosts to validate against in the first place.
This means that even adding a check only in the PuppetDB and OpenStack backends would not prevent the issue when performing a compound query of the type P{...} or P{...} in which the first query returns, say, a1.example.com and the second returns a02.example.com.
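The per-backend round-trip check described above could be sketched like this (a hypothetical helper, not actual Cumin code; the function name is made up for illustration). It only works where the backend starts from an explicit host list, which is exactly the limitation discussed:

```python
def nodeset_roundtrip_ok(raw_hosts, expanded_hosts):
    """Return True if folding the backend's raw host list into a
    NodeSet and expanding it back yields exactly the same names
    (hypothetical validation hook, not real Cumin code)."""
    return set(raw_hosts) == set(expanded_hosts)

# The bug in this task: folding renamed deployment-logstash2
raw = ["deployment-logstash2", "deployment-logstash03"]
after = ["deployment-logstash02", "deployment-logstash03"]  # post-folding
print(nodeset_roundtrip_ok(raw, after))  # False: a hostname was altered
```

For the global grammar and the direct/knownhosts backends there is no `raw_hosts` list to pass in, so the check cannot be applied there.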

Given the above I'm leaning to leave all as is on the Cumin side and just take advantage of the fix once it will be available upstream.
Thoughts?

Volans renamed this task from Cumin can't handle targetting deployment-logstash2.deployment-prep.eqiad.wmflabs alongside other deployment-logstash hosts to Cumin leading zeros in host grouping alter hostname.May 4 2019, 1:41 PM

Given the above I'm leaning to leave all as is on the Cumin side and just take advantage of the fix once it will be available upstream.
Thoughts?

Sounds like a plan, it's not particularly urgent or anything.

Krenair moved this task from Backlog to Reported Upstream on the Upstream board.
hashar added a subscriber: hashar.May 16 2019, 9:50 PM

Thank you @Krenair for pointing me at this task :-]

It does not seem a fix is in progress upstream, but there is a recent comment pointing at potentially faulty code: https://github.com/cea-hpc/clustershell/issues/293#issuecomment-492898000