beta-scap-eqiad FAILED - deployment-snapshot01.deployment-prep.eqiad.wmflabs returned [255]: Host key verification failed.
Closed, Resolved (Public)

Description

It looks like the error / warning below is causing the job to be marked as failed, even though the scap itself still seems to succeed.

https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/189280/console

08:16:34 08:16:34 Started sync_wikiversions
08:16:34 08:16:34 Compiled /srv/mediawiki-staging/wikiversions-labs.json to /srv/mediawiki-staging/wikiversions-labs.php
08:16:34 sync_wikiversions:   0% (ok: 0; fail: 0; left: 9)                               
08:16:35 08:16:35 sudo -u mwdeploy -n -- /usr/bin/rsync -l deployment-tin.deployment-prep.eqiad.wmflabs::common/wikiversions*.{json,php} /srv/mediawiki on deployment-snapshot01.deployment-prep.eqiad.wmflabs returned [255]: Host key verification failed.
08:16:35 
08:16:35 sync_wikiversions:  11% (ok: 0; fail: 1; left: 8)                               
08:16:41 sync_wikiversions:  22% (ok: 1; fail: 1; left: 7)                               
08:16:42 sync_wikiversions:  33% (ok: 2; fail: 1; left: 6)                               
08:16:42 sync_wikiversions:  44% (ok: 3; fail: 1; left: 5)                               
08:16:43 sync_wikiversions:  55% (ok: 4; fail: 1; left: 4)                               
08:16:43 sync_wikiversions:  66% (ok: 5; fail: 1; left: 3)                               
08:16:43 sync_wikiversions:  77% (ok: 6; fail: 1; left: 2)                               
08:16:43 sync_wikiversions:  88% (ok: 7; fail: 1; left: 1)                               
08:16:44 sync_wikiversions: 100% (ok: 8; fail: 1; left: 0)                               
08:16:44 sync_wikiversions: 100% (ok: 8; fail: 1; left: 0)                               
08:16:44 
08:16:44 08:16:44 Finished sync_wikiversions (duration: 00m 09s)
08:16:44 08:16:44 1 hosts had sync_wikiversions errors
08:16:44 08:16:44 Finished scap: beta-scap-eqiad (build #189280) (duration: 03m 02s)
08:16:44 Build step 'Execute shell' marked build as failure
08:16:44 IRC notifier plugin: Sending notification to: #wikimedia-releng
08:16:44 Email was triggered for: Failure - Any
08:16:44 Sending email for trigger: Failure - Any
08:16:44 Sending email to: qa-alerts@lists.wikimedia.org betacluster-alerts@lists.wikimedia.org
08:16:44 Finished: FAILURE
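
The console output above shows why the job goes red even though most hosts sync fine: scap reports `1 hosts had sync_wikiversions errors`, and the 'Execute shell' build step is then marked as a failure. As a rough illustration only (this is not how scap or Jenkins decide the result), the failed-host count could be pulled out of a saved console log as sketched below. The function name `count_failed_hosts` and the idea of a saved log file are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the number of hosts that scap reported as
# failing sync_wikiversions, based on the summary line in a console log.
# Prints 0 when no such summary line is present.
count_failed_hosts() {
    log_file=$1
    # The summary line looks like:
    #   08:16:44 08:16:44 1 hosts had sync_wikiversions errors
    # Capture the number before " hosts had"; keep the last match only.
    count=$(sed -n 's/.* \([0-9][0-9]*\) hosts had sync_wikiversions errors.*/\1/p' "$log_file" | tail -n 1)
    echo "${count:-0}"
}
```

Calling `count_failed_hosts console.log` on a saved copy of the output above would print the count taken from the summary line; a log without that line yields 0.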

Event Timeline

There’s a "returned [255]: Host key verification failed." at 08:16:35.

I merged https://gerrit.wikimedia.org/r/#/c/400237/ this morning. Do I need to add anything else for scap of mw to work?

I added the host key for deployment-snapshot01 to known_hosts on deployment-tin. IIRC this used to be done by sshknowngen, but I guess this fell apart at some point (T159332).
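
For reference, a manual fix of this kind could look roughly like the sketch below. It is an assumption about the general approach, not a record of the commands that were actually run on deployment-tin: the helper name `add_host_key` and any file path passed to it are hypothetical, and the key line itself would typically come from something like `ssh-keyscan` output that has been verified out of band.

```shell
#!/bin/sh
# Hypothetical helper: append a host key line to a known_hosts file,
# but only if that exact line is not already present (idempotent).
add_host_key() {
    known_hosts=$1
    key_line=$2
    # Create the file if needed so grep does not fail on a missing path.
    touch "$known_hosts"
    # -x: match the whole line, -F: treat the key line as a fixed string.
    if ! grep -qxF -- "$key_line" "$known_hosts"; then
        printf '%s\n' "$key_line" >> "$known_hosts"
    fi
}
```

Running the helper twice with the same arguments leaves a single entry in the file, which makes it safe to re-run after a failed deploy. It only compares exact lines, so it will not notice a stale key for the same host; that case would need the old entry removed first.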

Now the beta-scap-eqiad failure is that mwdeploy can't ssh from deployment-tin to this box.

You'll need to add role::beta::mediawiki as well as something that includes the mediawiki class. It looks like snapshot1001.eqiad.wmnet in prod includes role::dumps::generation::worker::testbed which leads to the mediawiki class eventually. You can apply that role in horizon; however, I notice the puppetmaster for this instance is set to deployment-dumps-puppetmaster.deployment-prep.eqiad.wmflabs rather than deployment-puppetmaster02.deployment-prep.eqiad.wmflabs. Unsure if the secrets module on deployment-puppetmaster02 has any cherry-picks that might lead to a different public key for mwdeploy which is what is missing.

tl;dr: Adding role::beta::mediawiki and role::dumps::generation::worker::testbed (if that's the role you need for this) should magically fix stuff.

I probably don't want the whole role::beta::mediawiki, but I might be able to be clever about restructuring the testbed role so it can grab the right extra beta classes as needed. Thanks!

role::beta::mediawiki turns out to have no actual mw manifests in it, so I can just apply the whole thing.

BTW the reason for its own puppetmaster is that this instance will be a testbed for moving to php7, with a bunch of crap we don't want on deployment-prep generally. I'll check into the keys issue.

thcipriani triaged this task as Medium priority. Jan 4 2018, 6:21 PM

For the time being I reverted the patch that added deployment-snapshot01 to the mediawiki deployment targets in beta, just so beta-scap-eqiad will stop alerting. Whenever deployment-snapshot01 is all set up we can remove:

root@deployment-puppetmaster02:/var/lib/git/operations/puppet# git log -1
commit f962bff66e0a6a682bedb10e3d1fdeb603e07190
Author: Root <root@deployment-puppetmaster02.deployment-prep.eqiad.wmflabs>
Date:   Thu Jan 4 18:16:24 2018 +0000

    Revert "add dumps repo source to beta scap, add snapshot to beta mw scap"

    This reverts commit 73aa10a7c94aa04914540c0b730015ec48a848ab.

    thcipriani: this is temporary and should be removed when T184175 is
    resolved.

from deployment-puppetmaster02.

That's fine. I've applied mediawiki::users and it seems to be there, so let's remove that patch now and see if the scap runs.

With some assistance from @thcipriani we've got the mediawiki::scap and scap::ferm rules on there, which is sufficient for now. Scap runs properly, so I can get to work on applying the snapshot manifests piece by piece, making sure they work properly in labs. Thanks!

thcipriani assigned this task to ArielGlenn.

beta-scap-eqiad seems happy. Calling this closed :)