Cannot use dsh-based restart of parsoid from tin anymore
Closed, Declined (Public)

Description

We used the deployment process as outlined in https://wikitech.wikimedia.org/wiki/Parsoid#Deploying_changes

One of the steps there is to use dsh on tin to restart the parsoid service on the wtp* servers. However, since those servers are now firewalled and don't accept ssh connections from tin, I couldn't restart the service myself. I got YuviPanda's help with the restart.
As described on that page, we also use dsh to verify that all nodes have the same version running.

Filing this bug to get this resolved in some fashion.
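For illustration, the "same version on all nodes" check mentioned above amounts to grouping hosts by whatever version string each one reports. The sketch below is only a rough outline of that idea, not the procedure from the wikitech page: the host names, the hashes, and the helper names `group_hosts_by_version` and `is_consistent` are all hypothetical, and collecting the per-host output (for example via dsh) is left out.

```python
from collections import defaultdict


def group_hosts_by_version(reports):
    """Group hosts by the version string each one reported.

    `reports` maps host name -> version string (for example, a commit
    hash collected from each node). Surrounding whitespace is ignored.
    """
    groups = defaultdict(list)
    for host, version in sorted(reports.items()):
        groups[version.strip()].append(host)
    return dict(groups)


def is_consistent(reports):
    """Return True when every host reported the same version."""
    return len(group_hosts_by_version(reports)) <= 1


if __name__ == "__main__":
    # Hypothetical host names and hashes, for illustration only.
    sample = {
        "wtp-a.example": "abc123\n",
        "wtp-b.example": "abc123",
        "wtp-c.example": "def456",
    }
    for version, hosts in group_hosts_by_version(sample).items():
        print(version, hosts)
    print("consistent:", is_consistent(sample))
```

Any host that ends up in a group of its own is the one to look at after a deploy.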

Event Timeline

ssastry created this task. Jan 28 2015, 9:38 PM
ssastry raised the priority of this task from to Needs Triage.
ssastry updated the task description.
ssastry added a project: acl*sre-team.
ssastry added subscribers: ssastry, yuvipanda.
Restricted Application added a subscriber: Aklapper. Jan 28 2015, 9:38 PM
Dzahn added a subscriber: Dzahn. Jan 28 2015, 9:41 PM

I thought ssh from all internal networks was still allowed in base::firewall, how come this even blocks ssh from tin?

Yeah, the firewall now does not accept connections from tin, but bast1001 is allowed. The dsh command works as documented on that wikitech page as long as it is run from bast1001. Perhaps updating the documentation is enough?

@Dzahn, no, what would be the point of allowing all internal networks anyway? We would not be achieving any kind of network separation.

@akosiaris, can we allow connections from tin, so that we can deploy from the deploy host?

Sure we can, but isn't parsoid deployed via trebuchet, which does not use SSH at all?

GWicke added a comment. Edited Jan 30 2015, 8:53 PM

@akosiaris, trebuchet (salt really) is not able to do rolling restarts reliably, so we are using dsh to actually apply the restart. See T63882.

Every time @ssastry has asked me to restart parsoid across the cluster I've always used salt (with a -b option) to do rolling restarts. What problems does it cause?
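As a rough illustration of what a batched rolling restart does (this is not how salt implements its `-b` batch option, just the general idea): hosts are restarted a few at a time, and the rollout stops as soon as a batch contains a failure, so the rest of the cluster keeps serving. The `restart` callable and the host names below are hypothetical placeholders.

```python
def batches(hosts, size):
    """Yield consecutive slices of `hosts` with at most `size` entries."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def rolling_restart(hosts, size, restart):
    """Restart hosts batch by batch.

    `restart(host)` is a caller-supplied function returning True on
    success. The rollout stops after the first batch that contains a
    failure. Returns a (restarted, failed) pair of host lists.
    """
    restarted, failed = [], []
    for batch in batches(hosts, size):
        for host in batch:
            if restart(host):
                restarted.append(host)
            else:
                failed.append(host)
        if failed:
            break  # leave the remaining hosts untouched
    return restarted, failed


if __name__ == "__main__":
    # Hypothetical hosts; pretend the restart fails on "wtp-c".
    hosts = ["wtp-a", "wtp-b", "wtp-c", "wtp-d", "wtp-e"]
    print(rolling_restart(hosts, 2, lambda host: host != "wtp-c"))
```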

+1 to alex, this should be handled by the deployment setup itself.

GWicke added a comment. Edited Jan 31 2015, 6:30 PM

@yuvipanda, see T63882. The problem is specific to using salt without root.

@GWicke, @ssastry, let's stall this for a while if you don't mind. As noted in T63882 the ability to add a timeout and hence fix the bug has been added in salt v2014.7.

Works for me. I'll continue to keep two shells open, one to bast1001 and another to tin and that should take care of it for deploys.

Andrew triaged this task as Normal priority. Feb 8 2015, 9:44 PM
Andrew set Security to None.

(a salt upgrade is in the works atm, thanks to @ArielGlenn)

yuvipanda closed this task as Declined. Mar 16 2015, 2:53 PM
yuvipanda claimed this task.

(Closing, since @ssastry seems to be using bastion for dsh atm, and we'll have a salt upgrade soon)