
Parallelize scap deployment of WDQS
Open, High · Public · 5 Estimated Story Points

Description

We have noticed that we have recently been overshooting the WDQS deployment window, due to testing and also to the non-parallel deployment of WDQS via scap. We don't want to restart more than one server at a time in a single cluster, so that enough capacity remains to serve all the traffic. But we can restart one server from each cluster at the same time (public / internal & eqiad / codfw).

Event Timeline

Restricted Application added a subscriber: Aklapper.

I think we can parallelize, but we should do it in a smart way, so that no more than one of the 3 servers in each cluster is restarted at the same time. But we can restart one in eqiad and one in codfw at the same time, and the same goes for internal and public. So we could do 4 servers at once instead of one.
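To make the constraint concrete, here is a rough sketch of that scheduling in Python. It is not scap code: `rolling_restart`, `group_by_cluster` and the `restart` callable are hypothetical names made up for illustration, and the host tuples are placeholders rather than real WDQS hostnames. It only shows the idea of going serially within a cluster while the clusters proceed in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_cluster(hosts):
    """Map (datacenter, role) -> list of hostnames, preserving input order."""
    clusters = defaultdict(list)
    for name, datacenter, role in hosts:
        clusters[(datacenter, role)].append(name)
    return dict(clusters)


def rolling_restart(hosts, restart):
    """Restart all hosts: one at a time per cluster, clusters in parallel.

    `restart` is a caller-supplied (hypothetical) function that blocks
    until the given host is restarted and serving traffic again.
    """
    clusters = group_by_cluster(hosts)

    def restart_cluster(names):
        # Serial loop: at most one server of this cluster is down at a time.
        for name in names:
            restart(name)
        return names

    # One worker per cluster, so up to len(clusters) restarts run at once.
    with ThreadPoolExecutor(max_workers=max(1, len(clusters))) as pool:
        done = list(pool.map(restart_cluster, clusters.values()))
    return dict(zip(clusters.keys(), done))


if __name__ == "__main__":
    # Placeholder inventory: 2 datacenters x 2 roles x 3 servers.
    inventory = [
        (f"{dc}-{role}-{i}", dc, role)
        for dc in ("eqiad", "codfw")
        for role in ("public", "internal")
        for i in range(1, 4)
    ]
    rolling_restart(inventory, lambda host: print("restarting", host))
```

With four clusters this keeps up to 4 restarts in flight, but never two in the same cluster. A real implementation would also need to stop a cluster's loop when a restart fails, which the sketch leaves out.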

Smalyshev triaged this task as Medium priority. · Jan 3 2019, 1:23 AM

Removing assignee @Mathew.onipe as the user does not seem to be active anymore.

Gehel renamed this task from "Increase deployment window of wdqs or parallelize scap deployment" to "Parallelize scap deployment of WDQS". · Aug 11 2020, 7:25 PM
Gehel updated the task description.
Gehel raised the priority of this task from Medium to High. · Aug 26 2021, 1:14 PM

This needs to be discussed with the rel-eng team before re-estimating and starting implementation.

Note that if adding support in Scap is too complex, it might make sense to implement deployment as cookbooks instead.

I'll talk to rel-eng to see what scap changes are needed to parallelize between groups (wdqs eqiad public vs wdqs eqiad internal, etc.).

There's a chance it might be worth relying on a cookbook for the rolling restart. Basically, we'd use scap to get the new code in place and a cookbook to do the rolling restarts that actually pick up the changes. But for now I'd assume we'll just change this in scap-land and not introduce a cookbook.