
mcrouter proxies and scap proxies
Open, Medium, Public

Description

In each datacenter we have 4 mw* servers which act as mcrouter proxies, and 4 more that act as scap proxies. I was wondering if it makes sense to use the same set of 4 servers for both, in order to have 4 snowflakes in each DC instead of 8. Furthermore, we could consider whether it makes sense to have dedicated VMs specifically serving these roles, decoupling them from the mediawiki roles.

The problem I see (and have come across) is that we generally consider mw* servers expendable, but there are those 8 servers that we have to keep in mind when e.g. reimaging or decommissioning.

Thoughts?

Event Timeline

jijiki created this task. Feb 21 2020, 4:20 PM
Restricted Application added a subscriber: Aklapper. Feb 21 2020, 4:20 PM
jijiki updated the task description. Feb 21 2020, 4:21 PM
jbond triaged this task as Medium priority. Feb 24 2020, 1:11 PM

If there are no objections, I would like to proceed with this.

Joe closed this task as Declined. Feb 26 2020, 9:41 AM

Please, no. I didn't notice this task.

Scap proxies are under a ton of pressure during train releases; it was a deliberate choice not to mix the two.

We should instead work on automating adding/removing proxies when needed.
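
To make the idea concrete, here is a minimal sketch of the shape such automation could take: derive the scap proxy list from a list of candidates and drop hosts that fail a basic reachability check, instead of hardcoding hostnames. The file name, the SSH-port check and the PyYAML dependency are assumptions made for the example, not the real scap or puppet plumbing.

```python
#!/usr/bin/env python3
"""Illustrative sketch only: build the scap proxy list dynamically.

Assumptions (not the real setup): candidate proxies are listed in a
YAML file, and a proxy counts as healthy if its SSH port answers.
"""
import socket

import yaml  # PyYAML

CANDIDATES_FILE = "scap_proxies.yaml"  # hypothetical source of truth
SSH_PORT = 22
TIMEOUT = 2.0


def is_reachable(host: str, port: int = SSH_PORT, timeout: float = TIMEOUT) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_proxies(path: str = CANDIDATES_FILE) -> list[str]:
    """Read the candidate proxies and keep only the reachable ones."""
    with open(path) as f:
        candidates = yaml.safe_load(f) or []
    return [host for host in candidates if is_reachable(host)]


if __name__ == "__main__":
    # One proxy per line, e.g. to feed into whatever generates the deploy config.
    print("\n".join(healthy_proxies()))
```

Run before a deploy (or as part of config generation), something like this would turn adding or removing a proxy into a data change, and a frozen proxy would be dropped automatically instead of being discovered mid-release.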

jijiki reopened this task as Open. Feb 26 2020, 1:33 PM

(Reopening to discuss this a bit more)

I understand your point; automating would be great, but it will not solve the problem that we have 100+ mw servers, of which 16 are snowflakes. For instance: one of the scap proxies freezes, no one around knows/notices that this server is special, and then someone deploys. The deploy will fail because the scap proxy is down. Is virtualising an option we could consider? Having specific roles or hostnames for those tasks would help, I think.

Joe added a comment. Feb 26 2020, 2:08 PM

> (Reopening to discuss this a bit more)
>
> I understand your point; automating would be great, but it will not solve the problem that we have 100+ mw servers, of which 16 are snowflakes. For instance: one of the scap proxies freezes, no one around knows/notices that this server is special, and then someone deploys. The deploy will fail because the scap proxy is down. Is virtualising an option we could consider? Having specific roles or hostnames for those tasks would help, I think.

What would change in the scenario you described above if all scap proxies were also mcrouter proxies?

Also: during large deployments, the scap proxies nearly exhaust their bandwidth. In the past they even became unresponsive at times, and we had a scap step for depooling them. It doesn't seem like a good place to base the mcrouter proxies.
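
To put rough numbers on the bandwidth point: with a fan-out through a handful of proxies, each proxy re-sends the payload to a large slice of the fleet. The figures below are invented purely to illustrate the arithmetic; they are not measurements of the actual deployment.

```python
# Back-of-the-envelope fan-out arithmetic; every number here is an assumption.
payload_mb = 500      # hypothetical size each target ends up pulling
servers = 300         # hypothetical number of mw* servers behind the proxies
proxies = 4           # scap proxies per DC
link_gbit = 1         # hypothetical per-host uplink

per_proxy_targets = servers / proxies            # ~75 targets per proxy
per_proxy_mb = payload_mb * per_proxy_targets    # ~37,500 MB pushed per proxy
seconds_at_line_rate = per_proxy_mb * 8 / (link_gbit * 1000)
print(f"each proxy pushes ~{per_proxy_mb / 1000:.1f} GB, "
      f"~{seconds_at_line_rate / 60:.0f} min at line rate")
```

Even under generous assumptions a proxy's uplink stays busy for minutes at line rate during a large deploy, which fits the observation that the proxies have become unresponsive in the past.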

> What would change in the scenario you described above if all scap proxies were also mcrouter proxies?

This would just be a way to have 8 snowflakes instead of 16; it is quite clear that it wouldn't help at all in the above scenario. Which brings me to an alternative: what if we had a different naming scheme for those servers/roles, like mw-sp1001 or mw-mp1001 etc.? That would at least make it easier to tell them apart from the rest of the cluster.

Joe added a comment. Mar 2 2020, 10:39 AM

> What would change in the scenario you described above if all scap proxies were also mcrouter proxies?
>
> This would just be a way to have 8 snowflakes instead of 16; it is quite clear that it wouldn't help at all in the above scenario. Which brings me to an alternative: what if we had a different naming scheme for those servers/roles, like mw-sp1001 or mw-mp1001 etc.? That would at least make it easier to tell them apart from the rest of the cluster.

It would be incredibly annoying in other ways, say when you want to sweep things across the cluster with cumin. Also, I would like to be free to easily change which servers have these roles.

As I said, I'd prefer to have automation rather than these stratagems.
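
For reference, role-based discovery along these lines could look roughly like the sketch below: ask PuppetDB which hosts carry a given role class, so the special hosts stay discoverable without encoding the role in the hostname. The PuppetDB URL and the role class name are placeholders, not actual production values.

```python
#!/usr/bin/env python3
"""Illustrative sketch: list hosts that carry a given Puppet role class.

The endpoint and class title below are placeholders; the real query
depends on the local PuppetDB setup.
"""
import json
import urllib.parse
import urllib.request

PUPPETDB = "https://puppetdb.example.org:8081"  # placeholder
ROLE_CLASS = "Role::Mediawiki::Scap_proxy"      # placeholder class title


def hosts_with_class(class_title: str) -> list[str]:
    """Return certnames of hosts whose catalog includes the given class."""
    query = json.dumps(
        ["and", ["=", "type", "Class"], ["=", "title", class_title]]
    )
    url = f"{PUPPETDB}/pdb/query/v4/resources?query={urllib.parse.quote(query)}"
    with urllib.request.urlopen(url) as resp:
        resources = json.load(resp)
    return sorted({r["certname"] for r in resources})


if __name__ == "__main__":
    for host in hosts_with_class(ROLE_CLASS):
        print(host)
```

Cumin's PuppetDB backend exposes the same kind of role-based selection on the command line, which is why sweeping the cluster by role works without a separate hostname scheme.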

jijiki moved this task from Incoming 🐫 to Unsorted on the serviceops board. Aug 17 2020, 11:46 PM