
[Discuss] Split ORES scores in datacenters based on wiki
Closed, Resolved · Public

Description

When we go to active/active mode, we duplicate precaching in both datacenters and randomly split requests between them. My suggestion is to add a setting in Varnish and/or LVS so that (for example) if the URL of an ORES request matches "enwiki", "wikidatawiki", or "nlwiki" [1], the request goes to eqiad; otherwise it goes to codfw. It would be as if enwiki, wikidatawiki, and nlwiki belonged to an s1 shard and the other wikis to an s2 shard.
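To make the routing rule concrete, here is a minimal Python sketch of the decision only. It is not Varnish or LVS configuration, the function name `pick_datacenter` and the example URLs are made up for illustration, and the wiki list is just the example from above.

```python
from urllib.parse import urlparse

# Example wiki list taken from the proposal; the exact wikis are still to be determined.
EQIAD_WIKIS = frozenset({"enwiki", "wikidatawiki", "nlwiki"})


def pick_datacenter(url: str) -> str:
    """Return "eqiad" if a path segment of the URL names an eqiad wiki, else "codfw"."""
    # Compare whole path segments so that e.g. "enwikibooks" does not match "enwiki".
    segments = [s for s in urlparse(url).path.split("/") if s]
    if any(segment in EQIAD_WIKIS for segment in segments):
        return "eqiad"
    return "codfw"


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the function.
    print(pick_datacenter("https://ores.example/v3/scores/enwiki/123"))
    print(pick_datacenter("https://ores.example/v3/scores/dewiki/123"))
```

Matching on whole path segments rather than substrings is a choice made in this sketch, not something the proposal specifies.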

In that case, we don't need to duplicate precaching, and we can have ChangeProp precache only the proper wikis in the proper datacenter. That means real double capacity.

In an ideal world, we might be able to load only the proper wikis on the ORES nodes, which would take less memory and let us increase the number of workers (i.e. quadruple capacity). But for failover reasons, I think we should not let ORES handle this, so that once one datacenter goes down, the other one can simply handle all types of requests.
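The failover behaviour I have in mind could look roughly like the sketch below. It assumes the routing layer knows which datacenters are currently pooled; `effective_datacenter` and its parameters are hypothetical names, not an existing API.

```python
from typing import Iterable


def effective_datacenter(preferred: str, pooled: Iterable[str]) -> str:
    """Return the preferred datacenter if it is pooled, else any remaining pooled one."""
    available = sorted(set(pooled))
    if not available:
        raise RuntimeError("no datacenter is pooled")
    if preferred in available:
        return preferred
    # The preferred datacenter is down or depooled: the other one takes the request,
    # which only works if every node still loads the models for all wikis.
    return available[0]


if __name__ == "__main__":
    print(effective_datacenter("eqiad", ["eqiad", "codfw"]))
    print(effective_datacenter("eqiad", ["codfw"]))
```

This is why the nodes in each datacenter would keep loading all wikis: the fallback branch sends requests for any wiki to whichever datacenter is left.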

[1]: The exact wikis should be determined

Event Timeline

Restricted Application added a subscriber: Aklapper.

Yes, my only doubt with this proposal is exactly that we want to be active/active but also be able to serve all the traffic from a single datacenter.

So no sharding should be done cross-dc, but only intra-dc.

Also, I think you are hoping to do some magic at the traffic layer that's not really feasible; but a clearer plan might be welcome.

To re-iterate what @Joe is saying a little differently: the point of cross-dc active/active (which is a goal for all services) is to have the ability at any moment in time to handle all traffic in just one DC because we've suddenly lost or depooled the other. We're not sharding data cross-DC, or distributing maximum load capacity cross-DC.

Halfak renamed this task from Split ORES scores in datacenters based on wiki to [Discuss] Split ORES scores in datacenters based on wiki. · May 11 2017, 2:37 PM
Halfak triaged this task as Low priority.
Halfak moved this task from Unsorted to Ideas on the Machine-Learning-Team board.
Ladsgroup claimed this task.

I think it's clear that we should not do this, so I'm closing it as resolved.

akosiaris changed the task status from Resolved to Declined. · Apr 18 2018, 2:00 PM

Declined actually.

akosiaris changed the task status from Declined to Resolved. · Apr 18 2018, 2:06 PM

Since this was a [Discuss] task, resolved was conceptually correct.