Paste P6465

statsv multi DC discussion - 2017-12-12

Authored by Ottomata on Dec 14 2017, 3:25 PM.
1​ottomata| hey yall, is there a puppet way of knowing which is the current active primary DC?
2​ bblack| ottomata| there's not a single, global conception of that AFAIK in puppet. Only different switches for different things. Also, the general long-term trend is away from "X is the primary DC" in puppet, towards (a) "X is the primary DC" in etcd or, even better, (b) active-active.
3​ottomata| ya bblack, i'd prefer to do active active. i'm working on multi-DC statsv (varnishkafka -> statsd), but because this is statsd
4​ottomata| and all DCs produce to a single active statsd host
5​ bblack| s/statsd/prometheus/ ? 
6​ottomata| i'm just moving clients off of the analytics kafka cluster, not building new ones! 
7​ottomata| hmm, i suppose i could regex-extract the master statsd DC from the hiera statsd hostname...
8​ottomata| hmmmm
9​ bblack| in other areas, we're actively rewriting/replacing statsd stuff with prometheus stuff
10​ottomata| yeah, but this is statsv, which is statsd metrics from clients/browsers. the perf team mostly uses it, i think
11​ bblack| but yeah, I guess in the meantime, you could roll your own switch
12​ottomata| so we'd have to rewrite how the clients emit the metrics i think...or i suppose translate from statsd format to prometheus in our consumer
13​ bblack| (and maybe control the hiera statsd hostname effect from that switch, instead of having it separate)
14​ottomata| hmm
15​ottomata| godog| ^ thoughts on that?
16​ bblack| if it's PII, I don't know that we want it in ops prometheus
17​ bblack| I was thinking of ops statsd
18​ottomata| shouldn't be PII
19​ottomata| afaik
20​ bblack| ok
21​ godog| yeah I don't think so either re: PII
22​ godog| ottomata| reading
23​ godog| ottomata| so yeah extracting the statsd dc from the hostname in hiera isn't pretty
24​ottomata| nope
25​ottomata| godog| what about:
26​ottomata| statsd_active_datacenter: eqiad
28​# Main statsd instance
30​statsd: statsd.%{statsd_active_datacenter}.wmnet:8125
31​ godog| ottomata| I see, so e.g. producers in a datacenter that's not active would know they are not supposed to send data to statsd ?
32​ottomata| hmm, no godog, thinking of doing it like:
33​ottomata| varnishkafka producers know which statsd is active, and they only produce to kafka in that datacenter
34​ottomata| the statsv python consumer instances (which consume from kafka and produce to statsd), will run in both DCs, consuming from their local kafka clusters
35​ottomata| normally, there will be no data in statsv topic in codfw main kafka
36​ottomata| so the statsv consumer will be doing nothing there
37​ottomata| but, if you change active statsd instance
38​ottomata| puppet will reconfigure and bump varnishkafka, causing all of them to produce to the other DC statsv topic
39​ottomata| then the statsv python consumer will see messages there and start producing to statsd
40​ottomata| one q about that setup: should statsv in codfw always produce to active statsd (I think yes?) or to its local statsd? It seems like it won't matter since graphite is mirrored both ways, but since everybody else produces to active statsd independent of datacenter, probably statsv should do the same.
41​ godog| ottomata| got it, yeah it should always produce to the active statsd for simplicity
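The consumer half of the setup described above can be sketched roughly like this: a statsv daemon in each DC pulls statsd wire-format lines from its local Kafka 'statsv' topic and relays them over UDP to the single active statsd. This is only an illustrative sketch, not the real statsv code; the default address and function names are assumptions.

```python
import socket

# Illustrative default only, not a production value.
ACTIVE_STATSD = ("statsd.eqiad.wmnet", 8125)

def relay(lines, addr=ACTIVE_STATSD, sock=None):
    """Forward statsd wire-format lines (already consumed from Kafka)
    over UDP to the active statsd instance."""
    sock = sock or socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    for line in lines:
        sock.sendto(line.encode("utf-8"), addr)
        sent += 1
    return sent
```

Since statsd is fire-and-forget UDP, an idle consumer in the passive DC costs nothing until messages appear in its local topic.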
42​ godog| ottomata| on the topic (hah) of knobs in puppet needed during a switchover we're trying to reduce them, so perhaps the discovery dns records might be an alternative
43​ottomata| godog| i need to get them in puppet though, in order to render a template
44​ottomata| can I do that?
45​ottomata| oh wait, etcd/confd lets you do that right? hmmm
46​ottomata| wikitech looking...
47​ volans| yes please, no puppet commit for switchovers
48​ottomata| godog| how do you do this now? you change the dns IP for statsd.eqiad.wmnet?
49​ottomata| to something in codfw?
50​ godog| ottomata| that's the current sad state of affairs yes
51​ottomata| haha could use a little help...
52​ottomata| wow, so i couldn't even rely on that. crazy ok.
53​ottomata| couldn't rely on the hiera value to infer
54​ volans| ottomata| what do you need to do?
55​ottomata| volans| i need to know which statsd instance is active, in order to change a varnishkafka conf file to make it produce to a DC specific kafka cluster
56​ottomata| the kafka_config puppet function helps here
57​ottomata| i need to call
58​ottomata| $config = kafka_config('main', $statsd_active_datacenter)
59​ottomata| where $statsd_active_datacenter is either 'eqiad' or 'codfw'
60​ volans| I'm not sure I follow, why loading a different config based on statsd?
62​ottomata| trying to make this multi DC in the main kafka cluster (so I can get it off of the analytics cluster)
63​ottomata| at any given time, there is only a single active statsd instance
64​ volans| yes I know
65​ottomata| but there is a kafka cluster in eqiad and in codfw
66​ottomata| so the varnishkafka producers that send this data to kafka, need to know which kafka cluster the data should be sent to, since the statsv python consumer (that produces to statsd) will receive messages in that datacenter
67​ godog| ottomata| I have to go as well, I've subscribed to it tho
68​ottomata| k!
69​ottomata| is there a confd template example in puppet i can find somewhere...i'm looking...
70​ottomata| found some! searched for {{
72​ volans| so I think there are two approaches here, either use a discovery DNS and ensure that your tool does a proper refresh with TTL
73​ volans| or confd to dynamically change a config file if it's something in etcd
74​ volans| directly
75​ottomata| hmm, can't realy use discovery dns easily for this, what I need to discovery is the active DC name in order to lookup proper list of kafka brokers.
76​ottomata| i think godog said he changes a statsd discovery dns?
77​ottomata| which I could use to render the varnishkafka config file...but i'd still have to extract the dc name somehow...
78​ottomata| grr
79​ottomata| or no hm
80​ottomata| no
81​ottomata| i think this won't work. the kafka broker names are stored in hiera. i can't look them up when confd renders the config file.
82​ottomata| oh but confd can render from zookeeper??? hmmmm....
83​ottomata| the broker names are in zookeeper.
84​ottomata| ergh this is a little crazy hmm.
85​ volans| ottomata| sorry, trying to fix another 2 things.. cannot dig into this now
86​ottomata| haha k
87​ottomata| HMMM OR
88​ottomata| another idea (i think i'm just typing to myself here)
89​ottomata| i can configure the vk producers to produce to a DC based on the cache::route_table
90​ottomata| that'll make statsv active/active, even if it will always produce to the single active statsd.
91​ottomata| bblack| is it ever possible for an entry in cache::route_table to not point at a primary DC (eqiad,codfw)?
92​ottomata| oh docs say yes
93​ottomata| "The edge sites (ulsfo and esams) should normally point at one of the primary sites (although it is possible to point an edge at another edge as well and route through it, but this would probably be a rare operational scenario)."
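Resolving a cache::route_table entry as quoted above amounts to following the chain from an edge site until a primary site is reached. A minimal sketch, with an illustrative (not production) table that includes the rare edge-through-edge case:

```python
# Illustrative data only: edges normally point at a primary site, but may
# point at another edge, so resolution follows the chain.
PRIMARY_SITES = {"eqiad", "codfw"}
ROUTE_TABLE = {"ulsfo": "codfw", "eqsin": "ulsfo", "esams": "eqiad"}

def resolve_route(site):
    """Follow route_table entries until a primary site is reached."""
    seen = set()
    while site not in PRIMARY_SITES:
        if site in seen:
            raise ValueError("routing loop at %s" % site)
        seen.add(site)
        site = ROUTE_TABLE[site]
    return site
```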
94​ bblack| ottomata| routing is actually handled on a per-service basis now too, and the cache::route_table is used in conjunction with the per-service routing info...
95​ottomata| ?
96​ bblack| ottomata| so for an active/active service, there is no distinction between the two core sites, and traffic is split with no singular final destination. e.g. ulsfo+codfw frontend requests both exit codfw backend caches hitting codfw services, without touching eqiad. esams+eqiad requests do similarly, staying on their side of the fence, and don't touch codfw.
97​ bblack| but for a service which is active/passive and currently only enabled on the eqiad side, requests from ulsfo+codfw end up routing through the eqiad caches to reach the service in eqiad.
98​ bblack| and a different service might be the opposite: active in codfw and passive/offline in eqiad, in which case requests from esams+eqiad end up routing through the codfw caches to reach it.
99​ bblack| it's all based on the service, and one cache cluster can have several services configured differently, such that some requests go eqiad->codfw and some others go codfw->eqiad.
100​ottomata| ah, ok makes sense. i think in this case the active/active stuff is fine, as i'm kinda basing this on the text caches.
101​ottomata| since statsv varnishkafka runs on the text caches
102​ bblack| huh?
104​ bblack| I was just answering the question about the cache routing, I don't know what you're talking about with linking it to how you're handling statsv
105​ottomata| statsv varnishkafka runs on text caches, and logs requests that match /beacon/statsv to kafka
106​ottomata| i'm trying to make statsv varnishkafka use the main kafka cluster in either primary DC
107​ottomata| rather than the analytics/jumbo cluster in only eqiad
108​ bblack| cache::route_table sort of info can't help you there, it's not for that.
109​ottomata| so, if text caches are configured to route requests based on cache::route_table, it should be safe to do so. for varnishkafka too. it doesn't actually matter which kafka cluster the varnishkafka produces to, i just need to pick one.
110​ottomata| if varnishkafka only ran in eqiad and codfw, it'd be simple
111​ottomata| i'd just pick the local one
112​ottomata| but, it also runs in non-primary DCs
113​ bblack| yes, so they need some switches for which to use
114​ bblack| which are not cache::route_table, which is for a different purpose
115​ottomata| oh?
116​ottomata| not for routing text cache requests to online (primary) DCs?
117​ bblack| not for deciding where statsv data should be sent 
118​ottomata| well, statsv data comes from varnish
119​ bblack| cache::route_table and the per-service routing can be modified in custom ways in a lot of different operational scenarios, some of which might have nothing to do with the statsv destination's concerns
120​ottomata| so if varnish is routed away from a primary DC, it seems safe to assume that varnishkafka should be routed away from that DC too
121​ bblack| in any case, I think I missed part of the conversation somewhere above, I thought the statsd destination wasn't even A/A, looking backwards now...
122​ottomata| it isn't but statsv is 
123​ bblack| "safe to assume" is the problem
124​ottomata| or, i'm trying to make it so
125​ottomata| well, safe enough, since it won't really matter because statsD is not active active...the statsv python consumer will be running in each primary DC...and producing to only the active statsD discovery address
126​ottomata| so, it's more about multi-DC active/active kafka & a daemon producing to statsd.
127​ottomata| if you read (too much) scroll back, you'd see i was trying to choose the kafka cluster based on the value of the statsd master...buuuut that was getting insane since it is done via confd discovery dns
128​ottomata| and the kafka configs i need are in hiera
130​ bblack| discovery DNS sounds like the right solution here
131​ottomata| yeah, but kafka isn't in discovery DNS, and that's the value i need
132​ bblack| you mean statsv?
133​ottomata| no i need to configure varnishkafka
134​ottomata| to produce to kafka
135​ bblack| yes
136​ottomata| so i need the list of kafka brokers
137​ bblack| and you put the discovery dns hostname in varnishkafka's config, identically at all sites, and discovery DNS does the rest
138​ottomata| which is in hiera
139​ottomata| so, maybe, especially since...i can use confd with zookeeper, right?
140​ bblack| you've switched topics from statsv to kafka brokers, or?
141​ottomata| the broker names are in zookeeper
142​ottomata| statsv uses kafka
143​ bblack| what is "statsv"?
144​ottomata| it's /beacon/statsv varnishkafka -> kafka topic 'statsv' -> python consumer statsv -> statsd
146​ bblack| no, I meant more-specifically in statements above
148​ bblack| I thought we were talking about stasv the python daemon
149​ottomata| statsv is really just a python kafka consumer that produces messages from kafka to statsd
150​ottomata| ah
151​ottomata| yeah, i want the 'statsv service' (which includes the http requests, kafka backend, and statsv consumer process) to be multi DC
152​ottomata| so, bblack, configuring varnishkafka with confd might be possible, especially if i can look up values in the confd template from zookeeper
153​ottomata| buut, it's just gonna get complicated
155​ bblack| ottomata| I don't think anyone was suggesting integrating confd directly with vk
156​ottomata| ok...?
157​ bblack| ottomata| so, statsv is a daemon. it currently runs on hafnium as part of role::webperf. I'm guessing from site.pp there are plans for it and other things to now run on webperf[12]001 for the multi-dc part.
158​ottomata| yes
160​ bblack| ottomata| some other stuff (kafka brokers?) need to contact statsv, and the question is how to route requests to webperf[12]001 from kafka brokers in various DCs?
161​ottomata| the producer of the data (varnishkafka) is decoupled from the consumer of the data (statsv daemon)
162​ottomata| so, not quite.
163​ottomata| the question is how to route requests from varnishkafka when there are multiple kafka clusters to choose from
164​ bblack| ottomata| that's not a statsv/statsd question though, that's a vk->kafka-clusters question.
165​ottomata| bblack, we'd have this problem for all webrequest too, if for some reason we wanted to support multi DC analytics stuff, or if someone needed webrequest logs reliably in all DCs
166​ bblack| can we handle one question at a time? we seem to be ping-ponging all over the place
167​ottomata| bblack, ok.
168​ottomata| the statsv question is: how do we make it so statsv can run multi DC, but with an active/passive statsd?
169​ottomata| that is
170​ottomata| statsv needs to be multi DC, but we need to be sure that any given statsv message is only produced once (to the active statsd instance)
171​ottomata| (since the statsd metrics are mirrored between DCs by graphite)
173​ottomata| sorry i guess this is hard to type out, it's a little complicated...i've been drawing crappy pictures today to keep it straight in my mind...
174​ottomata| bblack, is that question better?
175​ bblack| so, both ends of this pipeline, vk and statsv, both make their connections *to* a kafka cluster? nobody "connects to" statsv, then?
176​ bblack| and then statsv forwards to statsd by connecting to statsd?
177​ bblack| I'm just trying to figure out who is even connecting to what and where data is going now
178​ottomata| correct
179​ bblack| are the per-DC kafka clusters independent or do they also share data with each other?
180​ottomata| this is publish subscribe
181​ottomata| they are effectively independent. for active active pubsub like changeprop
182​ottomata| we prefix topics with dc name
183​ottomata| e.g.
184​ottomata| eqiad.mediawiki.revision-create, codfw.mediawiki.revision-create
185​ottomata| then
186​ bblack| so they do share data, you just use keys in some cases to make it distinct
187​ottomata| Kafka mirror maker mirrors data between the two
188​ottomata| so data from both DCs is available to consumers in each DC
189​ottomata| but producers only produce to their local DCs
190​ottomata| that works because mediawiki, (etc.) only run in primary DCs
191​ bblack| and statsv is basically just a stateless translator that pulls from kafka and pushes to statsd?
192​ottomata| correct
193​ bblack| ok
194​ bblack| so first, not that it really affects the final solution much: is there a reason to run statsv independently on webperf[12]001, instead of just running (many more) instances of it directly on the kafka brokers?
195​ bblack| just thinking in terms of not introducing more network hops and more resiliency problems
196​ottomata| so, it wouldn't hurt
197​ottomata| but, 'statsv' topic is single partition
198​ottomata| so
199​ottomata| extra consumer processes would be idle
200​ bblack| ok
201​ottomata| but would take over if the main one dies
202​ bblack| ignore that for now
203​ottomata| k :)
204​ bblack| the vk statsv *data*: does it have these per-DC key prefixes?
205​ottomata| no
206​ bblack| ok
207​ottomata| it's basically a limited webrequest log
208​ottomata| with statsd-format-like data in the query params
209​ottomata| e.g.
210​ottomata| "?PagePreviewsApiResponse=38ms&PagePreviewsApiResponse=35ms"
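The kind of translation the statsv daemon performs on such a beacon query string can be sketched like this. The ms/c/g unit-suffix handling is an assumed convention for illustration, not the daemon's exact code:

```python
import re
from urllib.parse import parse_qsl

def beacon_to_statsd(query):
    """Translate a /beacon/statsv query string into statsd wire lines,
    e.g. "PagePreviewsApiResponse=38ms" -> "PagePreviewsApiResponse:38|ms".
    """
    lines = []
    for name, value in parse_qsl(query.lstrip("?")):
        # Only accept values that look like <number><unit>; the unit set
        # here (timer/counter/gauge) is an assumption.
        m = re.fullmatch(r"(\d+)(ms|c|g)", value)
        if m:
            lines.append("%s:%s|%s" % (name, m.group(1), m.group(2)))
    return lines
```

parse_qsl keeps repeated parameters, so two samples of the same metric produce two statsd lines.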
211​ bblack| but in any case, it's only going to send the data to <some kafka broker> once. the problem is on the subscriber side.
212​ottomata| correct
213​ bblack| if we have two statsv daemons in eqiad+codfw, and they each pull from their local <kafka cluster>, and they both forward to a single statsd host, we still have a problem because events will be duplicated, right?
214​ottomata| (effectively/mostly correct, there is an edge case here but it is irrelevant: kafka is an at-least-once guarantee)
215​ottomata| yes, since graphite mirrors the data
216​ottomata| wait
217​ottomata| yes, IF statsv data is in both DC kafka clusters
218​ bblack| it is due to the mirroring
219​ottomata| not necessarily
220​ottomata| we only mirror datacenter prefixed topics
221​ bblack| ok, so....
222​ottomata| if the topic was just 'statsv', it wouldn't be mirrored
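The mirroring convention being described, where MirrorMaker only replicates datacenter-prefixed topics so a bare 'statsv' topic stays local, can be sketched as:

```python
import re

# Sketch of the topic-naming convention: only topics carrying a DC
# prefix are picked up by cross-DC mirroring.
DC_PREFIX = re.compile(r"^(eqiad|codfw)\.")

def is_mirrored(topic):
    """True if the topic would be replicated to the other DC."""
    return bool(DC_PREFIX.match(topic))
```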
223​ bblack| you have two separate multi-dc problems here, at least. maybe 3
224​ottomata| haha
225​ottomata| yeah, there are a bunch of ways to change this, but the fact that statsd/graphite is active/passive AND mirrored makes the kafka/statsv setup a little weird
226​ bblack| 1) Telling varnishkafka which kafka cluster to publish to.
227​ bblack| 2) Telling statsv which kafka cluster to subscribe to.
228​ bblack| 3) Telling statsv which statsd to forward to.
231​ottomata| 3) doesn't matter, it's discovery DNS for statsd
232​ottomata| 2) also doesn't matter, it should always be local
233​ottomata| statsv daemon in codfw should consume from kafka in codfw
234​ bblack| what happens if you take the kafka cluster in codfw offline? we lose all codfw stats? or we decide that kafka being online in codfw is a precondition for running traffic through codfw, or?
235​ottomata| good q. thinking
236​ bblack| the "should always be local" is tricky
237​ bblack| because we have N (where N is large) interdependent services, and even in an ideal world where everything's mostly active-active, some of those A/A services might be offline for maintenance in one DC while others are offline in the other.
238​ottomata| this works for mediawiki stuff, because the answer to ^^ is yes.
239​ottomata| kafka codfw must work for mediawiki to run in codfw
240​ bblack| so I tend to think every independently-failing service needs its own x-DC switches, independent of others
241​ottomata| this is kinda why i was tying it to cache text routing.
242​ bblack| when we tie them together things get tricky
243​ottomata| if cache text is offline in codfw, then no messages will be produced to codfw kafka
244​ottomata| except, in this case, the source data is from varnish webrequest logs.
245​ bblack| what if MW can't run in codfw right now because of <X>, and kafka can't run in eqiad right now because of <Y>, etc...
246​ bblack| you've tied them together on any kind of DC-level outage or maintenance.
247​ottomata| that would be like saying what if Mysql can't run in eqiad, and mediawiki can't run in codfw...
248​ bblack| ideally, even if kafka is dead in eqiad and MW is dead in codfw, MW@eqiad should still be able to log to kafka@codfw
249​ottomata| they are a little more decoupled than mysql/mediawiki though.
250​ottomata| i dunno...
251​ottomata| that sounds dangerous to me.
252​ bblack| well, because it's not really a/a, it's mirrored with key hacks :)
253​ottomata| but ya, right now, if mw can't produce to kafka locally, then messages will be dropped
254​ottomata| bblack, that was a compromise argued with gabriel long ago....
255​ottomata| :p
256​ bblack| varnish webrequest logs are coming from an a/a source. we do sometimes depool a whole core DC out of varnish-level service, but the norm is A/A sources here.
257​ bblack| we're not going to tie that to VK availability and say "we have to shut down the caches in codfw because kafka is offline in codfw and we're missing logs"
258​ottomata| hmm, right. i see that.
259​ottomata| ok, no cache::route_table for this :)
260​ottomata| but wait
261​ottomata| no
262​ottomata| we wouldn't do that.
263​ottomata| these are statsd metrics.
264​ottomata| hmm
265​ottomata| hm, yeah, but it'd be better to change a value for statsv rather than caches.
266​ottomata| is what you are saying?
267​ottomata| haha, so I should make my own statsv route table :p
268​ bblack| well, the problem exists at multiple layers here, either 2 or 3 different layers depending how you look at it.
269​ bblack| this same basic problem exists for the primary webrequest kafka traffic too
270​ottomata| ya
271​ottomata| indeed
272​ bblack| right now our only answer is it's eqiad-only, and if eqiad kafka goes offline we keep serving users and lose the logging
273​ottomata| it's just that i'm trying to make this single limited use of them multi DC
274​ottomata| if we were trying to do all webrequest traffic multi DC
275​ottomata| THEN we'd be having some FUN
276​ bblack| well, we should be having that kind of fun, we've been talking about multidc-for-everything for a long time now, and this is part of it :)
277​ottomata| usually the way this is done with kafka
278​ottomata| is every DC has a local Kafka cluster
279​ottomata| and producers in that DC produce local only
280​ottomata| then in some main DC (or DCs) there is an 'aggregate' big kafka cluster
281​ottomata| that mirrors all the DC specific ones
282​ bblack| ok
283​ottomata| but that doesn't solve your problem of having to take down a whole kafka cluster.
284​ottomata| fwiw, we've never had to do that
285​ bblack| well the other option is you commit to not doing that
286​ottomata| kafka is built pretty well to be peerless and online all the time
287​ottomata| yeah
288​ bblack| I think that's an acceptable answer so long as it's real. what we've seen with some other services is a tendency to use the "other" DC for maintenance and treat multi-DC as something where one side can go offline whenever to <do things>
289​ottomata| so far that's not how kafka main (our only multi-DC kafka cluster) works
290​ bblack| (which is probably ultimately "wrong" in the long term for just about any service, but it is what it is)
291​ottomata| we do do things first in codfw, just because the risk is less
292​ottomata| but, we don't take it offline
293​ bblack| ok
294​ottomata| change-prop runs active-active in both DCs
295​ottomata| each consuming from and producing to the local kafka clusters
297​ bblack| so let's assume a world where the kafka cluster in a DC is considered always-online. And I guess we chose to believe that strongly enough to decouple outages (in other words, we're not taking down services because kafka's down in that DC. We're losing logs until kafka is restored).
298​ bblack| then we don't have to worry about multi-DC switching at the kafka publishing level. Local services publish to local kafka.
299​ottomata| that feels right to me, and what we've done so far
300​ bblack| in which case, for sources that are at the edge-only sites, we statically map them to the closest core, not using cache::route_table (which might remap things differently temporarily, for unrelated reasons)
301​ bblack| some hieradata says that vk-statsv in ulsfo publishes to kafka@codfw, no switching.
302​ottomata| hm, true, since we are saying they are always online.
303​ bblack| we could hypothetically switch it in hieradata, but it's not intended to ever happen as part of some operational/maintenance/outage thing
304​ bblack| but then when we do the next dc-failover trial....
305​ bblack| well I guess it still doesn't matter then. We still wouldn't edit the hieradata. We'd just turn off the publishing services on that side as appropriate (or at least, turn off their inbound requests)
307​ottomata| aye
308​ bblack| and then statsv-daemon's subscription should also be statically configured in hieradata to pull from the local DC kafka cluster as well
309​ottomata| ya, that's already easy
310​ottomata| and done in lots of places
311​ bblack| and then statsv->statsd uses statsd dns discovery (which if it's a/p, may go through short periods of no-response, which should be ok)
312​ottomata| yup
313​ottomata| kafka_route_table:
314​ottomata|   main:
315​ottomata|     eqiad: main-eqiad
316​ottomata|     codfw: main-codfw
317​ottomata|     esams: main-eqiad
318​ottomata|     ulsfo: main-codfw
319​ottomata| ?
320​ottomata| the values are names of kafka clusters in the kafka_clusters hash
321​ bblack| that's for vk->kafka publish?
322​ bblack| you may as well go ahead and add eqsin as well, to codfw
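In use, the kafka_route_table hash above boils down to a static lookup; a minimal sketch (with eqsin mapped to codfw, as suggested):

```python
# Static routing sketch: each site maps to the kafka cluster name its
# local producers should publish to. This mirrors the hiera hash in the
# discussion; it is illustrative, not the production data.
KAFKA_ROUTE_TABLE = {
    "main": {
        "eqiad": "main-eqiad",
        "codfw": "main-codfw",
        "esams": "main-eqiad",
        "ulsfo": "main-codfw",
        "eqsin": "main-codfw",
    },
}

def kafka_cluster_for(cluster, site):
    """Name of the kafka cluster a producer at `site` should use."""
    return KAFKA_ROUTE_TABLE[cluster][site]
```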
323​ottomata| well, the only vk instance that we have that will use 'main' clusters is statsv
324​ottomata| so yes
325​ bblack| which is "main" again?
326​ottomata| 'main' is the only cluster in both eqiad and codfw
327​ottomata| the only other is 'jumbo' (which will replace 'analytics')
328​ottomata| but jumbo is only in eqiad
329​ottomata| so main is the only one where we do multi DC stuff
330​ bblack| ok
331​ottomata| i could add jumbo routes in there too, but they'd be unused
332​ottomata| jumbo:
333​ottomata|   eqiad: jumbo-eqiad
334​ottomata|   codfw: jumbo-eqiad
335​ottomata|   esams: jumbo-eqiad
336​ottomata|   ulsfo: jumbo-eqiad
337​ bblack| it might be nice just to plumb that for future-proofing, but I donno if it's worth it
338​ottomata| i'll add it in the patch with a comment and we can see what reviewers say :p
339​ottomata| oof, i could augment the kafka_cluster_name & kafka_clusters puppet functions to use this data... then if you did
340​ottomata| kafka_config('main', 'ulsfo')
341​ottomata| you'd get the config for the main-codfw cluster
342​ bblack| yeah I guess
343​ottomata| bblack i guess it could also be higher level. instead of just cache::route_table, a static piece of data that listed the closest primary DC
344​ottomata| that was not kafka specific
345​ bblack| kafka_config($cluster, $publisher_site)
346​ottomata| yeah
347​ottomata| $::site (since kafka_config will be called on the varnishkafka host for which config is being rendered)
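A sketch of what the augmented kafka_config($cluster, $site) could return: route the caller's site to a concrete per-DC cluster, then look up that cluster's brokers. The broker hostnames here are invented for illustration; the real lists live in hiera:

```python
# Hypothetical sketch of the augmented lookup, not the real puppet
# function. Routing hash resolves the calling site to a per-DC cluster;
# a kafka_clusters-style hash supplies that cluster's broker list.
ROUTES = {
    "main": {"eqiad": "main-eqiad", "codfw": "main-codfw",
             "esams": "main-eqiad", "ulsfo": "main-codfw"},
}
CLUSTERS = {
    "main-eqiad": ["kafka-main-a.eqiad.example:9092"],  # invented names
    "main-codfw": ["kafka-main-a.codfw.example:9092"],
}

def kafka_config(cluster, site):
    """Return the resolved cluster name and its broker list."""
    name = ROUTES[cluster][site]
    return {"name": name, "brokers": CLUSTERS[name]}
```

On a varnishkafka host the site argument would just be the host's own $::site, as noted above.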
348​ bblack| for the cache::route_table stuff we chose to have the flexibility, though
349​ottomata| ya, that'd be left as is
350​ bblack| e.g. if we do want to take varnish@codfw offline, we don't want that to imply taking ulsfo and eqsin offline. we re-route them to use eqiad even though it's further away.
351​ bblack| basically, we're not committing to any one site's online-ness, unlike kafka
352​ottomata| aye, i'm saying just a separate static mapping that can be used as we see fit, but not modified
353​ottomata| like
354​ottomata| # Maps datacenter name to the geographically closest
355​ottomata| # primary data center (either eqiad or codfw).
356​ottomata| primary_datacenter_map:
357​ottomata|   eqiad: eqiad
358​ottomata|   codfw: codfw
359​ottomata|   esams: eqiad
360​ottomata|   ulsfo: codfw
361​ottomata|   eqsin: codfw
362​ bblack| yeah, maybe. it has the potential for abuse once it's out there, though.
363​ bblack| we'd need to be clear about what use-cases it's for and not-for
364​ bblack| (which is probably only kafka at this time)
365​ottomata| would you prefer i made it kafka specific?
366​ _joe_| bblack| how is traffic going to be routed from eqsin?
367​ottomata| i could also just put it in statsv profile for now, to keep grubby abusing hands off of it :)
368​ bblack| _joe_: eqsin->codfw (both physically and logically)
369​ _joe_| ok
370​ bblack| ottomata| right, my fear is someone might use it in places, thinking we'll edit that hieradata on multi-dc switches or outages, when we won't. or use it for a service that's expecting to have site-level maintenance outages or whatever.
371​ bblack| maybe put it in a generic place, but just deal with it in naming/comments
372​ottomata| i mean, i can add lots of comments saying that's not what it's for
373​ottomata| ok
374​ottomata| i'll make a patch and we will see...
375​ottomata| thanks bblack.
376​ bblack| maybe call it "static_core_dc_map"
377​ottomata| aye
378​ottomata| do we use 'core' more than 'primary'?
379​ bblack| gets rid of the "primary" wording that sounds failover-y, and calls out its unchanging nature
380​ottomata| to refer to eqiad and codfw?
381​ _joe_| so your plan to make texas the center of the wiki world is finally coming to a realization. You just need active-active mediawiki :)
382​ottomata| ayy ok
383​ottomata| yeah
384​ bblack| I donno if we use "core" in this sense elsewhere. I often do informally, but not sure where formally.
385​ bblack| "git grep core" is not much help lol
386​ bblack| calls them "primary sites"
387​ bblack| which is probably wrong now that we're talking about it
388​ bblack| because people see "primary" in other related places/names/variables and will mentally imply the whole primary/secondary active/passive thing between eqiad+codfw
389​ottomata| yeah
390​ bblack| we should probably stick to nomenclature where "core" means eqiad+codfw vs non-eqiad+codfw, and the word primary is only used when talking about a/p failover scenarios.
391​ottomata| haha bblack, 'main'? man, why didn't we call these kafka clusters 'core'?!
392​ottomata| i hate the name 'main'
393​ottomata| always did, just didn't come up with anything better.
394​ bblack| it would be confusing now anyways :)
395​ bblack| this conversation is making me try to recall when the last time was, that we've taken all of varnish offline at a DC, other than a DC-level network outage of course.
396​ bblack| I don't think we actually do that. We do shut off frontend traffic, and we do re-route for various reasons.
397​ottomata| Hmmm, you know, i think maybe a kafka-specific mapping is better... there's no guarantee that a new 'core' datacenter would have kafka main in it...
398​ bblack| but I don't think we've ever just shut off all varnish-be services at a DC for maint or heavy changes
399​ bblack| true
400​ottomata| right, it's generally just rerouted traffic, ya? there is always some trickle of weirdo requests
401​ bblack| in the long run, we could/should expect a 3rd core DC to appear at least briefly
402​ bblack| e.g. if we were to replace eqiad with another site, we'd probably have 3 core sites during the transition period
403​ottomata| aye
404​ bblack| but I guess traffic-routing is still "different" than other services
405​ bblack| because it's supporting the edge sites, mostly
407​ottomata| ya
408​ottomata| are there other services at the edges?
409​ bblack| if services were running active/active, and codfw suddenly died (the whole DC) or lost too much transport to be useful, we'd still want to route around it and have ulsfo remap to eqiad, for instance
410​ bblack| as opposed to calling varnish@codfw always-up and statically-mapping ulsfo->codfw
411​ bblack| ottomata| not services like we're talking about here, no. just other meta-services for infrastructure, most of which are naturally-redundant in some other unrelated way to this multi-dc stuff.
412​ bblack| e.g. ntp, dns, etcd, etc
413​ bblack| or non-runtime like a tftp server
414​ bblack| mostly the important thing is that they don't carry any important state, so they're "easy"
415​ bblack| it's stateful things that are hard
416​ bblack| although etcd, I donno, I haven't looked at that part of things in a long time, how we're handling it
417​ bblack| static SRV entries to the closer core DCs I guess for now, but we had talked about putting etcd daemons at the edges, too
418​ottomata| aye
419​ bblack| in theory, kafka brokers at the edges could make sense at some point as well, I donno
420​ bblack| (I mean, analytics ones)
421​ bblack| there's separately the issue that we might want kafka brokers at the edge for kafka-driven PURGEs where the edge caches are subscribing
422​ottomata| aye
423​ottomata| i remember talking about that
424​ottomata| then each cache would have its own consumer group and commit when it has purged
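A minimal sketch of the purge-consumption idea described above — each edge cache subscribing with its own consumer group, and committing an offset only after the purge is applied locally. All names here (`PurgeConsumer`, `apply_purge`, the group id) are hypothetical illustrations, not anything that exists in the WMF puppet tree.

```python
# Sketch: each cache node uses its own consumer group, so every cache
# sees every purge message; the offset is committed only AFTER the purge
# is applied, so a crash before commit replays the purge (at-least-once).

class PurgeConsumer:
    def __init__(self, group_id, messages):
        self.group_id = group_id      # e.g. "purge-cp1001" (hypothetical)
        self.messages = messages      # stand-in for a Kafka purge topic
        self.committed = -1           # last committed offset

    def poll_and_purge(self, apply_purge):
        for offset, url in enumerate(self.messages):
            if offset <= self.committed:
                continue              # already processed and committed
            apply_purge(url)          # purge from the local cache first...
            self.committed = offset   # ...then commit, so nothing is lost

purged = []
c = PurgeConsumer("purge-cp1001", ["/wiki/A", "/wiki/B"])
c.poll_and_purge(purged.append)
```

The commit-after-purge ordering is the important design choice: committing first would risk silently dropping a purge if the cache died mid-apply.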
425​ bblack| so, rewinding a bit to the whole "use local kafka" and "kafka cluster in a core DC is always online" sorts of things...
426​ bblack| that strategy definitely does make sense if you consider that all the services pub/subbing to that kafka cluster are also local to that DC
427​ottomata| (bblack is in a mood to talk! I love it! :D)
428​ottomata| bblack, that should generally be how it works, with mirror maker handling cross dc stuff
429​ bblack| if the DC dies or goes offline from the network, the services+kafka go away together.
430​ottomata| ya
431​ bblack| if we decide to route traffic away from that DC voluntarily for <some reason> it works too, traffic/load just drops off there and it all works ok.
432​ottomata| yup, or if there is a network problem, stuff will eventually mirror.
433​ottomata| purges will eventually make it when network problem is resolved
434​ bblack| but, if we stick to that static edge->core mapping for kafka data from varnishes and this idea that the core ones never go down.
435​ bblack| it implies that even if maintenance-wise you never allow kafka@codfw to go down for kafka-reasons.... if codfw as a whole (or its network links) go down, we're faced with losing all the stats for the live traffic in ulsfo+eqsin (which we've re-routed through eqiad), or deciding to shut off all the edge sites at that half of our world to preserve stats integrity (seems like not a great idea)
436​ bblack| which argues for the idea that we should have a switch that can remap kafka data from ulsfo+eqsin->eqiad when codfw goes down.
437​ottomata| bblack, which sounds like a slight reason in favor of using cache::route_table (or something)
438​ottomata| ah
439​ottomata| yeah, so hopefully
440​ottomata| kafka_datacenter_map:
441​ottomata| main:
442​ottomata| # eqiad and esams should use main-eqiad.
443​ottomata| eqiad: main-eqiad
444​ottomata| esams: main-eqiad
445​ottomata| # codfw, ulsfo and eqsin should use main-codfw.
446​ottomata| codfw: main-codfw
447​ottomata| ulsfo: main-codfw
448​ottomata| eqsin: main-codfw
449​ottomata| could do it
450​ottomata| you could change the value of ulsfo: main-codfw to main-eqiad
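The `kafka_datacenter_map` idea above can be sketched as a small lookup with an operator override, which is the failover switch bblack is asking for (e.g. remapping ulsfo/eqsin to eqiad when codfw is down). The helper name and override mechanism are hypothetical; the site-to-cluster values come from the map in the discussion.

```python
# Sketch (hypothetical helper): resolve which main Kafka cluster a site
# should produce to. `overrides` stands in for an operator-controlled
# switch (hiera, or better, etcd/confd) used during a codfw outage.

KAFKA_DATACENTER_MAP = {
    "eqiad": "main-eqiad",
    "esams": "main-eqiad",
    "codfw": "main-codfw",
    "ulsfo": "main-codfw",
    "eqsin": "main-codfw",
}

def kafka_cluster_for(site, overrides=None):
    # operator overrides win over the static map
    merged = dict(KAFKA_DATACENTER_MAP, **(overrides or {}))
    return merged[site]

# normal routing
kafka_cluster_for("ulsfo")  # -> "main-codfw"
# codfw outage: operator remaps its edge sites to eqiad
kafka_cluster_for("ulsfo", {"ulsfo": "main-eqiad", "eqsin": "main-eqiad"})  # -> "main-eqiad"
```

Keeping the static map in hiera and the overrides in etcd/confd matches the later point in the conversation: the rarely-changing discovery hints can live in puppet, while the operational switch should not.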
452​ bblack| yes, but now we're saying we'd operationally use those switches, not just "if we build more DCs or change our architecture or whatever"
453​ottomata| ?
454​ bblack| in which case we're now facing the counter-argument that if you're going to operationally use the switches, they shouldn't be in puppet
455​ottomata| haha
456​ottomata| yeah, this sounds like discovery to me
457​ottomata| but, it's tricky, because kafka brokers have their own discovery for broker hostnames
458​ottomata| via zookeeper
459​ottomata| so if we did that, we should use confd + zookeeper for discovery of kafka broker list
460​ bblack| if codfw dies, or we're doing a planned simulation of codfw death, we're not shutting off ulsfo+eqsin, and we don't want to lose their stats, so we need an operator switch.
461​ bblack| in "because kafka brokers have their own discovery for broker hostnames", you mean "because kafka clients have their own discovery for broker hostnames" right?
462​ottomata| sorta, yes, in that a broker is also a client
463​ bblack| ok
464​ottomata| so
465​ottomata| for kafka
466​ottomata| brokers
467​ bblack| the case we're talking about mostly here is VK
468​ottomata| you don't tell them about the other brokers in config
469​ottomata| they use zookeeper to find each other
470​ottomata| but kafka clients are decoupled from zookeeper
471​ bblack| does VK use zookeeper to find its cluster?
472​ bblack| ok, no
473​ottomata| so you give them a list of bootstrap kafka broker names
474​ottomata| one is enough
475​ottomata| and it uses the kafka API (which in turn uses zookeeper) to find the brokers
476​ bblack| scary
477​ bblack| ok
478​ottomata| it's more than just hostnames too!
479​ottomata| it's actually full-on kafka topic partition leadership info
480​ bblack| (scary because we have an explicit list of webrequest brokers we configure ipsec for, and someone could ignore that list in changing the zookeeper data...)
481​ottomata| so if you are consuming from a topic
482​ottomata| the client gets the list of partitions, and which brokers are leaders for those partitions
483​ottomata| and uses that to start consuming
484​ottomata| if something changes (broker goes offline, leadership changes, etc.) the client is notified, and they do a new metadata request to kafka to learn what changed
485​ottomata| and re-subscribe to new leaders, etc.
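The client-side metadata flow just described can be sketched as follows. The classes are hypothetical stand-ins (no real Kafka client library is used): the client bootstraps a partition-to-leader map once, then re-fetches it whenever it is told leadership changed.

```python
# Sketch of the metadata flow described above: bootstrap, learn partition
# leaders, re-fetch on change. FakeCluster stands in for the brokers'
# metadata API; a real client would issue a MetadataRequest over the wire.

class FakeCluster:
    def __init__(self):
        # partition -> current leader broker
        self.leaders = {0: "kafka2001", 1: "kafka2002", 2: "kafka2003"}

    def metadata(self):
        return dict(self.leaders)     # what a metadata request would return

class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.leaders = cluster.metadata()   # initial bootstrap fetch

    def on_leadership_change(self):
        # broker went offline / leadership moved: re-learn and re-subscribe
        self.leaders = self.cluster.metadata()

cluster = FakeCluster()
client = Client(cluster)
cluster.leaders[0] = "kafka2002"      # broker 2001 loses partition 0
client.on_leadership_change()
```

This is why the bootstrap host list only matters at startup: after the first metadata fetch, the client talks directly to whichever brokers currently lead the partitions it cares about.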
486​ bblack| so ideally VK gets the names of 1-2 hosts in each DC to bootstrap from, and ZK is consistent across both and will handle failover?
487​ottomata| (re ipsec: hopefully we'll be done with that next quarter! :D)
488​ottomata| yeah, it's best if you can give as many hosts as you can for bootstrapping, in case the 2 you give it are down
489​ottomata| but it doesn't use the list of bootstrap hosts to actually communicate
490​ottomata| it only uses them for initial bootstrap on startup
491​ bblack| but I guess, it's up to us to use custom tooling or confd to do manual failovers in ZK
492​ottomata| ya, if we did this, it'd be like:
493​ottomata| datacenter -> kafka cluster name mapping in confd/etcd.
494​ottomata| but
495​ottomata| when rendering config templates with confd
496​ottomata| we'd use that cluster name to find the broker names in zookeeper
497​ottomata| i'm in now
498​ottomata| in codfw:
499​ottomata| [zk: localhost:2181(CONNECTED) 5] ls /kafka/main-codfw/brokers/ids
500​ottomata| [2002, 2001, 2003]
501​ottomata| [zk: localhost:2181(CONNECTED) 7] get /kafka/main-codfw/brokers/ids/2001
502​ottomata| {"jmx_port":9999,"timestamp":"1511878095746","endpoints":["PLAINTEXT://kafka2001.codfw.wmnet:9092"],"host":"kafka2001.codfw.wmnet","version":2,"port":9092}
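The broker registration shown above is plain JSON, so a confd-style template helper could render a bootstrap broker list straight from those znodes. A small sketch (the helper name is hypothetical; the znode payload is the one pasted above):

```python
# Sketch: turn a Kafka broker registration znode (as stored under
# /kafka/<cluster>/brokers/ids/<id>) into a host:port bootstrap entry.
import json

znode = ('{"jmx_port":9999,"timestamp":"1511878095746",'
         '"endpoints":["PLAINTEXT://kafka2001.codfw.wmnet:9092"],'
         '"host":"kafka2001.codfw.wmnet","version":2,"port":9092}')

def broker_addr(data):
    d = json.loads(data)
    return "%s:%d" % (d["host"], d["port"])

broker_addr(znode)  # -> "kafka2001.codfw.wmnet:9092"
```

In the scheme discussed here, confd would look up the datacenter-to-cluster mapping in etcd, then enumerate that cluster's `brokers/ids` children in zookeeper and render each one this way into the client config.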
503​ bblack| should we do the static (non-zk) config with a list of broker hosts from both DCs though, instead of something switching in confd/etcd?
504​ bblack| and then deal with failover exclusively in the zk world?
505​ottomata| could do, i suppose it wouldn't hurt, except we'd have to maintain it
506​ottomata| and the data is already in zookeeper
507​ bblack| well yeah but data inside of zookeeper does no good for the hostlist we need to reach the service that has the zookeeper data
508​ottomata| then again, i'm maintaining that list now in hiera anyway
509​ottomata| so moving it from hiera -> confd/etcd isn't that big of a deal i guess
510​ottomata| oh
511​ottomata| ya
512​ottomata| we'd need to discover zookeeper
513​ bblack| clearly, we just need to mirror all the things in a giant circle of hieradata->etcd->zookeeper->automatic-hieradata-commits, and use kafka topics for all the ->
514​ottomata| hah, which is re-coupling kafka clients with zookeeper
515​ottomata| hahaha yeah
516​ bblack| but seriously, manual changes to a discovery list in puppet/hieradata shouldn't be a big deal, because it doesn't operationally matter much if they fall out of sync a bit anyways.
517​ bblack| it's not a realtime switch-commit. you're just updating the discovery hints.
518​ottomata| yeah, and it rarely changes
519​ bblack| some of this stuff really deserves some off-site design discussion time. not really so much the kafka questions in particular, but solidifying our standards and future plans about all the related meta-topics this touches on for multi-dc work and etcd and zookeeper and so-on.
520​ottomata| aye
521​ottomata| bblack, not sure if you've seen this, but I have some other kafka plans in mind too:
522​ottomata| this is more app level stuff, but relevant probably to cache purges
523​ottomata| working on making that a next FY program
524​ottomata| (slowly :) )