eventgate-main will replace eventlogging-service-eventbus (currently co-located on the Kafka main brokers).
To do this, we need to:
Status | Assigned | Task
---|---|---
Resolved | Ottomata | T185233 Modern Event Platform
Resolved | Ottomata | T201068 Modern Event Platform: Stream Intake Service
Resolved | Ottomata | T211248 Modern Event Platform: Stream Intake Service: Migrate eventlogging-service-eventbus events to eventgate-main
Resolved | Ottomata | T218346 Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main
Resolved | akosiaris | T222899 Set up LVS for eventgate-main on port 32192
Resolved | Ottomata | T222962 Use new eventgate chart release analytics for eventgate-analytics service.
Change 506166 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] Add eventgate-main chart
@fsero @akosiaris - moving the discussion about the eventgate-main patch here; it will be easier to communicate.
I'm going to try to make a single chart that can be used for all deployments of EventGate, and I think I need a little help figuring out how to organize that. Right now in prod we have:
I think what we want is:
And also eventually as part of T217142 something like:
Who knows there might also be:
But that's yet TBD.
I think to do this I'm going to have to templatize almost all of the service-runner config.yaml into values.yaml files. Is the values.yaml file in the released chart used at all when releasing in production? That is, does using e.g. -f eventgate-analytics-eqiad-values.yaml merge the two, or is only the -f one used?
With so many releases, I'd like to start keeping the release-specific values.yaml files somewhere other than the hard disk of deploy1001. Can/should I include them in this chart too? Something like charts/eventgate/releases/<release_name>.values.yaml?
Hm, will k8s namespaces help us here?
...lots of questions!
Is the values.yaml file in the released chart used at all when releasing in production? That is, does using e.g. -f eventgate-analytics-eqiad-values.yaml merge the two, or is only the -f one used?
Found the docs on this; the answer is yes, they are merged.
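To illustrate the merge (the file contents here are made up for the example):

# chart's values.yaml (defaults)
service:
  port: 8081
logging:
  level: info

# eventgate-analytics-eqiad-values.yaml, passed with -f
service:
  port: 33192

# effective values: keys from the -f file win, everything else keeps
# the chart default, so service.port=33192 and logging.level=info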
Maybe:
And I could set the service name to e.g. $chart-$namespace. I think that would keep metric names etc. the same.
Hm, the wmf.releasename template only uses $chart-$release. Perhaps I could change it to $chart-$namespace-$release?
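A sketch of what that hypothetical _helpers.tpl change could look like (not what the shared helpers actually define today):

{{- /* hypothetical variant: include the namespace between chart and release */ -}}
{{- define "wmf.releasename" -}}
{{- printf "%s-%s-%s" .Chart.Name .Release.Namespace .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}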
Just brainstorming here, not sure what is best to do...
Sounds good to me.
Who knows there might also be:
releases:
  - analytics-public-production (in CLUSTER=eqiad and CLUSTER=codfw)
  - analytics-public-staging (in CLUSTER=staging)
I have no idea what that would be, but I guess I'll find out.
But that's yet TBD.
I think to do this I'm going to have to templatize almost all of the service-runner config.yaml into values.yaml files. Is the values.yaml file in the released chart used at all when releasing in production? That is, does using e.g. -f eventgate-analytics-eqiad-values.yaml merge the two, or is only the -f one used?
You already answered it yourself, but yes, they are merged.
With so many releases, I'd like to start keeping the release-specific values.yaml files somewhere other than the hard disk of deploy1001. Can/should I include them in this chart too? Something like charts/eventgate/releases/<release_name>.values.yaml?
No, you should not. The chart should not have WMF-installation-specific stuff in it. Those values will go in the helmfile git repo soon (aside from the non-public parts, which will be populated via puppet from the private repo on deploy1001), which will be publicly available on gerrit and thus mirrored everywhere.
Hm, will k8s namespaces help us here?
With regard to what? Keep in mind that we follow a strict rule on the organization of namespaces: it's 1 namespace per service (we are trying to counter Conway's law in a way; time will tell if we have been successful). Whether that service is powered by the same software + helm chart as some other service is irrelevant.
...lots of questions!
More like:
Note how, in the above, the chart itself is decoupled from the namespaces AND the releases. Also, there is nothing really binding us to release names like production or staging. Unlike the namespaces (which denote the service itself, and SRE serviceops sets them), the release names are in reality up to the deployer. We've established a status quo to have some level of conformity, but if a release named eventgate-analytics-production makes more sense to you, it's probably the way to go. That being said, it's probably not prudent to name releases with random strings.
And I could set the service name to e.g. $chart-$namespace. I think that would keep metric names etc. the same.
Hm, the wmf.releasename template only uses $chart-$release. Perhaps I could change it to $chart-$namespace-$release?
I'd advise against having the namespace referenced in charts. We used to have that; it turned out to be a pretty big hurdle with helm and chart testing. That being said, off the top of my head, I can't really see a big issue with the change you are proposing.
But you can probably just do the trick I mentioned above and name your releases uniquely across all of the namespaces using the chart. I'll grant you it's not much of a safeguard, though.
Just brainstorming here, not sure what is best to do...
I'd advise against having the namespace referenced in charts.
I'm mostly just trying to fix the fact that the service and the metrics are currently named after .Chart.Name. We don't currently use the release (e.g. staging, production) in these names, which is good, since they can vary even more than that, as you say. How should I set
# Statsd metrics reporter
metrics:
  name: {{ .Chart.Name }}
If I don't use the namespace?
Hm yeah it gets a little weird. Without using namespaces, wmf.releasename will be eventgate-production for each of the mentioned production namespaces, since it is defined as .Chart.Name + .Release.Name
Hm, I think what I need is a values.yaml specified service name. Going to add this to _helpers.tpl:
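{{- /* service name: .Values.service.name if set, otherwise fall back to the chart name */ -}}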
{{- define "wmf.servicename" -}}
{{- .Values.service.name | default .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
And use it in the various templates in place of wmf.chartname, e.g.:
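For illustration, a release values file would set the new (assumed) service.name key, and templates would call the helper:

# in the release's values.yaml:
service:
  name: eventgate-analytics

# in a template, in place of wmf.chartname:
name: {{ template "wmf.servicename" . }}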
This way I don't have to mess with the namespaces at all. I think this will work more easily with scap-helm as we have it, where the namespace matches the 'service name' when we do scap-helm <service-name> ....
Hm, if ^ works and it becomes a convention, we could make scap-helm automatically add --set service.name=$SERVICE to helm commands.
Metrics are tagged with a label containing the namespace, e.g. taken from production as-is:
service_runner_request_duration_seconds_count{app="eventgate-analytics",instance="10.64.64.102:9102",job="k8s-pods",kubernetes_namespace="eventgate-analytics",kubernetes_pod_name="eventgate-analytics-production-79cb8b7695-ln88r",method="GET",pod_template_hash="3576463251",release="production",service="eventgate-analytics",status="200",uri="_info"}
service_runner_request_duration_seconds_count{app="eventgate-analytics",instance="10.64.64.102:9102",job="k8s-pods",kubernetes_namespace="eventgate-analytics",kubernetes_pod_name="eventgate-analytics-production-79cb8b7695-ln88r",method="GET",pod_template_hash="3576463251",release="production",service="eventgate-analytics",status="200",uri="root"}
service_runner_request_duration_seconds_count{app="eventgate-analytics",instance="10.64.64.102:9102",job="k8s-pods",kubernetes_namespace="eventgate-analytics",kubernetes_pod_name="eventgate-analytics-production-79cb8b7695-ln88r",method="GET",pod_template_hash="3576463251",release="production",service="eventgate-analytics",status="2xx",uri="_info"}
As you can see kubernetes_namespace is right there. So the metrics have the information already.
If kubernetes_namespace is sufficient, I'd rather _helpers.tpl did not deviate from the other charts.
kubernetes_namespace might be right, but will app or service be? If we leave it as is, I believe those will be set to .Chart.Name, which will now be just 'eventgate'.
Hm, but since the namespace is set from scap-helm <service>, then ya, I can use .Release.Namespace instead of .Chart.Name without modifying _helpers.tpl. That is, as long as you are ok with using the namespace in the chart templates! :)
Yes, that's true. Although .Chart.Name is the default value. You can override it at helm install time via --set chartName (at least as far as app goes). For service you should probably use wmf.chartname for the same behavior. Alternatively you can use wmf.releasename, which has a default of .Chart.Name-.Release.Name (with .Chart.Name again overridable via --set chartName).
Hm, but since the namespace is set from scap-helm <service>, then ya, I can use .Release.Namespace instead of .Chart.Name without modifying _helpers.tpl. That is, as long as you are ok with using the namespace in the chart templates! :)
The problems we met were with setting the namespace in kubernetes resources. That should be avoided for sure. But for populating a config, it's probably fine to use .Release.Namespace.
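So, for example, something like this in the config.yaml template should be fine (a sketch; the namespace only feeds a config value, no k8s resource is named with it):

# Statsd metrics reporter
metrics:
  name: {{ .Release.Namespace }}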
Alternatively you can use wmf.releasename, which has a default of .Chart.Name-.Release.Name
I'd rather not use .Release.Name, as that will set e.g. service=eventgate-analytics-production, rather than just service=eventgate-analytics
You can override it at helm install time via --set chartName
This also seems strange, as I'm not actually changing the chartName, right? It seems weird to be able to override the chart name at all... the chart is named 'eventgate' according to Helm; it seems weird to change it in k8s templates. Perhaps the naming of wmf.chartname isn't quite accurate if it's meant to be overridden? Would wmf.servicename make more sense? :)
I think using .Release.Namespace will work for me, but that gets a little weird in minikube. I've got the chart deployed in the default namespace. If I just do a regular helm install without setting e.g. --namespace=eventgate-dev, the service and pod names end up being e.g. 'default-solumn-wolf' (Release.Namespace-Release.Name). If I do set --namespace=eventgate-dev, I think EventGate can no longer talk to the Kafka pods in the default namespace. (Not sure about that, but my pod in eventgate-dev is stuck in ContainerCreating... still troubleshooting.)
Anyway, are you sure we shouldn't have a separate concept of service.name/wmf.servicename? I agree that it should be set the same as the k8s namespace in prod, but relying on that elsewhere is getting weird. :)
If I do set --namespace=eventgate-dev, I think EventGate can no longer talk to the Kafka pods in the default namespace.
Ignore this, I just had some templating stuff wrong.
So, using .Release.Namespace for service.name in minikube works just fine, it just adds the extra requirement that all helm/kubectl commands need to add --namespace with minikube, since we don't have scap-helm doing it for us.
Hm, is it possible to use helm template to check rendered templates if we use .Release.Namespace? AFAICT helm template doesn't take a --namespace argument, and all uses of .Release.Namespace end up rendered as 'default'.
Hm, I think I do have to change _helpers.tpl. wmf.releasename is defined to be .Chart.Name-.Release.Name, which will be e.g. eventgate-production. We want this to be e.g. eventgate-analytics-production. So either I change all usages of wmf.releasename to .Release.Namespace-.Release.Name, or I modify wmf.releasename.
OR we make a new wmf.servicename and use that, instead of trying to override .Chart.Name via .Values.chartName?
As @akosiaris said, I would prefer to not include {{ .Release.Namespace }}, because there are some issues regarding how Helm interprets namespacing [1], and it seems kind of confusing to me to include the namespace (you can end up with eventgate-default on minikube, for instance).
AFAICS if you want:
chart: eventgate
releases:
  - analytics-production (in CLUSTER=eqiad and CLUSTER=codfw)
  - analytics-staging (in CLUSTER=staging)
  - main-production (in CLUSTER=eqiad and CLUSTER=codfw)
  - main-staging (in CLUSTER=staging)
that means the application description is the same (eventgate), and this should be denoted using the chartName (eventgate); every service should live in its own namespace (eventgate-main and eventgate-analytics); the releases should be named main-production or analytics-staging; and the templates should be changed to use wmf.releasename, which will expand to {{ chartName + release.name }}.
Regarding values: even if helm supports merging values, they should be kept in the helmfile repo, which would be clearer to read and modify:
chart: eventgate
namespace: eventgate-main
values:
  A: X
  B: C
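i.e. something like this hypothetical helmfile entry (the repo layout and file names are assumptions, since the repo doesn't exist yet):

releases:
  - name: analytics-production
    namespace: eventgate-analytics
    chart: stable/eventgate
    values:
      - eventgate-analytics-production-values.yaml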
I don't see wmf.servicename as needed, since you can achieve the same by changing the release name.
(P.S.: for instance, the first scap-helm deploy,
CLUSTER=staging scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics
could have been changed to
CLUSTER=staging scap-helm eventgate-analytics install -n analytics-staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics
Hm, I think I mostly get what you are proposing, except I'm not sure how I can avoid using .Release.Namespace in the templates. I need something that evaluates to exactly e.g. 'eventgate-analytics'. With scap-helm eventgate-analytics ..., that is .Release.Namespace, right?
Talked with Fabian in IRC. The idea of using wmf.releasename (which is .Chart.Name-.Release.Name) instead of .Release.Namespace will work, if we set the release --name to e.g. 'analytics' or 'main' when we deploy. So:
CLUSTER=staging scap-helm eventgate-analytics install -n analytics -f eventgate-analytics-staging-values.yaml stable/eventgate
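With the chart renamed to just eventgate, that gives exactly the service names we want:

# chart 'eventgate' + release 'analytics' -> wmf.releasename = eventgate-analytics
# chart 'eventgate' + release 'main'      -> wmf.releasename = eventgate-main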
Patchset 3 in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/506166 should do this.
Change 506166 merged by Ottomata:
[operations/deployment-charts@master] Refactor eventgate-analytics to eventgate
Change 507409 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate - include error_stream in default values
Change 507409 merged by Ottomata:
[operations/deployment-charts@master] eventgate - include error_stream in default values
Change 507411 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate - properly indent stream_config
Change 507411 merged by Ottomata:
[operations/deployment-charts@master] eventgate - properly indent stream_config
Change 508324 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate - fix duplicate config error_stream in config.yaml template
Change 508324 merged by Ottomata:
[operations/deployment-charts@master] eventgate - fix duplicate config error_stream in config.yaml template
Change 508371 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add eventgate-main to profile::kubernetes::deployment_server::services
@akosiaris I think https://wikitech.wikimedia.org/wiki/Kubernetes/Helm#Install_a_new_release_for_a_specific_service needs an update to include https://gerrit.wikimedia.org/r/c/operations/puppet/+/508371 as a step.
I don't think so. This part is about a specific new release in an existing namespace. Examples would be adding a canary release coexisting with the production release.
The patch above + a number of others are required to create and deploy to a new namespace (which is something that SREs should do, as it probably also means a new LVS service, new calico rules, etc.).
Note btw that we have a policy of 1 service per namespace. That's in order to avoid deployers stepping on each other's toes, and having to do weird dances when services change ownership.
Ok, sounds good. I'll be deploying a new 'service' to the eventgate-analytics namespace while we deprecate the old one, but that should just be temporary.
This task is about deploying a new eventgate-main service. I've refactored the eventgate-analytics chart to just 'eventgate', and will use the release name 'main'.
which is something that SREs should do as it probably also mean a new LVS service, new calico rules etc
Ok! Docs would still be good though, ya? I'd be happy to make patches and have y'all review/merge.
LVS can come after we get eventgate-main up. Are the calico rules in puppet or elsewhere?
OR y'all can do it if you have the time. :)
Change 508582 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Change eventgate-analytics LVS port to 33192
Ok, working on two remaining tasks here:
@akosiaris @fsero -- Can you help me review/merge these this week?
Change 508371 merged by Fsero:
[operations/puppet@production] Configure eventgate-main for k8s deployment
Mentioned in SAL (#wikimedia-operations) [2019-05-07T16:05:32Z] <fsero> created eventgate-main tokens - T218346
@Ottomata I've created the namespaces and the users for eventgate-main. You should be able to deploy the service on k8s without configuring LVS; that will be done after.
Awesome thank you! I'll get that set up today, and we can talk about LVS stuff tomorrow.
Change 509141 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] service::docker - add $image_name parameter
Change 509141 merged by Ottomata:
[operations/puppet@production] service::docker - add $image_name parameter
Change 509442 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Keep using $title for service config dir name, not $image_name
Change 509442 merged by Ottomata:
[operations/puppet@production] Keep using $title for service config dir name, not $image_name
Change 509449 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] beta - Use eventgate services on new deployment-eventgate-1 instance
Change 509449 merged by Ottomata:
[operations/mediawiki-config@master] beta - Configure eventgate-* services with new deployment-eventgate-1 node
Change 508582 abandoned by Ottomata:
Change eventgate-analytics LVS port to 33192
Reason:
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/509866 instead