
Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main
Closed, ResolvedPublic13 Estimated Story Points

Description

eventgate-main will replace eventlogging-service-eventbus (currently co-located on Kafka main brokers).

To do this, we need to:

  • Refactor the eventgate-analytics chart to be a more general 'eventgate' chart.
  • Deploy eventgate-analytics service using the new eventgate chart (we will use a new k8s service port and point existing LVS to it) - T222962
  • Deploy eventgate-main service using new eventgate chart - T222899

Related Objects

Event Timeline

Nuria created this task.
Ottomata renamed this task from Modern Event Platform: Deploy instance of EventGate service that processes events from kafka main to Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main. Mar 14 2019, 7:17 PM

Change 506166 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] Add eventgate-main chart

https://gerrit.wikimedia.org/r/506166

@fsero @akosiaris - moving discussion about eventgate-main patch here, will be easier to communicate.

I'm going to try to make a single chart that can be used for all deployments of EventGate, and I think I need a little help figuring out how to organize that. Right now in prod we have:

  • chart: eventgate-analytics
  • releases:
    • production (in CLUSTER=eqiad and CLUSTER=codfw)
    • staging (in CLUSTER=staging).

I think what we want is:

  • chart: eventgate
  • releases:
    • analytics-production (in CLUSTER=eqiad and CLUSTER=codfw)
    • analytics-staging (in CLUSTER=staging)
    • main-production (in CLUSTER=eqiad and CLUSTER=codfw)
    • main-staging (in CLUSTER=staging)

And also eventually as part of T217142 something like:

  • releases:
    • logging-production (in CLUSTER=eqiad and CLUSTER=codfw)
    • logging-staging (in CLUSTER=staging)

Who knows there might also be:

  • releases:
    • analytics-public-production (in CLUSTER=eqiad and CLUSTER=codfw)
    • analytics-public-staging (in CLUSTER=staging)

But that's yet TBD.

I think to do this I'm going to have to templatize almost all of the service-runner config.yaml into values.yaml files. Is the values.yaml file in the released chart used at all when releasing in production? That is, does using e.g. -f eventgate-analytics-eqiad-values.yaml merge the two, or is only the -f file used?

With so many releases, I'd like to start keeping the release specific values.yaml files somewhere other than the harddisk of deploy1001. Can/should I include them in this chart too? Something like charts/eventgate/releases/<release_name>.values.yaml?

Hm, will k8s namespaces help us here?

...lots of questions!

Is the values.yaml file in the released chart used at all when releasing in production? That is, does using e.g. -f eventgate-analytics-eqiad-values.yaml merge the two, or is only the -f file used?

Found the docs on this, the answer is yes they are merged.
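For intuition, here is a minimal Python sketch of that merge behavior. This is an approximation of how Helm layers a chart's built-in values.yaml under a `-f` values file (maps merge key-by-key, with the `-f` side winning), not Helm's actual code; the value names below are made up for illustration.

```python
def deep_merge(base, override):
    """Merge override into base the way Helm layers values files:
    nested maps merge deeply, scalars in override replace base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical chart defaults vs. a release-specific -f file:
chart_defaults = {"service": {"port": 8192, "name": "eventgate"}, "replicas": 1}
release_values = {"service": {"port": 33192}, "replicas": 4}

print(deep_merge(chart_defaults, release_values))
# {'service': {'port': 33192, 'name': 'eventgate'}, 'replicas': 4}
```

Note that keys not mentioned in the `-f` file (service.name here) survive from the chart's defaults, which is what makes per-release values files so small.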

Maybe:

  • chart: eventgate
    • namespaces: analytics, main, logging
      • releases: production, staging

And I could set the service name to e.g. $chart-$namespace. I think that would keep metric names etc. the same.

Hm, the wmf.releasename template only uses $chart-$release. Perhaps I could change it to $chart-$namespace-$release?

Just brainstorming here, not sure what is best to do...

@fsero @akosiaris - moving discussion about eventgate-main patch here, will be easier to communicate.

I'm going to try to make a single chart that can be used for all deployments of EventGate, and I think I need a little help figuring out how to organize that. Right now in prod we have:

  • chart: eventgate-analytics
  • releases:
    • production (in CLUSTER=eqiad and CLUSTER=codfw)
    • staging (in CLUSTER=staging).

I think what we want is:

  • chart: eventgate
  • releases:
    • analytics-production (in CLUSTER=eqiad and CLUSTER=codfw)
    • analytics-staging (in CLUSTER=staging)
    • main-production (in CLUSTER=eqiad and CLUSTER=codfw)
    • main-staging (in CLUSTER=staging)

And also eventually as part of T217142 something like:

  • releases:
    • logging-production (in CLUSTER=eqiad and CLUSTER=codfw)
    • logging-staging (in CLUSTER=staging)

Sounds good to me.

Who knows there might also be:

  • releases:
    • analytics-public-production (in CLUSTER=eqiad and CLUSTER=codfw)
    • analytics-public-staging (in CLUSTER=staging)

I have no idea what that would be, but I guess I'll find out.

But that's yet TBD.

I think to do this I'm going to have to templatize almost all of the service-runner config.yaml into values.yaml files. Is the values.yaml file in the released chart used at all when releasing in production? That is, does using e.g. -f eventgate-analytics-eqiad-values.yaml merge the two, or is only the -f file used?

You already answered it yourself, but yes, they are merged.

With so many releases, I'd like to start keeping the release specific values.yaml files somewhere other than the harddisk of deploy1001. Can/should I include them in this chart too? Something like charts/eventgate/releases/<release_name>.values.yaml?

No, you should not. The chart should not have WMF installation-specific stuff in it. Those values will soon go in the helmfile git repo (aside from the non-public parts, which will be populated via puppet from the private repo on deploy1001), which will be publicly available on gerrit and thus mirrored everywhere.

Hm, will k8s namespaces help us here?

With regard to what? Keep in mind that we follow a strict rule on the organization of namespaces and it's 1 namespace per service (we are trying to counter Conway's law in a way, time will tell if we have been successful). Whether that service is powered by the same software + helm chart as some other service is irrelevant.

...lots of questions!

Maybe:

  • chart: eventgate
    • namespaces: analytics, main, logging
      • releases: production, staging

More like

  • chart: eventgate
  • namespaces: eventgate-analytics, eventgate-main, eventgate-logging
    • releases: production, staging, <canary possibly>

Note how the chart itself is, in the above, decoupled from the namespaces AND the releases. Also, there is nothing really binding us to release names like production, staging. Unlike the namespaces (which denote the service itself, and SRE serviceops sets them), the release names are in reality up to the deployer. We've established a status quo to have some level of conformity, but if a release named eventgate-analytics-production makes more sense to you, it's probably the way to go. That being said, it's probably not prudent to name releases with random strings.

And I could set the service name to e.g. $chart-$namespace. I think would keep metric names etc. the same.

Hm, the wmf.releasename template only uses $chart-$release. Perhaps I could change it to $chart-$namespace-$release?

I'd advise against having the namespace referenced in charts. We used to have that; it turned out to be a pretty big hurdle with helm and chart testing. That being said, off the top of my head, I can't really see a big issue with the change you are proposing.

But you can probably just do the trick I mentioned above and name your releases uniquely across all of the namespaces using the chart. I'll grant you it's not much of a safeguard, though.

Just brainstorming here, not sure what is best to do...

I'd advise against having the namespace referenced in charts.

I'm mostly just trying to fix the fact that the service and the metrics are currently named after .Chart.Name. We don't currently use the release (e.g. staging, production) in these names, which is good, since they can vary even more than that, as you say. How should I set

# Statsd metrics reporter
metrics:
  name: {{ .Chart.Name }}

If I don't use the namespace?

Hm yeah it gets a little weird. Without using namespaces, wmf.releasename will be eventgate-production for each of the mentioned production namespaces, since it is defined as .Chart.Name + .Release.Name

Hm, I think what I need is a values.yaml specified service name. Going to add this to _helpers.tpl:

{{- define "wmf.servicename" -}}
{{- .Values.service.name | default .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}

And use it in the various templates in place of wmf.chartname

This way I don't have to mess with the namespaces at all. I think this will work more easily with scap-helm as we have it, where the namespace matches the 'service name' when we do scap-helm <service-name> ....

Hm, if ^ works and it becomes a convention, we could make scap-helm automatically add --set service.name=$SERVICE to helm commands.
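For intuition, the proposed wmf.servicename helper (prefer .Values.service.name, fall back to .Chart.Name, truncate to 63 characters for the DNS label limit, trim a trailing "-") behaves roughly like this Python sketch. This is an approximation of the Sprig pipeline, not the template engine itself; note rstrip trims every trailing "-" where trimSuffix trims one.

```python
def wmf_servicename(values, chart_name):
    """Approximate: .Values.service.name | default .Chart.Name | trunc 63 | trimSuffix "-" """
    name = (values.get("service") or {}).get("name") or chart_name
    return name[:63].rstrip("-")

# Release values set a service name -> it wins over the chart name:
print(wmf_servicename({"service": {"name": "eventgate-analytics"}}, "eventgate"))  # eventgate-analytics
# No service.name in values -> fall back to the chart name:
print(wmf_servicename({}, "eventgate"))  # eventgate
```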

I 'd advise against having the namespace referenced in charts.

I'm mostly just trying to fix the fact that the service and the metrics are currently named after .Chart.Name. We don't currently use the release (e.g. staging, production) in these names, which is good, since they can vary even more than that, as you say. How should I set

# Statsd metrics reporter
metrics:
  name: {{ .Chart.Name }}

If I don't use the namespace?

Metrics are tagged with a label containing the namespace, e.g. taken from production as-is:

service_runner_request_duration_seconds_count{app="eventgate-analytics",instance="10.64.64.102:9102",job="k8s-pods",kubernetes_namespace="eventgate-analytics",kubernetes_pod_name="eventgate-analytics-production-79cb8b7695-ln88r",method="GET",pod_template_hash="3576463251",release="production",service="eventgate-analytics",status="200",uri="-info"}

service_runner_request_duration_seconds_count{app="eventgate-analytics",instance="10.64.64.102:9102",job="k8s-pods",kubernetes_namespace="eventgate-analytics",kubernetes_pod_name="eventgate-analytics-production-79cb8b7695-ln88r",method="GET",pod_template_hash="3576463251",release="production",service="eventgate-analytics",status="200",uri="root"}

service_runner_request_duration_seconds_count{app="eventgate-analytics",instance="10.64.64.102:9102",job="k8s-pods",kubernetes_namespace="eventgate-analytics",kubernetes_pod_name="eventgate-analytics-production-79cb8b7695-ln88r",method="GET",pod_template_hash="3576463251",release="production",service="eventgate-analytics",status="2xx",uri="-info"}

As you can see kubernetes_namespace is right there. So the metrics have the information already.
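To make that concrete, the namespace can be pulled straight out of such an exposition line; a small sketch using a shortened version of the labels shown above (the sample value 42 is made up):

```python
import re

# A Prometheus exposition line like the production samples above (abridged):
metric = ('service_runner_request_duration_seconds_count{'
          'app="eventgate-analytics",kubernetes_namespace="eventgate-analytics",'
          'release="production",service="eventgate-analytics",status="200"} 42')

# Collect label="value" pairs into a dict:
labels = dict(re.findall(r'(\w+)="([^"]*)"', metric))

print(labels["kubernetes_namespace"])  # eventgate-analytics
print(labels["release"])               # production
```

So a dashboard or alert can already slice by kubernetes_namespace without the service name encoding it.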

Hm, I think what I need is a values.yaml specified service name. Going to add this to _helpers.tpl:

{{- define "wmf.servicename" -}}
{{- .Values.service.name | default .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}

And use it in the various templates in place of wmf.chartname

This way I don't have to mess with the namespaces at all. I think this will work more easily with scap-helm as we have it, where the namespace matches the 'service name' when we do scap-helm <service-name> ....

If kubernetes_namespace is sufficient, I'd rather _helpers.tpl did not deviate from the other charts.

kubernetes_namespace might be right, but will app or service be? If we leave it as is, I believe those will be set to .Chart.Name, which will now be just 'eventgate'.

Hm, but since the namespace is set from scap-helm <service>, then yeah, I can use .Release.Namespace instead of .Chart.Name without modifying _helpers.tpl. That is, as long as you are ok with using the namespace in the chart templates! :)

kubernetes_namespace might be right, but will app or service be? If we leave it as is, I believe those will be set to .Chart.Name, which will now be just 'eventgate'.

Yes, that's true, although .Chart.Name is only the default value. You can override it at helm install time via --set chartName (at least as far as app goes). For service you should probably use wmf.chartname for the same behavior. Alternatively you can use wmf.releasename, which has a default of .Chart.Name-.Release.Name (with .Chart.Name again overrideable via --set chartName).

Hm, but since the namespace is set from scap-helm <service>, then yeah, I can use .Release.Namespace instead of .Chart.Name without modifying _helpers.tpl. That is, as long as you are ok with using the namespace in the chart templates! :)

The problems we met were with setting the namespace in kubernetes resources. That should be avoided for sure. But for using it to populate a config, it's probably fine to use .Release.Namespace.

Alternatively you can use wmf.releasename, which has a default of .Chart.Name-.Release.Name

I'd rather not use .Release.Name, as that will set e.g. service=eventgate-analytics-production, rather than just service=eventgate-analytics

You can override it at helm install time via --set chartName

This also seems strange, as I'm not actually changing the chartName, right? It seems weird to be able to override the chart name at all... the chart is named 'eventgate' according to Helm, so it seems weird to change it in k8s templates. Perhaps the naming of wmf.chartname isn't quite accurate if it is meant to be overridden? Would wmf.servicename make more sense? :)

I think using .Release.Namespace will work for me, but that gets a little weird in minikube. I've got the chart deployed in the default namespace. If I just do a regular helm install without setting e.g. --namespace=eventgate-dev, the service and pod names end up being e.g. 'default-solumn-wolf' (Release.Namespace-Release.Name). If I do set --namespace=eventgate-dev, I think EventGate can no longer talk to the Kafka pods in the default namespace. (Not sure about that, but my pod in eventgate-dev is stuck in ContainerCreating... still troubleshooting.)

Anyway, are you sure we shouldn't have a separate concept of service.name/wmf.servicename? I agree that it should be set the same as the k8s namespace in prod, but relying on that elsewhere is getting weird. :)

If I do set --namespace=eventgate-dev, I think EventGate can no longer talk to the Kafka pods in the default namespace.

Ignore this, I just had some templating stuff wrong.

So, using .Release.Namespace for service.name in minikube works just fine; it just adds the extra requirement that all helm/kubectl commands need to add --namespace in minikube, since we don't have scap-helm doing it for us.

Hm, is it possible to use helm template to check rendered templates if we use .Release.Namespace? AFAICT helm template doesn't take a --namespace argument, and all uses of .Release.Namespace are being set to 'default'.

Hm, I think I do have to change _helpers.tpl. wmf.releasename is defined to be .Chart.Name-.Release.Name, which will be e.g. eventgate-production. We want this to be e.g. eventgate-analytics-production. So either I change all usages of wmf.releasename to .Release.Namespace-.Release.Name, or I modify wmf.releasename.
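To make the two options concrete, here is a tiny sketch of the names each scheme produces, using the chart/namespace/release names from the discussion (an illustration only, not the actual template code):

```python
chart, namespace, release = "eventgate", "eventgate-analytics", "production"

# wmf.releasename as currently defined: .Chart.Name-.Release.Name
current = f"{chart}-{release}"
# the proposed alternative: .Release.Namespace-.Release.Name
proposed = f"{namespace}-{release}"

print(current)   # eventgate-production  (same for every eventgate-* namespace)
print(proposed)  # eventgate-analytics-production  (unique per namespace)
```

The collision in the first form, identical across eventgate-analytics, eventgate-main, etc., is exactly the problem being discussed.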

OR we make a new wmf.servicename and use that instead of trying to override .Chart.Name via .Values.chartName. ?

As @akosiaris said, I would prefer not to include {{ .Release.Namespace }}, because there are some issues regarding how Helm interprets namespacing [1], and it seems kind of confusing to me to include the namespace (you can end up with eventgate-default on minikube, for instance).

AFAICS if you want:

chart: eventgate
releases:
- analytics-production (in CLUSTER=eqiad and CLUSTER=codfw)
- analytics-staging (in CLUSTER=staging)
- main-production (in CLUSTER=eqiad and CLUSTER=codfw)
- main-staging (in CLUSTER=staging)

means that the application description is the same (eventgate), and this should be denoted using the chartName (eventgate); every service should live in its own namespace (eventgate-main and eventgate-analytics); and the release should be named main-production or analytics-staging, changing the templates to use wmf.releasename, which will expand to {{ chartName + release.name }}.

Regarding values: even if helm supports merging values, values should be kept in the helmfile repo, which would be clearer to read and modify:

chart: eventgate
namespace: eventgate-main
values:
  A: X
  B: C

I don't see wmf.servicename needed as you can achieve the same changing the release name.

(p.s.: for instance, the first scap-helm deploy

CLUSTER=staging scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics

could have been changed to

CLUSTER=staging scap-helm eventgate-analytics install -n analytics-staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics

[1] https://github.com/helm/helm/issues/2060

Hm, I think I mostly get what you are proposing, except I'm not sure how I can avoid using .Release.Namespace in the templates. I need something that evaluates to exactly e.g. 'eventgate-analytics'. With scap-helm eventgate-analytics ..., that is .Release.Namespace, right?

Talked with Fabian on IRC. The idea of using wmf.releasename (which is Chart.Name + Release.Name) instead of Release.Namespace will work, if we set the release --name to e.g. 'analytics' or 'main' when we deploy. So:

CLUSTER=staging scap-helm eventgate-analytics install -n analytics -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics
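Under this scheme the release name carries the variant, so wmf.releasename (.Chart.Name-.Release.Name) expands to the full service name without touching .Release.Namespace. A sketch of that composition, using the names from the command above (an approximation of the template helper, which also truncates to 63 characters):

```python
def wmf_releasename(chart_name, release_name):
    """Approximate wmf.releasename: .Chart.Name-.Release.Name, trunc 63, trim trailing '-'."""
    return f"{chart_name}-{release_name}"[:63].rstrip("-")

print(wmf_releasename("eventgate", "analytics"))  # eventgate-analytics
print(wmf_releasename("eventgate", "main"))       # eventgate-main
```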

Patchset 3 in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/506166 should do this.

Change 506166 merged by Ottomata:
[operations/deployment-charts@master] Refactor eventgate-analytics to eventgate

https://gerrit.wikimedia.org/r/506166

Change 507409 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate - include error_stream in default values

https://gerrit.wikimedia.org/r/507409

Change 507409 merged by Ottomata:
[operations/deployment-charts@master] eventgate - include error_stream in default values

https://gerrit.wikimedia.org/r/507409

Change 507411 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate - properly indent stream_config

https://gerrit.wikimedia.org/r/507411

Change 507411 merged by Ottomata:
[operations/deployment-charts@master] eventgate - properly indent stream_config

https://gerrit.wikimedia.org/r/507411

Change 508324 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate - fix duplicate config error_stream in config.yaml template

https://gerrit.wikimedia.org/r/508324

Change 508324 merged by Ottomata:
[operations/deployment-charts@master] eventgate - fix duplicate config error_stream in config.yaml template

https://gerrit.wikimedia.org/r/508324

Change 508371 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add eventgate-main to profile::kubernetes::deployment_server::services

https://gerrit.wikimedia.org/r/508371

I don't think so. This part is about a specific new release in an existing namespace. Examples would be adding a canary release coexisting with the production release.

The patch above + a number of others is required to create and deploy to a new namespace (which is something that SREs should do, as it probably also means a new LVS service, new calico rules, etc).

Note btw that we have a policy of 1 service per namespace. That's in order to avoid deployers stepping on each others toes and having to do weird dances when services change ownership.

Ok, sounds good. I'll be deploying a new 'service' to the eventgate-analytics namespace while we deprecate the old one, but that should just be temporary.

This task is about deploying a new eventgate-main service. I've refactored the eventgate-analytics chart to just 'eventgate', and will use the release name 'main'.

which is something that SREs should do, as it probably also means a new LVS service, new calico rules, etc

Ok! Docs would still be good though, ya? I'd be happy to make patches and have y'all review/merge.
LVS can come after we get eventgate-main up. Are the calico rules in puppet or elsewhere?

OR y'all can do it if you have the time. :)

Change 508582 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Change eventgate-analytics LVS port to 33192

https://gerrit.wikimedia.org/r/508582

Ok, working on two remaining tasks here:

  1. Replace chart=eventgate-analytics release=production with chart=eventgate release=analytics. I've deployed this new service. It is listening on port 33192. I believe all we need to do now is change LVS: https://gerrit.wikimedia.org/r/c/operations/puppet/+/508582
  2. Spin up eventgate-main. This has a few steps before we can deploy in k8s, but most of it should be in https://gerrit.wikimedia.org/r/c/operations/puppet/+/508371. Once we deploy eventgate-main to k8s, we can set up LVS.

@akosiaris @fsero -- Can you help me review/merge these this week?

Change 508371 merged by Fsero:
[operations/puppet@production] Configure eventgate-main for k8s deployment

https://gerrit.wikimedia.org/r/508371

Mentioned in SAL (#wikimedia-operations) [2019-05-07T16:05:32Z] <fsero> created eventgate-main tokens - T218346

@Ottomata I've created the namespaces and the users for eventgate-main. You should be able to deploy the service on k8s without configuring LVS; that will be done after.

Awesome thank you! I'll get that set up today, and we can talk about LVS stuff tomorrow.

Change 509141 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] service::docker - add $image_name parameter

https://gerrit.wikimedia.org/r/509141

Change 509141 merged by Ottomata:
[operations/puppet@production] service::docker - add $image_name parameter

https://gerrit.wikimedia.org/r/509141

Change 509442 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Keep using $title for service config dir name, not $image_name

https://gerrit.wikimedia.org/r/509442

Change 509442 merged by Ottomata:
[operations/puppet@production] Keep using $title for service config dir name, not $image_name

https://gerrit.wikimedia.org/r/509442

Change 509449 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] beta - Use eventgate services on new deployment-eventgate-1 instance

https://gerrit.wikimedia.org/r/509449

Change 509449 merged by Ottomata:
[operations/mediawiki-config@master] beta - Configure eventgate-* services with new deployment-eventgate-1 node

https://gerrit.wikimedia.org/r/509449

Change 508582 abandoned by Ottomata:
Change eventgate-analytics LVS port to 33192

Reason:
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/509866 instead

https://gerrit.wikimedia.org/r/508582