
Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline
Closed, Resolved | Public | 8 Story Points

Description

This ticket tracks the work needed to set up a Jenkins CI + Docker + Kubernetes deployment pipeline for the Stream Intake Service, recently named EventGate.


Event Timeline

Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board. (Jan 7 2019, 8:28 PM)
Ottomata moved this task from Next Up to In Progress on the EventBus board.
Ottomata set the point value for this task to 8. (Jan 8 2019, 4:17 PM)

Change 482855 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[integration/config@master] Add eventgate-ci for deployment pipeline

https://gerrit.wikimedia.org/r/482855

@Pchelolo, hmmmm. eventgate in prod will need to have the event-schemas repo(s) available somehow. I'm working on getting the docker images and helm charts figured out. For the initial deployment prototype, I'm considering just making a blubber and CI based docker image that will be included in the eventgate docker image somewhere. This will work for a trial, but will be a bit inflexible, since it will mean that a new schema will require a rebuild and redeploy of eventgate.

Perhaps we should consider deploying a quick and easy (non kubernetes based) schema registry? We could set up something simple via puppet: a webserver and a git clone ensure => latest of the event-schemas repo. Thoughts?
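A minimal sketch of the "webserver plus git clone" idea, assuming the event-schemas repository is already cloned into a local directory (in the proposal, puppet would keep that clone at `ensure => latest`; the directory path here is hypothetical). It serves the directory read-only over HTTP with the Python standard library only. It illustrates the shape of the proposal and is not the puppetized setup from the WIP patch, nor something a production service should depend on.

```python
import functools
import http.server
import threading


def serve_schema_repo(repo_dir: str, host: str = "127.0.0.1", port: int = 0):
    """Serve files under repo_dir over HTTP from a background thread.

    repo_dir is assumed to be a local clone of the event-schemas repo.
    port=0 lets the OS pick a free port; the actually bound port is returned.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=repo_dir
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    # Daemon thread so the server does not block interpreter exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, server.server_address[1]
```

A client would then fetch a schema as `http://<host>:<port>/<path/inside/repo>`; call `server.shutdown()` and `server.server_close()` to stop it.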

Change 482867 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] WIP Serve event-schemas repo via http

https://gerrit.wikimedia.org/r/482867

We could consider using https://kubernetes.io/docs/concepts/storage/volumes/ for that if it's easy to set up in our k8s, @akosiaris will know more.

Regarding a little http server - if it's gonna only be used for a test deployment - I'm ok with that, but it will be a blocker for putting any production load on the deployed eventgate. We can not have a production service depend on something like this.

Change 482855 merged by jenkins-bot:
[integration/config@master] Add eventgate-ci for deployment pipeline

https://gerrit.wikimedia.org/r/482855

@akosiaris In your doc, in the Start Up section, you mention 'Open grafana'. Can you elaborate here? :)

OH! Never mind, I see: that isn't an instruction... but a summary of what we are doing. Never mind!

Ok, I'm pretty close. I've got the charts deployed in minikube via helm. It seems my setup isn't quite right, though; I think the image doesn't start properly. Got any tips for debugging? I'm not finding much help via helm / kubectl commands. (Tried both kubectl logs and attach.)

I think kubectl describe pod is the most helpful. I'm onto something great here!
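For reference, a small sketch that collects the kubectl commands usually worth running when a pod does not start: `kubectl describe pod` (the most helpful one here, since it lists events such as image pull or probe failures), `kubectl logs --previous` for the output of the last crashed container, and the namespace's events. The pod and namespace names passed in are hypothetical.

```python
import subprocess
from typing import List


def pod_debug_commands(pod: str, namespace: str) -> List[List[str]]:
    """Return kubectl commands that help explain why a pod is not starting."""
    return [
        # Container states, restart counts, probe failures and recent events.
        ["kubectl", "describe", "pod", pod, "-n", namespace],
        # Output of the previous (crashed) container instance, if there was one.
        ["kubectl", "logs", pod, "-n", namespace, "--previous"],
        # All recent events in the namespace, oldest first.
        ["kubectl", "get", "events", "-n", namespace,
         "--sort-by=.metadata.creationTimestamp"],
    ]


def run_all(pod: str, namespace: str) -> None:
    """Run each command and print its output.

    Non-zero exit codes are printed rather than raised; a missing kubectl
    binary still raises FileNotFoundError.
    """
    for cmd in pod_debug_commands(pod, namespace):
        print("$", " ".join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
```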

Change 483035 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] [WIP] Helm chart for eventgate-analytics deployment

https://gerrit.wikimedia.org/r/483035

I assume they will be behind varnish so they can take advantage of request level throttling and similar implemented at the varnish layer.

Sure, the analytics one will not be exposed bare to the internet; I meant it will have endpoints that are publicly reachable, so in theory it could be DDoSed.

I think we will be good with tier-1 and tier-2 installs. However, since both will be in Kubernetes, and thus running on the same physical hardware, they will not be as fully separated as our Kafka clusters are. I suppose that should be OK, but I am not quite sure how isolation between services of different support tiers will work in Kubernetes.

There are a number of ways to give a workload higher priority. Some are more ready than others. Namely, priority and preemption [1] is not ready yet (it's beta in 1.11, and we are currently on 1.10), whereas affinity and anti-affinity [2] is more mature. Resource quotas [3] are also ready, and we already use them. But note that there is no high-level, first-class resource for representing "tier" (I don't love the term, btw). We will have to figure out what it means for us and implement it accordingly.
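Of the mechanisms above, resource quotas are the one already in use. As a hedged illustration, the sketch below builds a per-namespace `ResourceQuota` manifest as a plain dict and prints it as JSON, which `kubectl apply -f` also accepts. The namespace name and all limits are hypothetical, not values from the actual eventgate-analytics deployment.

```python
import json


def resource_quota(namespace: str, cpu_requests: str, mem_requests: str,
                   cpu_limits: str, mem_limits: str, max_pods: int) -> dict:
    """Build a v1 ResourceQuota capping aggregate usage within one namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Sums over all pods in the namespace, not per-pod limits.
                "requests.cpu": cpu_requests,
                "requests.memory": mem_requests,
                "limits.cpu": cpu_limits,
                "limits.memory": mem_limits,
                "pods": str(max_pods),
            }
        },
    }


if __name__ == "__main__":
    quota = resource_quota("eventgate-analytics", "4", "4Gi", "8", "8Gi", 20)
    print(json.dumps(quota, indent=2))
```

Note that a quota only bounds how much a namespace can consume; it does not make one namespace's pods win over another's, which is why it covers only part of what "tier" would need to mean.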

To deploy

  1. wait for puppet and use scap-helm etc…
  2. Create namespace in staging (?)

We'll be doing that using the initialize_namespace.sh script, and it will be done in the production clusters as well.

  3. deploy chart to k8s cluster(s) with scap-helm

> @Pchelolo, hmmmm. eventgate in prod will need to have the event-schemas repo(s) available somehow. I'm working on getting the docker images and helm charts figured out. For the initial deployment prototype, I'm considering just making a blubber and CI based docker image that will be included in the eventgate docker image somewhere. This will work for a trial, but will be a bit inflexible, since it will mean that a new schema will require a rebuild and redeploy of eventgate.

ORES has more or less the same issue: updating the models requires a redeploy.

> Perhaps we should consider deploying a quick and easy (non kubernetes based) schema registry? We could set up something simple via puppet: a webserver and a git clone ensure => latest of the event-schemas repo. Thoughts?

Care to elaborate on what exactly that schema registry is?

> We could consider using https://kubernetes.io/docs/concepts/storage/volumes/ for that if it's easy to set up in our k8s, @akosiaris will know more.

Depending on the size and structure of the registry (I am not sure what it even is yet), we could indeed create a configMap [4] object, store the data there and have it mounted in the container. It's a valid approach, and we already do it for the config.yaml file. But it won't scale well to large amounts of data; in fact, the limit is 1MB. Other solutions like hostPath [5] would not have that limitation, but would require deployments to the Kubernetes hosts themselves, which is ugly.
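To judge whether the configMap route is even possible, a rough sketch that sums the file sizes in a local schema checkout and compares them with the 1 MiB cap Kubernetes documents for ConfigMap data. It counts raw file bytes only (keys and object metadata add a little more), and the directory it is pointed at is assumed to be a local clone of the schema repo.

```python
import os

# Kubernetes documents a 1 MiB cap on the data stored in a single ConfigMap.
CONFIGMAP_MAX_BYTES = 1024 * 1024


def directory_size(path: str) -> int:
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def fits_in_configmap(path: str, limit: int = CONFIGMAP_MAX_BYTES) -> bool:
    """Rough check on raw file bytes, ignoring key names and object metadata."""
    return directory_size(path) <= limit
```

Size is not the only constraint: ConfigMap keys cannot contain `/`, so a nested schema tree would also have to be flattened into valid key names (or mapped back to paths with `items` in the volume definition).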

> Regarding a little http server - if it's gonna only be used for a test deployment - I'm ok with that, but it will be a blocker for putting any production load on the deployed eventgate. We can not have a production service depend on something like this.

Agreed in principle, but note that it depends on the implementation.

We can have the software reach out once during (re-)initialization to fetch data from some endpoint (HTTP(S) or otherwise). During that phase, the readiness probe should effectively report "not ready", telling Kubernetes not to route traffic to this pod. That would be fine as a pattern. Anything more than that would probably cause problematic interactions.
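The pattern described above (fetch once at startup, stay "not ready" until the data is in) can be sketched as a small cache object whose readiness status a probe endpoint would return. The `fetch` callable is a hypothetical stand-in for whatever reads the schemas from the remote endpoint; after `initialize()` no further remote calls happen on the request path.

```python
import threading
from typing import Callable, Dict, Optional, Tuple


class SchemaCache:
    """Load schemas once at (re-)initialization; report not-ready until loaded.

    fetch is a caller-supplied function (hypothetical here) returning a
    mapping of schema URI -> schema text, e.g. read from an HTTP endpoint.
    """

    def __init__(self, fetch: Callable[[], Dict[str, str]]):
        self._fetch = fetch
        self._schemas: Optional[Dict[str, str]] = None
        self._lock = threading.Lock()

    def initialize(self) -> None:
        """The single remote fetch; call it at startup, before serving traffic."""
        schemas = self._fetch()
        with self._lock:
            self._schemas = schemas

    def readiness(self) -> Tuple[int, str]:
        """Status for an httpGet readiness probe.

        503 makes the probe fail, so Kubernetes keeps the pod out of the
        Service endpoints until the schemas are loaded.
        """
        with self._lock:
            if self._schemas is None:
                return 503, "not ready: schemas not loaded"
            return 200, "ok"

    def get(self, uri: str) -> str:
        """Look up a schema from memory only."""
        with self._lock:
            if self._schemas is None:
                raise RuntimeError("schemas not loaded yet")
            return self._schemas[uri]
```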

> @akosiaris In your doc, in the Start Up section, you mention 'Open grafana'. Can you elaborate here? :)
> OH! Never mind, I see: that isn't an instruction... but a summary of what we are doing. Never mind!

I've updated the docs anyway to make them a tad more descriptive. Thanks for pointing it out!

[1] https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
[2] https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
[3] https://kubernetes.io/docs/concepts/policy/resource-quotas/
[4] https://kubernetes.io/docs/concepts/storage/volumes/#configmap
[5] https://kubernetes.io/docs/concepts/storage/volumes/#hostpath

Change 484498 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] Add kafka-single-node chart for local development

https://gerrit.wikimedia.org/r/484498

@fselles, I'm not able to get the requirements.yaml repository to work. I think the only reason my symlink works is that the dependency is looked for in charts/ by default. No matter what I put for repository, if I don't have the symlink in charts/, I get

Error: found in requirements.yaml, but missing in charts/ directory: kafka-single-node

FYI, the symlinked (or copied, or packaged .tgz) chart is necessary in the charts/ dir. Leaving it as a symlink since it is there for development.
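A quick pre-check for the situation above, sketched under an explicit approximation: a dependency counts as present if `charts/<name>` is a directory (or a symlink to one) or a packaged `charts/<name>-<version>.tgz` exists. Helm itself matches on the chart name inside each chart, so this is only a rough local check; the dependency names are passed in directly because the standard library has no YAML parser.

```python
import glob
import os
from typing import Iterable, List


def missing_chart_dependencies(names: Iterable[str], charts_dir: str) -> List[str]:
    """Return dependency names with no apparent counterpart under charts_dir.

    Approximation only: matches by directory name or by <name>-*.tgz file
    name, not by reading each chart's own metadata as Helm does.
    """
    missing = []
    for name in names:
        # os.path.isdir follows symlinks, so a symlinked chart dir counts.
        as_dir = os.path.isdir(os.path.join(charts_dir, name))
        pattern = os.path.join(glob.escape(charts_dir), glob.escape(name) + "-*.tgz")
        as_tgz = glob.glob(pattern)
        if not (as_dir or as_tgz):
            missing.append(name)
    return missing
```

Run against the chart's charts/ directory, a non-empty result predicts the "found in requirements.yaml, but missing in charts/ directory" error before invoking helm.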

Change 484498 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Add kafka-dev chart for local development

https://gerrit.wikimedia.org/r/484498

Change 483035 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Helm chart for eventgate-analytics deployment

https://gerrit.wikimedia.org/r/483035

Change 490080 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] Update eventgate config.yaml template to use schema_base_uris key

https://gerrit.wikimedia.org/r/490080

Change 490080 merged by Ottomata:
[operations/deployment-charts@master] Update eventgate config.yaml template to use schema_base_uris key

https://gerrit.wikimedia.org/r/490080

Change 490078 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Add eventgate-analytics tokens

https://gerrit.wikimedia.org/r/490078

Mentioned in SAL (#wikimedia-operations) [2019-02-12T15:46:28Z] <akosiaris> create namespaces for eventgate-analytics on eqiad/codfw/staging cluster T211247 T213194

Change 490078 merged by Alexandros Kosiaris:
[operations/puppet@production] Add eventgate-analytics tokens

https://gerrit.wikimedia.org/r/490078

Mentioned in SAL (#wikimedia-operations) [2019-02-12T16:18:08Z] <akosiaris> refresh kubernetes default egress policy T211247

Change 490351 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] Always use stdout logger so we can see logs in pod

https://gerrit.wikimedia.org/r/490351

Change 490351 merged by Ottomata:
[operations/deployment-charts@master] Always use stdout logger so we can see logs in pod

https://gerrit.wikimedia.org/r/490351

Change 490352 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate-analytics: Use logstash.svc.eqiad.wmnet always

https://gerrit.wikimedia.org/r/490352

Change 490352 merged by Ottomata:
[operations/deployment-charts@master] eventgate-analytics: Use logstash.svc.eqiad.wmnet always

https://gerrit.wikimedia.org/r/490352

Change 490357 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate - Add stream-config to checksumed files

https://gerrit.wikimedia.org/r/490357

Change 490357 merged by Ottomata:
[operations/deployment-charts@master] eventgate - Add stream-config to checksumed files

https://gerrit.wikimedia.org/r/490357
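Adding stream-config to the checksummed files matches a common Helm pattern: a hash of the rendered config goes into a pod template annotation, so any config change alters the pod spec and triggers a rolling restart. The sketch below reproduces the idea outside Helm by hashing a set of config files into one digest. The annotation key and file names are hypothetical; the real chart computes its checksums in its templates (typically with Helm's `sha256sum` template function).

```python
import hashlib
from typing import Dict, Iterable


def config_checksum(paths: Iterable[str]) -> str:
    """SHA-256 over the given files, visited in sorted order for a stable result."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.encode("utf-8"))
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()


def checksum_annotations(paths: Iterable[str]) -> Dict[str, str]:
    """Pod template annotation whose value changes whenever any listed file changes."""
    return {"checksum/config": config_checksum(paths)}
```

A file left out of the list can change without changing the digest, so those pods would keep running with stale config until something else restarts them.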

Ottomata added a subscriber: Joe. (Feb 13 2019, 9:30 PM)

Status:

I had to manually copy the schema repo into /etc/eventgate-ci/ on deployment-eventgate-analytics-1 so that the container could access schemas, but this will change when we have a remote schema registry and when the schemas are baked into the image.

@Pchelolo If you want to test multi endpoint / monolog stuff in beta, you can add your schemas manually to /etc/eventgate-ci/event-schemas/ on deployment-eventgate-analytics-1 and (if needed) sudo service eventgate-ci restart and POST events to http://deployment-eventgate-analytics-1.deployment-prep.eqiad.wmflabs:8192/v1/events.
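A sketch of such a test POST using only the Python standard library. The endpoint is the beta URL above; the event body, including any `$schema` or `meta.stream` fields and stream names, is hypothetical and has to match a schema actually present under /etc/eventgate-ci/event-schemas/. The body is sent as a JSON array on the assumption that the endpoint accepts a batch; adjust if it expects a single object. Building the request separately from sending lets it be inspected first.

```python
import json
import urllib.request
from typing import List

BETA_ENDPOINT = (
    "http://deployment-eventgate-analytics-1.deployment-prep.eqiad.wmflabs:8192/v1/events"
)


def build_events_request(events: List[dict],
                         url: str = BETA_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a JSON array of events."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status; 4xx/5xx raise HTTPError."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, `send(build_events_request([event]))`, where `event` is a dict shaped like one of the schemas copied onto the instance.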

Change 490418 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Add EventBus multi endpoint configuration and add eventgate-analytics endpoint

https://gerrit.wikimedia.org/r/490418

Change 491860 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/dns@master] [WIP] Set up DNS for eventgate-analytics

https://gerrit.wikimedia.org/r/491860

Change 491861 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] [WIP] Set up eventgate-analytics.discovery.wmnet

https://gerrit.wikimedia.org/r/491861

@akosiaris the above two patches are my best effort at copy/pasting stuff from mathoid to get ready for eventgate-analytics deployment. Lemme know if I'm doing it wrong :)

Ottomata raised the priority of this task from Normal to High. (Feb 21 2019, 5:57 PM)

Change 490418 merged by Ottomata:
[operations/mediawiki-config@master] Use EventBus multi endpoint configuration for eventbus configs

https://gerrit.wikimedia.org/r/490418

Change 492925 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] Use schemas from docker image, configure api-request stream

https://gerrit.wikimedia.org/r/492925

Change 492925 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Use schemas from docker image, configure api-request stream

https://gerrit.wikimedia.org/r/492925

Change 491860 merged by Ottomata:
[operations/dns@master] Set up DNS for eventgate-analytics

https://gerrit.wikimedia.org/r/491860

jijiki added a subscriber: jijiki. (Feb 27 2019, 2:58 PM)

Change 491861 merged by Effie Mouzeli:
[operations/puppet@production] Set up eventgate-analytics.discovery.wmnet

https://gerrit.wikimedia.org/r/491861

Mentioned in SAL (#wikimedia-operations) [2019-02-27T16:49:34Z] <jijiki> Deploy LVS for eventgate-analytics - T211247

Change 493279 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/dns@master] Enable eventgate-analytics.discovery.wmnet

https://gerrit.wikimedia.org/r/493279

Change 493279 merged by Alexandros Kosiaris:
[operations/dns@master] Enable eventgate-analytics.discovery.wmnet

https://gerrit.wikimedia.org/r/493279

@Ottomata LVS config is done :) 😺

akosiaris closed this task as Resolved. (Feb 28 2019, 6:33 AM)

Resolving this; feel free to reopen.

Yahoo thank you!

Krenair added a subscriber: Krenair. (Mar 18 2019, 9:31 PM, edited)

(Please see T218609: Figure out future for newly created deployment-prep jessie instances regarding that deployment-prep instance)

akosiaris changed the status of subtask T213561: Discovery for Kafka cluster brokers from Open to Stalled. (Apr 8 2019, 2:37 PM)

Change 482867 abandoned by Ottomata:
WIP Serve event-schemas repo via http

Reason:
Did this in another patch

https://gerrit.wikimedia.org/r/482867