Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo)
Closed, Resolved, Public

Description

stream.wikimedia.org is a public eventstreams instance, and as such only exposes an explicitly declared list of streams that do not have PII.

We recently added a nice browser UI to eventstreams. https://stream.wikimedia.org/v2/ui. It'd be handy to have an internal instance deployed to make viewing and debugging streams easier. We'd have to ssh tunnel to that instance (unless we put it behind SSO?), but even that would be nice.
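EventStreams delivers each stream as Server-Sent Events (SSE), so when debugging without a browser the raw SSE text can be read directly. Below is a minimal sketch of an SSE parser in Python. It is simplified from the SSE parsing rules: data lines are joined with newlines, lines starting with ':' are comments, and a blank line dispatches the event. The function name `parse_sse` and the sample payload are made up for illustration and are not part of EventStreams.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per Server-Sent Event from an iterable of text lines.

    Simplified: keeps only the 'event', 'id' and 'data' fields, skips
    comment lines (starting with ':'), and ignores 'retry' and
    last-event-id bookkeeping.
    """
    event: Dict[str, str] = {}
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the event; dispatch only if it carried data.
            if data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, sep, value = line.partition(":")
        if sep and value.startswith(" "):
            value = value[1:]  # one leading space after ':' is not part of the value
        if field == "data":
            data.append(value)
        elif field in ("event", "id"):
            event[field] = value


# Made-up sample in SSE wire format (not captured from a real stream).
sample = [
    ": keep-alive",
    "",
    "event: message",
    "id: 42",
    'data: {"meta": {"stream": "example.stream"}}',
    "",
]
for ev in parse_sse(sample):
    print(ev["event"], ev["id"], ev["data"])
```

An event that is not followed by a blank line (for example, at the end of a truncated capture) is discarded, which matches how SSE clients treat incomplete events.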

Since an eventstreams instance can only connect to a single Kafka cluster, we can only expose streams that exist in our 'aggregate' Kafka cluster, jumbo-eqiad (which is not cross-DC).
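The constraint above amounts to an intersection: a declared stream can be served only if its backing topics exist in the one Kafka cluster the instance reads from. Below is a hypothetical sketch. For illustration it assumes that each stream maps to DC-prefixed topics such as `eqiad.<stream>` and `codfw.<stream>`. The function names and the sample stream and topic sets are made up.

```python
from typing import Dict, List, Set

# Assumed topic prefixes, one per datacenter (illustrative convention).
DATACENTERS = ("eqiad", "codfw")


def topics_for(stream: str) -> List[str]:
    """Assumed convention: one topic per datacenter, prefixed with the DC name."""
    return [f"{dc}.{stream}" for dc in DATACENTERS]


def exposable_streams(declared: List[str], cluster_topics: Set[str]) -> Dict[str, List[str]]:
    """Map each declared stream to the topics it has in this cluster.

    Streams with no topic in the cluster are dropped, because a single
    eventstreams instance reads from exactly one Kafka cluster.
    """
    result: Dict[str, List[str]] = {}
    for stream in declared:
        present = [t for t in topics_for(stream) if t in cluster_topics]
        if present:
            result[stream] = present
    return result


# Illustrative inputs: declared streams vs. topics present in the cluster.
declared = ["mediawiki.page-create", "example.only-elsewhere"]
jumbo_topics = {"eqiad.mediawiki.page-create", "codfw.mediawiki.page-create"}
print(exposable_streams(declared, jumbo_topics))
```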

Event Timeline

Ottomata updated the task description.

Change 644612 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] Add new service eventstreams-internal

https://gerrit.wikimedia.org/r/644612

fdans triaged this task as Medium priority. Dec 10 2020, 5:53 PM
fdans moved this task from Incoming to Event Platform on the Analytics board.
fdans added a project: Analytics-Kanban.

Change 655879 had a related patch set uploaded (by Elukey; owner: Elukey):
[labs/private@master] Add eventstreams-internal k8s service dummy token config

https://gerrit.wikimedia.org/r/655879

I am following https://wikitech.wikimedia.org/wiki/Kubernetes#Add_a_new_service. I will file the other changes for the puppet private/public repos and then follow up on the helm changes :)

Change 655879 merged by Elukey:
[labs/private@master] Add eventstreams-internal k8s service dummy token config

https://gerrit.wikimedia.org/r/655879

Change 656129 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add deployment config for a new k8s service - eventstreams-internal

https://gerrit.wikimedia.org/r/656129

Change 656129 merged by Elukey:
[operations/puppet@production] Add deployment config for a new k8s service - eventstreams-internal

https://gerrit.wikimedia.org/r/656129

Change 644612 merged by Elukey:
[operations/deployment-charts@master] Add new service eventstreams-internal

https://gerrit.wikimedia.org/r/644612

Got as far as deploying the service in staging, and it seems to work! I updated all the steps in https://wikitech.wikimedia.org/wiki/Kubernetes#Add_a_new_service

Oh, we'll also want LVS set up for eventstreams-internal.svc.*.

@elukey it works! I realized that since this service is not proxied via varnish/ATS, we don't have an X-Client-IP header set, which external eventstreams requires and uses to throttle clients. I'll make a fix in the repo for this. You can proceed with deployment to the eqiad and codfw clusters.
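Why the header matters: a per-client connection limit needs a stable key for each client. Below is a hypothetical sketch, not how eventstreams itself is implemented. It keys on X-Client-IP when present and otherwise falls back to the peer address of the TCP connection. Only the setting name `client_ip_connection_limit` comes from this task. The class, its methods, and the assumption that header names arrive lowercased are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Mapping


class ClientIPConnectionLimiter:
    """Hypothetical per-client concurrent connection limiter."""

    def __init__(self, client_ip_connection_limit: int) -> None:
        self.limit = client_ip_connection_limit
        self.open: Dict[str, int] = defaultdict(int)

    @staticmethod
    def client_key(headers: Mapping[str, str], peer_ip: str) -> str:
        # Prefer the header set by the edge proxies; fall back to the TCP
        # peer address when the request did not pass through them.
        # Assumes header names have already been lowercased.
        return headers.get("x-client-ip") or peer_ip

    def try_open(self, headers: Mapping[str, str], peer_ip: str) -> bool:
        """Count a new connection, or return False if the client is at its limit."""
        key = self.client_key(headers, peer_ip)
        if self.open[key] >= self.limit:
            return False  # the caller would reject, e.g. with HTTP 429
        self.open[key] += 1
        return True

    def close(self, headers: Mapping[str, str], peer_ip: str) -> None:
        """Release a connection previously counted by try_open."""
        key = self.client_key(headers, peer_ip)
        if self.open[key] > 0:
            self.open[key] -= 1
```

Without the fallback, every internal request would share a single missing-header key, so one busy client could exhaust the limit for everyone.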

BTW, we don't need 2 replicas in staging, so we should add a values-staging.yaml and set replicas: 1 there.
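Helm merges values files in order, with later files overriding earlier keys. A values-staging.yaml containing only `replicas: 1` therefore changes staging alone and leaves everything else from the base values intact. Below is a minimal Python sketch of that override behaviour on plain dicts, with YAML parsing omitted. The sample keys are illustrative and are not the chart's actual values.

```python
from typing import Any, Dict


def merge_values(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Deep-merge two values trees: nested dicts merge, anything else is replaced."""
    merged = dict(base)  # shallow copy so `base` itself is not modified
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative values: defaults plus a staging-only override file.
values = {"replicas": 2, "app": {"port": 4992}}
values_staging = {"replicas": 1}
print(merge_values(values, values_staging))
```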

Change 657852 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventstreams - move client_ip_connection_limit setting to helmfile values

https://gerrit.wikimedia.org/r/657852

Change 657852 merged by Ottomata:
[operations/deployment-charts@master] eventstreams - move client_ip_connection_limit setting to helmfile values

https://gerrit.wikimedia.org/r/657852

Change 657860 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventstreams-internal - only 1 replica needed in staging

https://gerrit.wikimedia.org/r/657860

Change 657860 merged by Ottomata:
[operations/deployment-charts@master] eventstreams-internal - only 1 replica needed in staging

https://gerrit.wikimedia.org/r/657860

Created all the TLS certs and configs as described in https://wikitech.wikimedia.org/wiki/Enable_TLS_for_Kubernetes_deployments#Create_and_place_certificates and ran puppet on deploy1001. In theory, if everything is fine, I should now be able to proceed on deploy1001 with:

cd /srv/deployment-charts/helmfile.d/services/eventstreams-internal; helmfile -e production -i apply

Waiting for @JMeybohm's greenlight before proceeding :)

Looks good to me, go ahead! (But it's "helmfile -e [codfw|eqiad] ..." 🙂 )

I was definitely going for eqiad/codfw but I wanted to know if you were paying attention! thanks :)

Error: pods is forbidden: User "eventstreams-internal" cannot list resource "pods" in API group "" in the namespace "eventstreams-internal"

This is the nice error message I got after running helmfile; going to check what the problem is.

You probably have not yet deployed the admin part (the new namespace etc.) to codfw/eqiad.

Yes, I realized it right after adding the comment: of course, I had deployed the admin part only to staging. So I should proceed with:

ssh deploy1001
sudo -i; cd /srv/deployment-charts/helmfile.d/admin/codfw/; kube_env admin codfw; ./cluster-helmfile.sh -i apply
sudo -i; cd /srv/deployment-charts/helmfile.d/admin/eqiad/; kube_env admin eqiad; ./cluster-helmfile.sh -i apply

Then retry with the deploy. Does it sound ok?

Apart from you testing my attention again (kube_env admin [codfw|eqiad]), this looks good, yes :-)
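The error above comes down to ordering. In each cluster, the admin part (the namespace etc.) has to be applied before the service release. Below is a hypothetical Python helper that only prints the command sequence and does not run anything. It reuses the paths and commands quoted in this thread, with `-e` set to the cluster name as corrected above. The function name is made up, and the `ssh` and `sudo -i` steps are left out.

```python
from typing import List

DEPLOY_ROOT = "/srv/deployment-charts/helmfile.d"


def deploy_plan(service: str, clusters: List[str]) -> List[str]:
    """Return commands in order: admin part for every cluster, then the service."""
    plan: List[str] = []
    for dc in clusters:
        # Admin part first: creates the namespace and related objects.
        plan.append(
            f"cd {DEPLOY_ROOT}/admin/{dc}/; kube_env admin {dc}; ./cluster-helmfile.sh -i apply"
        )
    for dc in clusters:
        # Service release; the environment is the cluster name, not "production".
        plan.append(f"cd {DEPLOY_ROOT}/services/{service}; helmfile -e {dc} -i apply")
    return plan


for cmd in deploy_plan("eventstreams-internal", ["codfw", "eqiad"]):
    print(cmd)
```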

es-internal is deployed in both eqiad and codfw. Next steps are:

  • test locally on Kubernetes nodes with @Ottomata
  • add LVS VIPs in front of the services

@elukey app logs look normal; it'll be easier to test once LVS is up! :D Let's go!

Change 661067 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add conftool data for eventstreams-internal (new VIP)

https://gerrit.wikimedia.org/r/661067

Change 661071 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add eventstreams-internal to service_catalog

https://gerrit.wikimedia.org/r/661071

Change 661072 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::kubernetes::worker: add empty stanza for eventstreams-internal

https://gerrit.wikimedia.org/r/661072

I have followed https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_load_balanced_service. Code reviews for the first steps are open; going to wait for reviewers before proceeding :)

Change 661386 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Add eventstreams-internal VIP DNS config

https://gerrit.wikimedia.org/r/661386

Change 661067 merged by Elukey:
[operations/puppet@production] Add conftool data for eventstreams-internal (new VIP)

https://gerrit.wikimedia.org/r/661067

Change 661386 merged by Elukey:
[operations/dns@master] Add eventstreams-internal VIP DNS config

https://gerrit.wikimedia.org/r/661386

Change 661071 merged by Elukey:
[operations/puppet@production] Add eventstreams-internal to service_catalog

https://gerrit.wikimedia.org/r/661071

Change 661072 merged by Elukey:
[operations/puppet@production] role::kubernetes::worker: add empty stanza for eventstreams-internal

https://gerrit.wikimedia.org/r/661072

Next step is https://wikitech.wikimedia.org/wiki/LVS#Configure_the_load_balancers; going to do it tomorrow with Valentin, since we'll need to restart some PyBal instances :)

Change 661687 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set the lvs_setup flag for eventstreams-internal

https://gerrit.wikimedia.org/r/661687

Change 661687 merged by Elukey:
[operations/puppet@production] Set the lvs_setup flag for eventstreams-internal

https://gerrit.wikimedia.org/r/661687

Mentioned in SAL (#wikimedia-operations) [2021-02-04T10:05:47Z] <elukey> restart pybal on lvs2010 (low-traffic standby) to pick up new changes for eventstreams-internal (new VIP) - T269160

Mentioned in SAL (#wikimedia-operations) [2021-02-04T10:08:53Z] <elukey> restart pybal on lvs1016 (low-traffic standby) to pick up new changes for eventstreams-internal (new VIP) - T269160

Mentioned in SAL (#wikimedia-operations) [2021-02-04T10:13:31Z] <elukey> restart pybal on lvs2009 (low-traffic active) to pick up new changes for eventstreams-internal (new VIP) - T269160

Mentioned in SAL (#wikimedia-operations) [2021-02-04T10:15:07Z] <elukey> restart pybal on lvs1015 (low-traffic active) to pick up new changes for eventstreams-internal (new VIP) - T269160
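The SAL entries above follow a standby-first order. Both low-traffic standbys (lvs2010, lvs1016) were restarted before the actives (lvs2009, lvs1015), presumably so that a standby already running the new config is available before an active is touched. Below is a small hypothetical sketch of that ordering. The host names and roles mirror the log; the function is made up.

```python
from typing import List, Tuple

# (host, role) pairs from the SAL entries above, listed codfw first as in the log.
LOW_TRAFFIC_LVS: List[Tuple[str, str]] = [
    ("lvs2009", "active"),
    ("lvs2010", "standby"),
    ("lvs1015", "active"),
    ("lvs1016", "standby"),
]


def restart_order(hosts: List[Tuple[str, str]]) -> List[str]:
    """Standbys first, then actives; sorted() is stable, so input order is kept within a role."""
    return [host for host, _ in sorted(hosts, key=lambda hr: hr[1] != "standby")]


print(restart_order(LOW_TRAFFIC_LVS))
```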

Change 661695 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set the monitoring_setup flag for eventstreams-internal

https://gerrit.wikimedia.org/r/661695

Change 661695 merged by Elukey:
[operations/puppet@production] Set the monitoring_setup flag for eventstreams-internal

https://gerrit.wikimedia.org/r/661695

Change 661697 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set the production flag for eventstreams-internal

https://gerrit.wikimedia.org/r/661697

Change 661697 merged by Elukey:
[operations/puppet@production] Set the production flag for eventstreams-internal

https://gerrit.wikimedia.org/r/661697
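Changes 661687, 661695 and 661697 move the service through successive states (lvs_setup, then monitoring_setup, then production), one merged change per step. Below is a hypothetical sketch of a check that a proposed state change advances exactly one step. The three state names come from the change titles above. `service_setup` as the initial state is an assumption based on the wikitech LVS procedure.

```python
from typing import List

# Ordered states; "service_setup" as the starting point is an assumption,
# the other three are the flags set by the changes in this task.
STATES: List[str] = ["service_setup", "lvs_setup", "monitoring_setup", "production"]


def is_valid_transition(current: str, new: str) -> bool:
    """True only if `new` is exactly the next state after `current`."""
    if current not in STATES or new not in STATES:
        return False
    return STATES.index(new) == STATES.index(current) + 1


print(is_valid_transition("lvs_setup", "monitoring_setup"))
print(is_valid_transition("lvs_setup", "production"))
```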

Change 661702 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set desired state for eventstreams-internal dns-disc

https://gerrit.wikimedia.org/r/661702

Change 661702 merged by Elukey:
[operations/puppet@production] Set desired state for eventstreams-internal dns-disc

https://gerrit.wikimedia.org/r/661702

Tested with ssh -L 4992:eventstreams-internal.discovery.wmnet:4992 -N mwmaint1002.eqiad.wmnet and then https://localhost:4992 in the browser. I can see the UI and the streams!
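The manual test above can be scripted. Below is a hypothetical sketch. `tunnel_command` builds the same `ssh -L` invocation, with host names and port taken from the comment. `port_is_open` checks that the local end of the tunnel accepts TCP connections before the UI is opened. Both function names are made up, and starting ssh is left to the caller (for example with `subprocess.Popen`).

```python
import socket
from typing import List


def tunnel_command(local_port: int, target_host: str, target_port: int, jump_host: str) -> List[str]:
    """argv for an SSH local port forward that runs no remote command (-N)."""
    return ["ssh", "-L", f"{local_port}:{target_host}:{target_port}", "-N", jump_host]


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


cmd = tunnel_command(4992, "eventstreams-internal.discovery.wmnet", 4992, "mwmaint1002.eqiad.wmnet")
print(" ".join(cmd))
# Once `cmd` is running, port_is_open("localhost", 4992) should be True,
# and https://localhost:4992 can be opened in the browser.
```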

@Ottomata, can you confirm that we are done? :)