
eventstreams chart should use latest common_templates
Closed, ResolvedPublic

Event Timeline

@Ottomata, has there been any progress on this one? Anything (e.g. reviews) we can help with?

Yar, no sorry, I have had zero time to work on this. @JArguello-WMF we should find a sprint to put this into.

Hi @Ottomata, @JArguello-WMF

/me is back. Any updates on this one (even if just a rough timeline)? Anything we can help with?

Change 831957 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] charts:eventstreams bump common_templates and standardize labels

https://gerrit.wikimedia.org/r/831957

Change 831957 merged by jenkins-bot:

[operations/deployment-charts@master] charts:eventstreams bump common_templates and standardize labels

https://gerrit.wikimedia.org/r/831957

Change 833447 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] Bump eventstreams chart version.

https://gerrit.wikimedia.org/r/833447

Change 833447 merged by jenkins-bot:

[operations/deployment-charts@master] Bump eventstreams chart version.

https://gerrit.wikimedia.org/r/833447

@akosiaris, good news, Gabriele is working on this!!!

@Jelto @JMeybohm, it seems the upgrade to common_templates means that this will be a totally new k8s deployment, as the names of the resources have changed. @gmodena and I need some help doing this deployment, as a helmfile apply won't work. I think we need to delete the deployment and reinstall it.

We can test this in staging, but when we do this for any prod services, we should probably depool at the traffic routing layer so we don't break any active usages. @gmodena will sync with you on this after we are all back from our respective offsites (so early October :) ).

BTW, since we merged the helm chart changes, eventstreams is currently undeployable. We rarely deploy it anyway, so I don't think this is a problem, but we will need to revert if we want to do any eventstreams changes.

This needs SRE support to depool eventstreams from one DC. helmfile destroy/helmfile apply can be done by deployers as well.
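The sequence discussed above (depool one DC, destroy the old release, apply the new chart, repool) can be sketched roughly as follows. This is a minimal dry-run sketch, not the procedure actually used: the `run` and `redeploy_env` helpers are hypothetical, the environment name `codfw` is assumed to match the helmfile environment, and the depool/repool steps are left as notes because the tooling for them is not shown in this task.

```shell
#!/bin/sh
# Dry-run sketch of the per-datacenter redeploy sequence.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# redeploy_env TARGET: destroy the old release and install the chart fresh.
# Pooling changes are handled by an SRE and are only noted here.
redeploy_env() {
    target="$1"
    echo "# (SRE) depool eventstreams in $target before continuing"
    run helmfile -e "$target" destroy
    run helmfile -e "$target" apply
    run helmfile -e "$target" status
    echo "# (SRE) repool eventstreams in $target once healthy"
}

redeploy_env codfw
```

With the default `DRY_RUN=1` this only prints the helmfile commands in order, which makes it easy to review the plan before anyone runs it for real.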

Clement_Goubert subscribed.

Hi, I'll be your SRE support for today, and will handle de/repooling, destroying the old chart, and applying the new.

For now, I'll check if destroy/apply works correctly in staging and report back.

Destroy/apply done in staging:

# helmfile -e staging status
helmfile.yaml: basePath=.
Getting status production
NAME: production
LAST DEPLOYED: Wed Oct 12 10:01:41 2022
NAMESPACE: eventstreams
STATUS: deployed
REVISION: 1
NOTES:
Thank you for installing eventstreams.
[...]
# helm3 -n eventstreams history production
REVISION        UPDATED                         STATUS          CHART                   APP VERSION     DESCRIPTION     
1               Wed Oct 12 10:01:41 2022        deployed        eventstreams-0.5.0                      Install complete
# kubectl get pods -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP             NODE                        NOMINATED NODE   READINESS GATES
eventstreams-production-6f4fd9bc49-6kwgx   2/2     Running   0          10m   10.64.75.225   kubestage1004.eqiad.wmnet   <none>           <none>
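
A check like the one done by eye above could be scripted before repooling. The following is only a sketch under assumptions: `pods_ready` is a hypothetical helper, not part of kubectl or helmfile, and it simply reads `kubectl get pods` style output on stdin and checks the READY and STATUS columns.

```shell
#!/bin/sh
# pods_ready: read `kubectl get pods` output on stdin. Succeeds only if at
# least one pod is listed and every pod has all containers ready and a
# STATUS of Running. The header row (line 1) is skipped.
pods_ready() {
    awk 'NR > 1 && NF >= 3 {
             seen = 1
             split($2, r, "/")            # READY column, e.g. "2/2"
             if (r[1] != r[2] || $3 != "Running") bad = 1
         }
         END { exit (seen && !bad) ? 0 : 1 }'
}

# Example with a sample row; in practice the input would come from
# something like: kubectl -n eventstreams get pods | pods_ready
printf '%s\n' \
    'NAME READY STATUS RESTARTS AGE' \
    'eventstreams-production-6f4fd9bc49-6kwgx 2/2 Running 0 10m' \
    | pods_ready && echo "all pods ready"
```

This checks pod readiness only; it says nothing about whether the service is receiving traffic, which still needs the service dashboard.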

Mentioned in SAL (#wikimedia-operations) [2022-10-12T10:33:24Z] <claime> depooling eventstreams in codfw - T310721

Mentioned in SAL (#wikimedia-operations) [2022-10-12T10:57:07Z] <claime> redeploying eventstreams codfw - T310721

Mentioned in SAL (#wikimedia-operations) [2022-10-12T11:02:07Z] <claime> repooled eventstreams in codfw - T310721

@Clement_Goubert Thank you so much! Please let us know if there is anything we need to do on our side
Wish you a great day!

eventstreams redeployed in codfw.
@JArguello-WMF Apart from checking everything is still right after redeployment, do you or @Ottomata happen to know if eventstreams-internal is still used?

Mentioned in SAL (#wikimedia-operations) [2022-10-12T11:24:02Z] <claime> depooling eventstreams in eqiad - T310721

Mentioned in SAL (#wikimedia-operations) [2022-10-12T11:44:06Z] <claime> redeploying eventstreams eqiad - T310721

Mentioned in SAL (#wikimedia-operations) [2022-10-12T11:50:12Z] <claime> repooling eventstreams in eqiad - T310721

Everything looks healthy from my end: both are getting traffic and not throwing errors, according to the service dashboard.

 eventstreams-internal is still used?

I am not sure! I'd imagine folks use it, as it is a really nice GUI to view any of our streams, which is handy for quick testing and debugging. However, the ssh tunnel needed to connect to it might be enough of a barrier that no one uses it. It is just as easy for me to use a CLI client, so I don't use it much.

For now, we should probably just redeploy it too.

I suppose we could consider removing it in another task.

Mentioned in SAL (#wikimedia-operations) [2022-10-12T14:57:07Z] <claime> depooling eventstreams-internal codfw - T310721

Mentioned in SAL (#wikimedia-operations) [2022-10-12T15:07:03Z] <claime> redeploying eventstreams-internal codfw - T310721

Mentioned in SAL (#wikimedia-operations) [2022-10-12T15:09:34Z] <claime> repooled eventstreams-internal codfw - T310721

Mentioned in SAL (#wikimedia-operations) [2022-10-12T15:16:48Z] <claime> depooling eventstreams-internal eqiad - T310721

Mentioned in SAL (#wikimedia-operations) [2022-10-12T15:23:29Z] <claime> redeploying eventstreams-internal eqiad - T310721

eventstreams-internal fully redeployed, this task can probably be closed now.