Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Jelto | T292390 Upgrade all deployment charts to use the latest version of common_templates
Resolved | | gmodena | T310721 eventstreams chart should use latest common_templates
Event Timeline
@Ottomata, has there been any progress on this one? Anything (e.g. reviews) we can help with?
Yar, no sorry, I have had zero time to work on this. @JArguello-WMF we should find a sprint to put this into.
/me is back. Any updates on this one (even if just a rough timeline)? Anything we can help with?
Change 831957 had a related patch set uploaded (by Gmodena; author: Gmodena):
[operations/deployment-charts@master] charts:eventstreams bump common_templates and standardize labels
Change 831957 merged by jenkins-bot:
[operations/deployment-charts@master] charts:eventstreams bump common_templates and standardize labels
Change 833447 had a related patch set uploaded (by Gmodena; author: Gmodena):
[operations/deployment-charts@master] Bump eventstreams chart version.
Change 833447 merged by jenkins-bot:
[operations/deployment-charts@master] Bump eventstreams chart version.
@akosiaris, good news, Gabriele is working on this!!!
@Jelto @JMeybohm, it seems the upgrade to common_templates means that this will be a totally new k8s deployment, as the names of the resources have changed. @gmodena and I need some help doing this deployment, as a helmfile apply won't work. I think we need to delete the deployment and reinstall it.
We can test this in staging, but when we do this for any prod services, we should probably depool at the traffic routing layer so we don't break any active usages. @gmodena will sync with you on this after we are all back from our respective offsites (so early October :) ).
BTW, since we merged the helm chart changes, eventstreams is currently undeployable. We rarely deploy it anyway, so I don't think this is a problem, but we will need to revert if we want to do any eventstreams changes.
This needs SRE support to depool eventstreams from one DC. helmfile destroy/helmfile apply can be done by deployers as well.
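For reference, the order of operations discussed above could be sketched roughly as follows. This is only an illustration, not the procedure that was actually run: `plan_redeploy` is a hypothetical helper that prints the steps instead of executing them, and the depool/repool steps are left as placeholders because the exact commands are not shown in this task. It assumes the helmfile environment name is passed as the only argument.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the redeploy steps for one
# helmfile environment. Nothing here touches a cluster.
plan_redeploy() {
    target_env="$1"
    # Depooling is an SRE step; the command is not shown in this task.
    echo "# depool the service in ${target_env} first"
    # Remove the old release, whose resource names no longer match the chart.
    echo "helmfile -e ${target_env} destroy"
    # Install the release again from the updated chart.
    echo "helmfile -e ${target_env} apply"
    # Check that the new release reports as deployed.
    echo "helmfile -e ${target_env} status"
    # Repooling is also an SRE step; the command is not shown in this task.
    echo "# repool the service in ${target_env} once it looks healthy"
}

plan_redeploy staging
```

Printing the plan rather than running it keeps the sketch safe to paste anywhere; the printed helmfile lines would still need to be run from the service's helmfile directory by someone with deploy rights.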
Hi, I'll be your SRE support for today, and will handle de/repooling, destroying the old chart, and applying the new.
For now, I'll check if destroy/apply works correctly in staging and report back.
Destroy/apply done in staging:
# helmfile -e staging status
helmfile.yaml: basePath=.
Getting status production
NAME: production
LAST DEPLOYED: Wed Oct 12 10:01:41 2022
NAMESPACE: eventstreams
STATUS: deployed
REVISION: 1
NOTES:
Thank you for installing eventstreams.
[...]
# helm3 -n eventstreams history production
REVISION  UPDATED                   STATUS    CHART               APP VERSION  DESCRIPTION
1         Wed Oct 12 10:01:41 2022  deployed  eventstreams-0.5.0               Install complete
# kubectl get pods -o wide
NAME                                      READY  STATUS   RESTARTS  AGE  IP            NODE                       NOMINATED NODE  READINESS GATES
eventstreams-production-6f4fd9bc49-6kwgx  2/2    Running  0         10m  10.64.75.225  kubestage1004.eqiad.wmnet  <none>          <none>
Mentioned in SAL (#wikimedia-operations) [2022-10-12T10:33:24Z] <claime> depooling eventstreams in codfw - T310721
Mentioned in SAL (#wikimedia-operations) [2022-10-12T10:57:07Z] <claime> redeploying eventstreams codfw - T310721
Mentioned in SAL (#wikimedia-operations) [2022-10-12T11:02:07Z] <claime> repooled eventstreams in codfw - T310721
@Clement_Goubert Thank you so much! Please let us know if there is anything we need to do on our side.
Wish you a great day!
eventstreams redeployed in codfw.
@JArguello-WMF Apart from checking everything is still right after redeployment, do you or @Ottomata happen to know if eventstreams-internal is still used?
Mentioned in SAL (#wikimedia-operations) [2022-10-12T11:24:02Z] <claime> depooling eventstreams in eqiad - T310721
Mentioned in SAL (#wikimedia-operations) [2022-10-12T11:44:06Z] <claime> redeploying eventstreams eqiad - T310721
Mentioned in SAL (#wikimedia-operations) [2022-10-12T11:50:12Z] <claime> repooling eventstreams in eqiad - T310721
Everything looks healthy from my end: both DCs are getting traffic and not throwing errors, according to the service dashboard.
> eventstreams-internal is still used?
I am not sure! I'd imagine folks use it, as it is a really nice GUI for viewing any of our streams, which is handy for quick testing and debugging. However, the ssh tunnel needed to connect to it might be enough of a barrier that no one uses it. It is just as easy for me to use a CLI client, so I don't use it much.
For now, we should probably just redeploy it too.
I suppose we could consider removing it in another task.
Mentioned in SAL (#wikimedia-operations) [2022-10-12T14:57:07Z] <claime> depooling eventstreams-internal codfw - T310721
Mentioned in SAL (#wikimedia-operations) [2022-10-12T15:07:03Z] <claime> redeploying eventstreams-internal codfw - T310721
Mentioned in SAL (#wikimedia-operations) [2022-10-12T15:09:34Z] <claime> repooled eventstreams-internal codfw - T310721
Mentioned in SAL (#wikimedia-operations) [2022-10-12T15:16:48Z] <claime> depooling eventstreams-internal eqiad - T310721
Mentioned in SAL (#wikimedia-operations) [2022-10-12T15:23:29Z] <claime> redeploying eventstreams-internal eqiad - T310721