eventgate helm chart should use common_templates _tls_helpers.tpl instead of its own custom copy
Closed, Duplicate · Public

Assigned To

None

Authored By

	Ottomata
	Sep 21 2021, 3:59 PM

Details

	Subject	Repo	Branch	Lines +/-
	Update eventgate helmfile.d for eventgate 0.5 chart	operations/deployment-charts	master	+4 -36
	Eventgate: Symlink _helpers and _tls_helpers	operations/deployment-charts	master	+14 -18

Related Objects

Mentioned In: T291856: Enable envoy tls proxy logging from eventgate
Mentioned Here: T291848: Clarify common k8s label and service conventions in our helm charts
T282148: Support Canary releases on Kubernetes

Event Timeline

Ottomata created this task. Sep 21 2021, 3:59 PM

Restricted Application added a subscriber: Aklapper. Sep 21 2021, 3:59 PM

Change 722654 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[operations/deployment-charts@master] Eventgate: Symlink _helpers and _tls_helpers

https://gerrit.wikimedia.org/r/722654

gerritbot added a project: Patch-For-Review. Sep 21 2021, 4:22 PM

Change 722935 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[operations/deployment-charts@master] Update eventgate helmfile.d for eventgate 0.5 chart

https://gerrit.wikimedia.org/r/722935

Change 722654 merged by Ottomata:

[operations/deployment-charts@master] Eventgate: Symlink _helpers and _tls_helpers

https://gerrit.wikimedia.org/r/722654

Change 722935 merged by Ottomata:

[operations/deployment-charts@master] Update eventgate helmfile.d for eventgate 0.5 chart

https://gerrit.wikimedia.org/r/722935

Maintenance_bot removed a project: Patch-For-Review. Sep 23 2021, 3:10 PM

We tried to deploy this today, but ran into an issue: since the k8s resources have been renamed, k8s treats e.g. the Service as new, but the old Service still exists on the same port, which causes a port conflict.

To deploy, we are going to have to depool a DC, delete the existing deployment, apply the new one, then repool.

I'd like to talk with someone about https://phabricator.wikimedia.org/T282148#7373078 before we do, to make sure we don't have to do that kind of failover deployment more than once.

> To deploy, we are going to have to depool a DC, delete the existing deployment, apply the new one, then repool.

Since it's such a hassle, should we do the same thing for event streams chart first?

Oof, right. I've already merged the eventgate chart change, and I think to roll back we'd have to revert and then bump the chart version to 0.6.0.

Grr, I guess we should roll back, right?

why rollback? we just make the same changes to eventstreams before going through the deployment

I'm worried that in the meantime someone will need to make an emergency fix/change to eventgate and won't be able to because of this.

oh, yeah. ok. up to you.

I think we should just proceed with eventgate, I'll do staging in each first. Will have to delete in staging and redeploy.

Plan for staging:

helmfile -e staging destroy
# wait and make sure all is gone.
helmfile -e staging apply
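
The "wait and make sure all is gone" step could be scripted roughly as below. This is only a sketch: `wait_until_gone` is a hypothetical helper, and the namespace and label selector you pass to it are assumptions that depend on how the chart labels its resources.

```shell
# Hypothetical helper: poll until no Deployments, Services, or Pods match a
# label selector in a namespace. Returns 0 once nothing is left, 1 on timeout.
# Usage: wait_until_gone <namespace> <label-selector> [max-tries]
wait_until_gone() {
  namespace="$1"
  selector="$2"
  tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    # Only treat an empty listing as "gone" if kubectl itself succeeded.
    if remaining="$(kubectl -n "$namespace" get deploy,svc,pods -l "$selector" -o name)" \
        && [ -z "$remaining" ]; then
      return 0
    fi
    sleep "${WAIT_INTERVAL:-5}"
    tries=$((tries - 1))
  done
  return 1
}
```

Run it between the `destroy` and the `apply`, and only apply if it returns success.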

For eqiad and codfw (from https://wikitech.wikimedia.org/wiki/DNS/Discovery#How_to_manage_a_DNS_Discovery_service)

First lower DNS ttl for the eventgate deployment:

puppetmaster1001$ sudo confctl --object-type discovery select 'dnsdisc=eventgate-logging-external' set/ttl=10

Deploy in codfw:

puppetmaster1001$ sudo confctl --object-type discovery select 'dnsdisc=eventgate-logging-external,name=codfw' set/pooled=false
# make sure codfw is depooled
puppetmaster1001$ sudo confctl --quiet --object-type discovery select 'dnsdisc=eventgate-logging-external' get

# delete and re-deploy in codfw
deploy1002$ helmfile -e codfw destroy
deploy1002$ helmfile -e codfw apply
# Wait for the k8s service to look good, then repool

puppetmaster1001$ sudo confctl --object-type discovery select 'dnsdisc=eventgate-logging-external,name=codfw' set/pooled=true
# make sure codfw is pooled again (both eqiad and codfw should now be pooled)
puppetmaster1001$ sudo confctl --quiet --object-type discovery select 'dnsdisc=eventgate-logging-external' get

Repeat this process for eqiad.

Reset DNS ttl for the eventgate deployment:

puppetmaster1001$ sudo confctl --object-type discovery select 'dnsdisc=eventgate-logging-external' set/ttl=300

Repeat all of this for each eventgate deployment.
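
Since the same steps repeat per deployment and per DC, a small generator for the checklist may help avoid copy/paste mistakes. The sketch below only prints the commands from the plan above; it does not run anything, because the confctl and helmfile steps happen on different hosts. `print_dc_steps` and `print_plan` are hypothetical names, and the helmfile lines assume you are already in the right helmfile.d directory for that deployment.

```shell
# Print (do not run) the depool / redeploy / repool steps for one DC.
print_dc_steps() {
  dnsdisc="$1"
  dc="$2"
  sel="dnsdisc=${dnsdisc},name=${dc}"
  echo "puppetmaster1001\$ sudo confctl --object-type discovery select '${sel}' set/pooled=false"
  echo "puppetmaster1001\$ sudo confctl --quiet --object-type discovery select 'dnsdisc=${dnsdisc}' get"
  echo "deploy1002\$ helmfile -e ${dc} destroy"
  echo "deploy1002\$ helmfile -e ${dc} apply"
  echo "puppetmaster1001\$ sudo confctl --object-type discovery select '${sel}' set/pooled=true"
  echo "puppetmaster1001\$ sudo confctl --quiet --object-type discovery select 'dnsdisc=${dnsdisc}' get"
}

# Print the full plan for one discovery record: lower the TTL, handle codfw
# and then eqiad, and finally reset the TTL.
print_plan() {
  dnsdisc="$1"
  echo "puppetmaster1001\$ sudo confctl --object-type discovery select 'dnsdisc=${dnsdisc}' set/ttl=10"
  for dc in codfw eqiad; do
    print_dc_steps "$dnsdisc" "$dc"
  done
  echo "puppetmaster1001\$ sudo confctl --object-type discovery select 'dnsdisc=${dnsdisc}' set/ttl=300"
}

print_plan eventgate-logging-external
```

Call `print_plan` once per discovery record; the manual "wait for the k8s service to look good" check still has to happen between the apply and the repool.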

Don't forget to wait for the DNS TTL and/or lower the TTL before every depool/repool operation.

so you might want to first do

puppetmaster1001$ sudo confctl --object-type discovery select 'dnsdisc=eventgate-logging-external' set/ttl=10

And then re-set it to 300 once you're done with your work.

Thanks, added this step into my comment above.

Mentioned in SAL (#wikimedia-operations) [2021-09-27T13:58:39Z] <ottomata> beginning re-deploy of eventgate-logging-external - https://phabricator.wikimedia.org/T291504#7380252

Ah, there were some mistakes in our patches: the TLS Service wasn't using the same label selectors that the pods had, so it did not select them. Reverting for now.
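
For reference, a Service only routes to pods whose labels contain every key=value pair in its selector. A rough way to sanity-check that before deploying is sketched below; `selector_matches` is a hypothetical helper and the label values in the example are made up, not the ones from our charts.

```shell
# Hypothetical helper: succeed only if every key=value pair in a Service
# selector also appears in a pod's labels. Both arguments are comma-separated
# lists such as "app=foo,release=bar". An empty selector trivially succeeds.
selector_matches() {
  selector="$1"
  labels=",$2,"
  old_ifs="$IFS"
  IFS=','
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                       # this pair is present on the pod
      *) IFS="$old_ifs"; return 1 ;;        # missing pair: Service would not select the pod
    esac
  done
  IFS="$old_ifs"
  return 0
}

# Example with made-up labels: the second selector has a pair the pod lacks.
selector_matches "app=example,release=main" "app=example,release=main,tier=web" && echo "match"
selector_matches "app=example-tls,release=main" "app=example,release=main,tier=web" || echo "no match"
```

On a live cluster, `kubectl get pods -l '<selector>'` answers the same question directly: if it lists no pods, the Service has nothing to route to.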

Let's discuss T291848: Clarify common k8s label and service conventions in our helm charts a little more before proceeding.

Mentioned in SAL (#wikimedia-operations) [2021-09-27T16:10:45Z] <ottomata> reverting eventgate-logging-external chart change in codfw - T291504

odimitrijevic moved this task from Incoming to Event Platform on the Analytics board. Sep 27 2021, 4:13 PM

Ottomata mentioned this in T291856: Enable envoy tls proxy logging from eventgate. Sep 27 2021, 5:27 PM

@Ottomata Should we move this task from Analytics to Data Engineering?

Restricted Application added a project: Data-Engineering. Jul 6 2022, 9:46 PM

Yes.

BTullis subscribed. Jul 15 2022, 11:34 PM

Ottomata moved this task from Backlog to To be Estimated/To be discussed on the Event-Platform board. Aug 11 2022, 6:00 PM

Ottomata closed this task as a duplicate of T303543: eventgate chart should use common_templates. Aug 15 2022, 2:34 PM
