As part of our work to have every DSE service use a single URL per service instead of hardcoded hosts, we need to add the druid-coordinator service to LVS and use a single svc URL for it.
Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T403800 Use unified service urls for DPE services |
| Open | | None | T403955 Switch all hard coded druid_public host urls to druid-public-coordinator svc url |
| Resolved | | Gehel | T406222 Add druid coordinator service to LVS for the druid_public cluster. |
Event Timeline
Change #1198498 had a related patch set uploaded (by Stevemunene; author: Stevemunene):
[operations/puppet@production] LVS: etcd data for druid-public-coordinator
Change #1198499 had a related patch set uploaded (by Stevemunene; author: Stevemunene):
[operations/puppet@production] LVS: Add druid-public-coordinator to service list
Change #1198500 had a related patch set uploaded (by Stevemunene; author: Stevemunene):
[operations/dns@master] DNS: Add druid-public-coordinator record
Change #1199256 had a related patch set uploaded (by Stevemunene; author: Stevemunene):
[operations/puppet@production] druid: add druid-coordinator to druid public worker role
Change #1199763 had a related patch set uploaded (by Stevemunene; author: Stevemunene):
[operations/puppet@production] LVS: set druid-coordinator to state lvs_setup
Change #1199764 had a related patch set uploaded (by Stevemunene; author: Stevemunene):
[operations/puppet@production] LVS: set druid-coordinator to state production
We have had some delays due to scheduling conflicts and PTO. However, we have found some middle ground and have a slot for the deploy on 5th Nov, any time between 10:00 and 12:00 GMT.
Change #1198498 merged by Btullis:
[operations/puppet@production] LVS: etcd data for druid-public-coordinator
Change #1198499 merged by Btullis:
[operations/puppet@production] LVS: Add druid-public-coordinator to service list
Change #1199256 merged by Btullis:
[operations/puppet@production] druid: add druid-coordinator to druid public worker role
I've merged the first three patches on this stack:
- https://gerrit.wikimedia.org/r/c/operations/puppet/+/1198498
- https://gerrit.wikimedia.org/r/c/operations/puppet/+/1198499
- https://gerrit.wikimedia.org/r/c/operations/puppet/+/1199256
I'll wait until tomorrow to merge the next (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1199763) which sets the service into state: lvs_setup - I'll also do this with the help of the Traffic team to apply the changes to pybal.
Change #1198500 merged by Btullis:
[operations/dns@master] DNS: Add druid-public-coordinator record
Change #1199763 merged by Ssingh:
[operations/puppet@production] LVS: set druid-coordinator to state lvs_setup
Mentioned in SAL (#wikimedia-operations) [2025-11-20T18:27:00Z] <sukhe> sukhe@lvs1020:~$ sudo systemctl restart pybal.service: T406222
All backing servers have been marked as pooled:
gehel@cumin2002:~$ sudo confctl select service=druid-public-coordinator get
{"druid1011.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=druid-public,service=druid-public-coordinator"}
{"druid1012.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=druid-public,service=druid-public-coordinator"}
{"druid1013.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=druid-public,service=druid-public-coordinator"}
{"druid1009.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=druid-public,service=druid-public-coordinator"}
{"druid1010.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=druid-public,service=druid-public-coordinator"}
@ssingh: I think the service is now ready. Could you help us move this to lvs_setup and production state? Is there something else missing from our side?
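The pooled check on the confctl output above can also be scripted. What follows is only a minimal sketch, not part of conftool: it assumes the output is one JSON object per line, shaped exactly as pasted above, and the helper name `unpooled_hosts` is hypothetical.

```python
import json

# Output pasted from the confctl command above, one JSON object per line.
CONFCTL_OUTPUT = """\
{"druid1011.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=druid-public,service=druid-public-coordinator"}
{"druid1012.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=druid-public,service=druid-public-coordinator"}
{"druid1013.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=druid-public,service=druid-public-coordinator"}
{"druid1009.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=druid-public,service=druid-public-coordinator"}
{"druid1010.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=druid-public,service=druid-public-coordinator"}
"""


def unpooled_hosts(raw: str) -> list[str]:
    """Return the hosts whose "pooled" value is anything other than "yes".

    Hypothetical helper: assumes each non-empty line is a JSON object with
    one host key plus a "tags" key, as in the output pasted above.
    """
    not_pooled = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for key, value in record.items():
            if key == "tags":
                continue  # metadata, not a host entry
            if value.get("pooled") != "yes":
                not_pooled.append(key)
    return sorted(not_pooled)


if __name__ == "__main__":
    missing = unpooled_hosts(CONFCTL_OUTPUT)
    print("all pooled" if not missing else f"not pooled: {missing}")
```

In practice the text would come from the `confctl ... get` command shown above rather than from a pasted string.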
Change #1216793 had a related patch set uploaded (by Gehel; author: Gehel):
[operations/puppet@production] LVS: set druid-coordinator to state lvs_setup
Change #1216793 merged by Gehel:
[operations/puppet@production] LVS: set druid-coordinator to state lvs_setup
Deploying, following the instructions at https://wikitech.wikimedia.org/wiki/LVS#Configure_the_load_balancers
Mentioned in SAL (#wikimedia-operations) [2025-12-09T13:48:54Z] <gehel> sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service' - T406222
Mentioned in SAL (#wikimedia-operations) [2025-12-09T13:53:42Z] <gehel> sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service' - T406222
This seems to be working, sending an HTTP 307 redirect to one of the druid nodes:
gehel@cumin1003:~$ curl -v -k http://druid-public-coordinator.svc.eqiad.wmnet:8081
* Uses proxy env variable no_proxy == 'wikipedia.org,wikimedia.org,wikibooks.org,wikinews.org,wikiquote.org,wikisource.org,wikiversity.org,wikivoyage.org,wikidata.org,wikiworkshop.org,wikifunctions.org,wiktionary.org,mediawiki.org,wmfusercontent.org,w.wiki,wikimediacloud.org,wmnet,127.0.0.1,::1'
* Trying 10.2.2.15:8081...
* Connected to druid-public-coordinator.svc.eqiad.wmnet (10.2.2.15) port 8081 (#0)
> GET / HTTP/1.1
> Host: druid-public-coordinator.svc.eqiad.wmnet:8081
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 307 Temporary Redirect
< Date: Tue, 09 Dec 2025 13:55:21 GMT
< Location: http://druid1009.eqiad.wmnet:8081/
< Content-Length: 0
< Server: Jetty(9.4.12.v20180830)
<
* Connection #0 to host druid-public-coordinator.svc.eqiad.wmnet left intact
Change #1216797 had a related patch set uploaded (by Gehel; author: Gehel):
[operations/puppet@production] LVS: set druid-coordinator to state production
Change #1216797 merged by Gehel:
[operations/puppet@production] LVS: set druid-coordinator to state production
HTTP calls to druid-public-coordinator.svc.eqiad.wmnet:8081 result in an HTTP 307 redirect. I'm assuming that this is expected and that clients will follow those redirects.
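To sanity-check the assumption that clients follow the redirect, here is a small self-contained sketch using only the Python standard library. It does not contact the real service: a throwaway local server stands in for the svc URL and answers `/` with a 307, and the `/backend` path, the handler, and the function names are all hypothetical. For GET requests, `urllib.request` follows a 307 by default. A POST that receives a 307 is not re-sent automatically by `urllib` and raises `HTTPError` instead, so whether a given client follows the redirect depends on its HTTP library and request method.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Local stand-in: "/" answers 307, any other path plays the redirect target."""

    def do_GET(self):
        if self.path == "/":
            target = f"http://127.0.0.1:{self.server.server_port}/backend"
            self.send_response(307)
            self.send_header("Location", target)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_following_redirects(url):
    """GET the URL; the default redirect handler follows the 307 for us."""
    # An empty ProxyHandler avoids any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.status, resp.geturl(), resp.read()


def demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_following_redirects(f"http://127.0.0.1:{server.server_port}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, final_url, body = demo()
    print(status, final_url, body)
```

Note that the redirect `Location` in the curl output above points at an individual host (`druid1009.eqiad.wmnet`), so a client following it still needs network access to the backing nodes, not only to the svc address.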