
Switch all hard coded druid_public host urls to druid-public-coordinator svc url
Open, Needs Triage, Public

Description

This task aims to switch all services that reference individual druid hosts, such as druid1007.eqiad.wmnet, to the public coordinator svc url. The goal is to handle failover and to make maintenance of the druid hosts easier.
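The first step, finding the hard-coded references, could look like the Python sketch below. It scans text (for example a config file or a DAG module) for known druid_public hostnames and rewrites them to the coordinator service hostname. Several details are assumptions for illustration: the host set, the `druid-public-coordinator.svc.eqiad.wmnet` name, and the helper names. This task names only druid1007.eqiad.wmnet, so the set would need the remaining druid_public hosts added.

```python
import re
from typing import Iterable

# Assumption: this set should hold every druid_public host. Only druid1007 is
# named in the task, so this list is illustrative, not complete.
DRUID_PUBLIC_HOSTS = {"druid1007.eqiad.wmnet"}

# Coordinator service hostname from the task title (assumed to be under svc.eqiad.wmnet).
COORDINATOR_SVC = "druid-public-coordinator.svc.eqiad.wmnet"


def host_pattern(hosts: Iterable[str]) -> "re.Pattern[str]":
    """Compile a regex that matches any of the given hostnames as whole names."""
    # Longest names first so a shorter hostname never shadows a longer one.
    alternation = "|".join(re.escape(h) for h in sorted(hosts, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternation})\b")


def find_hardcoded_hosts(text: str, hosts: Iterable[str] = DRUID_PUBLIC_HOSTS) -> list[tuple[int, str]]:
    """Return (line_number, hostname) pairs for each hard-coded host reference."""
    pattern = host_pattern(hosts)
    return [
        (lineno, match.group(0))
        for lineno, line in enumerate(text.splitlines(), start=1)
        for match in pattern.finditer(line)
    ]


def replace_hardcoded_hosts(
    text: str, hosts: Iterable[str] = DRUID_PUBLIC_HOSTS, svc: str = COORDINATOR_SVC
) -> str:
    """Rewrite every hard-coded host reference to the service hostname."""
    return host_pattern(hosts).sub(svc, text)


if __name__ == "__main__":
    # Hypothetical config snippet, for illustration only.
    sample = "druid_host: druid1007.eqiad.wmnet\nurl: http://druid1007.eqiad.wmnet:8081\n"
    print(find_hardcoded_hosts(sample))
    print(replace_hardcoded_hosts(sample))
```

The sketch matches an explicit host list rather than a generic `druid\d+` pattern. A generic pattern could also match hosts from other clusters that follow the same naming scheme.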

Event Timeline

Change #1185922 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] Change all druid_public hosts references to use svc url

https://gerrit.wikimedia.org/r/1185922

stevemunene updated https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1653

Change all druid_public host references in DAGs and datahub to use svc url

Change #1185922 merged by Stevemunene:

[operations/puppet@production] Change all druid_public hosts references to use svc url

https://gerrit.wikimedia.org/r/1185922

stevemunene merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1653

Change all druid_public host references in DAGs and datahub to use svc url

Endpoints changed to druid-public-broker.svc.eqiad.wmnet
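Clients that query the broker service rather than a single host only need to swap the hostname. The sketch below builds, but does not send, a request to Druid's SQL endpoint (`/druid/v2/sql`) on the broker service. Plain HTTP and port 8082 (Druid's default broker port) are assumptions about this deployment, and `broker_sql_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Broker service hostname from this task. Port 8082 (Druid's default broker
# port) and plain HTTP are assumptions about this deployment.
BROKER_SVC = "druid-public-broker.svc.eqiad.wmnet"
BROKER_PORT = 8082


def broker_sql_request(query: str, host: str = BROKER_SVC, port: int = BROKER_PORT) -> urllib.request.Request:
    """Build (but do not send) a POST to Druid's SQL endpoint on the broker service."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = broker_sql_request("SELECT 1")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req, timeout=30)
```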

Monitoring the druid-related changes for a while before marking this as resolved.

No errors so far from the changed URLs; we can close this.

Change #1188312 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Revert to using a hostname for the druid_public coordinator

https://gerrit.wikimedia.org/r/1188312

Change #1188312 merged by Btullis:

[operations/puppet@production] Revert to using a hostname for the druid_public coordinator

https://gerrit.wikimedia.org/r/1188312

Reopening this task. We recently found that the druid-coordinator service is not load balanced, and this causes timeouts in the data-purge jobs. Only the druid broker service is load balanced. We are exploring adding the coordinator service to the load-balanced services.
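The sketch below shows the kind of coordinator API call a purge job might make, which is why a coordinator-specific URL matters. It uses Druid's `markUnused` coordinator endpoint and sends the request with an explicit timeout, so that a hang surfaces as an error naming the URL. This task does not say which endpoints the actual data-purge jobs call. The endpoint choice, the hostname, port 8081 (Druid's default coordinator port), plain HTTP, and the helper names are all assumptions.

```python
import json
import socket
import urllib.error
import urllib.parse
import urllib.request

# Assumptions: the coordinator service hostname this task later adopts,
# plain HTTP, and Druid's default coordinator port 8081.
COORDINATOR_SVC = "druid-public-coordinator.svc.eqiad.wmnet"
COORDINATOR_PORT = 8081


def mark_unused_request(
    datasource: str, interval: str, host: str = COORDINATOR_SVC, port: int = COORDINATOR_PORT
) -> urllib.request.Request:
    """Build a coordinator call marking a datasource's segments in `interval` as unused."""
    path = f"/druid/coordinator/v1/datasources/{urllib.parse.quote(datasource, safe='')}/markUnused"
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}",
        data=json.dumps({"interval": interval}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send `req`, re-raising timeouts as TimeoutError with the URL that timed out."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.URLError as exc:
        # Connection-phase timeouts are usually wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            raise TimeoutError(f"timed out after {timeout}s: {req.full_url}") from exc
        raise
    except socket.timeout as exc:
        # Read-phase timeouts are raised directly.
        raise TimeoutError(f"timed out after {timeout}s: {req.full_url}") from exc


if __name__ == "__main__":
    # Hypothetical datasource and interval, for illustration only.
    req = mark_unused_request("example_datasource", "2020-01-01/2020-02-01")
    print(req.get_method(), req.full_url)
```

Only `mark_unused_request` can be exercised offline. `send` needs network access to a coordinator.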

After some discussion, we decided to add the druid-coordinator service for the public cluster to LVS, which gives a single usable URL for the service. However, some questioned whether this is needed, since druid host changes are rare (roughly every 3 years, i.e. once per server lifecycle).
There are also discussions about running druid on k8s at some point in the future.

Stevemunene renamed this task from "Switch all hard coded druid_public host urls to druid-public broker svc url" to "Switch all hard coded druid_public host urls to druid-public-coordinator svc url" (Oct 30 2025, 11:58 AM).
Stevemunene updated the task description.

Change #1200034 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] druid: switch to using the druid-public-coordinator url

https://gerrit.wikimedia.org/r/1200034