This task aims to switch all services that reference individual Druid hosts, such as druid1007.eqiad.wmnet, to the public coordinator svc URL, so that failover is handled automatically and maintenance of the Druid hosts is easier.
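As a rough illustration of how the hard-coded references could be located before switching them, here is a minimal sketch. The regular expression assumes hosts follow the `druidNNNN.<site>.wmnet` naming seen above, and the function name and the in-memory `files` mapping are hypothetical, not part of any existing tooling.

```python
import re
from typing import Dict, List

# Assumed naming pattern for individual Druid hosts, e.g. druid1007.<site>.wmnet.
DRUID_HOST_RE = re.compile(r"\bdruid\d{4}\.[a-z0-9]+\.wmnet\b")


def find_hardcoded_hosts(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each file path to the sorted, de-duplicated Druid hostnames it contains.

    `files` maps a path to that file's text content; files without matches
    are left out of the result.
    """
    hits: Dict[str, List[str]] = {}
    for path, text in files.items():
        found = sorted(set(DRUID_HOST_RE.findall(text)))
        if found:
            hits[path] = found
    return hits
```

Running this over the contents of a DAG or puppet checkout would give a list of files that still need to be moved to the svc URL.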
Description
Details
| Title | Reference | Author | Source Branch | Dest Branch | |
|---|---|---|---|---|---|
| Revert to using a hostname for the druid_public services | repos/data-engineering/airflow-dags!1680 | stevemunene | replace_druid_public_host | main | |
| Change all druid_public host references in DAGs and datahub to use svc url | repos/data-engineering/airflow-dags!1653 | stevemunene | replace_druid_public_hosts | main | |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T403800 Use unified service urls for DPE services |
| Open | | BTullis | T403955 Switch all hard coded druid_public host urls to druid-public-coordinator svc url |
| Resolved | | Gehel | T406222 Add druid coordinator service to LVS for the druid_public cluster. |
Event Timeline
Change #1185922 had a related patch set uploaded (by Stevemunene; author: Stevemunene):
[operations/puppet@production] Change all druid_public hosts references to use svc url
stevemunene updated https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1653
Change all druid_public host references in DAGs and datahub to use svc url
Change #1185922 merged by Stevemunene:
[operations/puppet@production] Change all druid_public hosts references to use svc url
stevemunene merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1653
Change all druid_public host references in DAGs and datahub to use svc url
Monitoring the Druid-related changes for a while before marking this as resolved.
Change #1188312 had a related patch set uploaded (by Btullis; author: Btullis):
[operations/puppet@production] Revert to using a hostname for the druid_poublic coordinator
Change #1188312 merged by Btullis:
[operations/puppet@production] Revert to using a hostname for the druid_poublic coordinator
Reopening this task because we recently found that the druid-coordinator service is not load balanced, which caused timeouts in the data-purge jobs. Only the Druid broker service is load balanced. We are exploring adding the coordinator service to the load-balanced services.
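For context, a minimal sketch of how a client could probe coordinator endpoints with a bounded timeout rather than hang on an unreachable one. It relies on Druid's `/status/health` endpoint, which returns `true` on a running process. The candidate URLs, function names, and timeout value are hypothetical, and this is not how the data-purge jobs are actually implemented.

```python
import urllib.request
from typing import Callable, Iterable, Optional

# Hypothetical candidate endpoints; real values would come from configuration.
COORDINATOR_CANDIDATES = [
    "http://druid-coordinator-a.example.org:8081",
    "http://druid-coordinator-b.example.org:8081",
]


def http_get(url: str, timeout: float) -> str:
    """Fetch a URL and return the body as text, raising on errors or timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def first_healthy(
    candidates: Iterable[str],
    fetch: Callable[[str, float], str] = http_get,
    timeout: float = 5.0,
) -> Optional[str]:
    """Return the first base URL whose /status/health answers `true`, else None."""
    for base in candidates:
        try:
            body = fetch(f"{base}/status/health", timeout)
        except (OSError, ValueError):
            # URLError and socket timeouts are OSError subclasses; try the next one.
            continue
        if body.strip() == "true":
            return base
    return None
```

The `fetch` parameter is injectable so the probing logic can be exercised without network access. A load-balanced svc URL would make this kind of client-side probing unnecessary.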
stevemunene opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1680
Revert to using a hostname for the druid_public services
stevemunene merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1680
Revert to using a hostname for the druid_public services
Following discussion, we decided to add the druid-coordinator service for the public cluster to LVS, so that the service has a single usable URL. However, there were concerns about whether this is needed, since Druid host changes are rare (i.e. every 3 years, or once per server lifecycle).
Moreover, there are some discussions on having druid on k8s at some point in the future.
Change #1200034 had a related patch set uploaded (by Stevemunene; author: Stevemunene):
[operations/puppet@production] druid: switch to using the druid-public-coordinator url