We have been using hardcoded URLs for most of the DPE services (Druid hosts, for example), which causes unexpected downtime during host maintenance or reimaging. We aim to introduce sturdier mechanisms, in the form of high availability and managed failover, to keep our services up during such events.
For example, we have several hardcoded Druid hosts that must be changed manually every time maintenance work takes place. This sometimes leads to human error: a host gets left out of the update, which causes downtime.
The druid_public cluster is already highly available (LVS) under druid-public-broker.svc.eqiad.wmnet. The first step is to change all hardcoded instances to use the svc URL for the public cluster while we work on making the same available for the analytics_druid cluster. This was previously discussed in T288750 and would also give us the chance to investigate further options for T360769.
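As a rough illustration of that first step, the sketch below rewrites a URL that points at an individual Druid host so that it points at the service name instead. Only the service hostname comes from this task; the helper name `use_service_host`, the example hostnames, and the assumption that the service listens on the same port as the individual hosts are hypothetical and would need checking against the real configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Service hostname taken from the task description.
DRUID_PUBLIC_SVC_HOST = "druid-public-broker.svc.eqiad.wmnet"

# Hypothetical hardcoded hosts, for illustration only.
HARDCODED_DRUID_HOSTS = {"druid-host-a.example", "druid-host-b.example"}


def use_service_host(url,
                     hardcoded_hosts=HARDCODED_DRUID_HOSTS,
                     service_host=DRUID_PUBLIC_SVC_HOST):
    """Return url with a known hardcoded host replaced by the service host.

    URLs whose host is not in hardcoded_hosts are returned unchanged.
    The original port is kept (assumption: the service uses the same port),
    and any user:password part of the URL is dropped.
    """
    parts = urlsplit(url)
    if parts.hostname not in hardcoded_hosts:
        return url
    if parts.port is None:
        netloc = service_host
    else:
        netloc = f"{service_host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

In practice the change is more likely to be a direct edit of each configuration value than a runtime rewrite; a helper like this would mainly be useful for auditing existing settings and spotting any host that was missed.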
These are the services we need to update in our code.