There have been many recent incidents of WDQS being overloaded. The current band-aid fix is to restart the WDQS service on every host across the entire datacenter. SREs can do this via cumin, but SWEs cannot. An Ansible playbook could let SWEs (who do have shell access to the WDQS servers) restart the service safely and quickly.
Creating this ticket to:
- Write an Ansible playbook to restart the WDQS services
- Train SWEs on its use