Currently our visibility into failures to contact the search cluster isn't great. The best we can do is grep the log:
fluorine:/a/mw-log# grep 'Search backend error' /a/mw-log/hhvm.log | grep -c 'Operation timed out'
15
It would be much better to track this in Graphite, so that we can see the user-impacting effects of our upgrade process and monitor general cluster health over time.
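
As a rough illustration of what tracking this could look like, here is a minimal Python sketch that counts the same log lines as the grep above and formats the result as a datapoint in Graphite's plaintext protocol (`<metric> <value> <timestamp>`). The metric name `search.backend.timeouts`, the function names, and the `host` argument are all hypothetical and used only for illustration; this is not how the application currently reports metrics, and in practice it would likely be better to emit a counter from the code path that logs the error.

```python
import socket
import time


def count_search_timeouts(lines):
    """Count log lines that report a search backend timeout.

    Mirrors: grep 'Search backend error' | grep 'Operation timed out'
    """
    return sum(
        1
        for line in lines
        if "Search backend error" in line and "Operation timed out" in line
    )


def graphite_line(metric, value, timestamp):
    """Format one datapoint in Graphite's plaintext protocol."""
    return f"{metric} {value} {int(timestamp)}\n"


def send_to_graphite(payload, host, port=2003):
    """Send a plaintext payload to a carbon listener.

    2003 is carbon's default plaintext port; the host is an assumption
    and must be supplied by the caller.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


def report(log_path, host, metric="search.backend.timeouts"):
    """Count timeouts in log_path and send the total to Graphite.

    The default metric name is hypothetical.
    """
    with open(log_path, errors="replace") as log:
        count = count_search_timeouts(log)
    send_to_graphite(graphite_line(metric, count, time.time()), host)
    return count
```

Calling `report("/a/mw-log/hhvm.log", host=...)` would send the total for the whole file, so a periodic run would need to account for log rotation or track an offset to produce a meaningful rate.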