
Services downstream from RESTBase should not separately alert due to RESTBase production monitoring timeouts
Closed, Declined (Public)

Description

When RESTBase fails production monitoring checks due to timeouts, a number of other services downstream from it also alert; typically these include at least the Page Content Service (mobileapps) and the Recommendation API service (recommendation-api). These downstream services should not also alert because that creates noise and makes it harder to identify the true cause of instability.

Proposal: In downstream services, maintain HTML fixtures for monitoring requests that are used in lieu of /page/html requests to RESTBase when a monitoring request parameter is set.
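A minimal sketch of how the proposed fixture path might look in a downstream service. The parameter name `monitoring`, the fixture content, and the `fetch_from_restbase` callable are all hypothetical illustrations; the task does not specify any of them.

```python
from typing import Callable, Mapping

# Hypothetical name for the monitoring request parameter (not defined in the task).
MONITORING_PARAM = "monitoring"

# Hypothetical stand-in for a stored HTML fixture; a real service would
# likely load this from a file kept alongside its monitoring spec.
FIXTURE_HTML = "<html><body><p>Monitoring fixture page.</p></body></html>"


def get_page_html(
    title: str,
    query: Mapping[str, str],
    fetch_from_restbase: Callable[[str], str],
) -> str:
    """Return page HTML, using the local fixture for monitoring requests.

    `fetch_from_restbase` is an assumed callable that performs the real
    /page/html request; it is only invoked for non-monitoring traffic.
    """
    if query.get(MONITORING_PARAM) == "1":
        # Monitoring request: skip RESTBase so its timeouts cannot
        # cause this service's own check to fail.
        return FIXTURE_HTML
    return fetch_from_restbase(title)
```

With this shape, a monitoring check against the downstream service keeps passing while RESTBase is timing out, because the upstream call is never made. The same property means the check no longer exercises the RESTBase dependency at all.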

Event Timeline

mobrovac subscribed.

RESTBase (and other services) are alerting because we want to be sure that we catch problems visible to external users. While this does sometimes introduce false positives (the inverse is true too - RB alerts if back-end services alert), it is better to have more alerts than to miss some because of convoluted logic or fixtures.