Our Apache configs are hard to understand, which makes it difficult to make changes with confidence that the result is as intended. An erroneous Apache config change can cause a global outage.
Currently, we use apache-fast-test to run a rough sanity check on a running httpd by sending it requests for a hardcoded list of URLs. Its effectiveness is limited, though: an SRE deploying a config change must run apache-fast-test manually after deploying to one host and, because the tool is configured with only the list of URLs and no other information, must visually inspect the output to check that it looks correct.
The intended replacement, tentatively named httpbb (for "HTTP Black Box testing"), will have a more expressive config: instead of a list of URLs, it's configured with a list of test cases. Each test case consists of a URL and, optionally, other request parameters such as arbitrary headers, followed by one or more assertions about the response (e.g. a specific response code, expected header values, or a regular expression that the body text should match). httpbb will check each of those assertions against the actual response and emit a pass/fail result. That will let us write a much more extensive list of test cases and build up much better test coverage of the full Apache config, without making the results harder to verify manually. In turn, that will let us make config changes with greater confidence, including refactors that simplify the config. Black-box testing also means the tests are agnostic to the underlying server technology, so they would also let us test for consistent behavior across reimages, upgrades and architecture changes.
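To make the idea of a test case concrete, here is a minimal sketch in Python of how one test case and its assertions might be represented and checked. It is not httpbb's actual config format or implementation: the names HttpTestCase, Response and check_case, and all field names, are hypothetical and chosen only for illustration. The sketch assumes the response has already been fetched from the host under test, so it covers only the assertion-checking step.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Response:
    """Hypothetical container for the parts of a response we assert on."""
    status: int
    headers: Dict[str, str]
    body: str


@dataclass
class HttpTestCase:
    """Hypothetical test case: a request plus assertions about the response."""
    url: str
    request_headers: Dict[str, str] = field(default_factory=dict)
    expect_status: Optional[int] = None
    expect_headers: Dict[str, str] = field(default_factory=dict)
    expect_body_regex: Optional[str] = None


def check_case(case: HttpTestCase, response: Response) -> List[str]:
    """Return a list of failure messages; an empty list means the case passed."""
    failures = []

    if case.expect_status is not None and response.status != case.expect_status:
        failures.append(
            f"{case.url}: expected status {case.expect_status}, got {response.status}"
        )

    # HTTP header names are case-insensitive, so compare them lowercased.
    actual = {name.lower(): value for name, value in response.headers.items()}
    for name, expected in case.expect_headers.items():
        got = actual.get(name.lower())
        if got != expected:
            failures.append(
                f"{case.url}: expected header {name}: {expected!r}, got {got!r}"
            )

    if case.expect_body_regex is not None and not re.search(
        case.expect_body_regex, response.body
    ):
        failures.append(
            f"{case.url}: body does not match /{case.expect_body_regex}/"
        )

    return failures


if __name__ == "__main__":
    # Illustrative values only; example.org stands in for a real site.
    case = HttpTestCase(
        url="https://example.org/",
        expect_status=200,
        expect_headers={"Content-Type": "text/html; charset=UTF-8"},
        expect_body_regex=r"<title>.*</title>",
    )
    canned = Response(
        status=200,
        headers={"content-type": "text/html; charset=UTF-8"},
        body="<html><title>Example</title></html>",
    )
    failures = check_case(case, canned)
    print("PASS" if not failures else "FAIL\n" + "\n".join(failures))
```

Returning a list of failure messages rather than a single boolean keeps the overall result pass/fail while still telling the operator which assertion failed, so no visual inspection of the raw response is needed.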
In the initial workflow, an SRE will use httpbb manually in place of apache-fast-test: deploy a config change to one host, and run httpbb against that host. On a "pass" result, they'll continue with the deployment. In the long run, we may be able to automate that work away, and even use httpbb in CI for config changes.