We would have detected the commit that caused T263832 much more easily if the tested article had been purged at the beginning of each synthetic test run. Purging ensures that any PHP changes that might affect the rendered HTML are picked up immediately. To do that, each series of tests needs to begin with the following steps:
- Purge the article (e.g. a POST request to https://en.m.wikipedia.org/w/index.php?title=Facebook&action=purge)
- Perform an unrecorded page load, so that the parser re-parses the article
- [Regular runs start here]
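
The steps above could be sketched roughly as follows. This is only an illustration using the Python standard library, not how the synthetic testing setup is actually implemented: the function names (`build_purge_request`, `prepare_and_run`), the `measure` callback standing in for a recorded test run, and the use of a plain HTTP GET as the unrecorded warm-up load are all assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_purge_request(base_url, title):
    """Build a POST request to index.php?title=<title>&action=purge."""
    query = urlencode({"title": title, "action": "purge"})
    return Request(f"{base_url}/w/index.php?{query}", data=b"", method="POST")


def build_page_url(base_url, title):
    """Build the URL of the article under test."""
    return f"{base_url}/w/index.php?{urlencode({'title': title})}"


def send(request):
    """Send a request and discard the response body."""
    with urlopen(request) as response:
        response.read()


def prepare_and_run(base_url, title, runs, measure, send=send):
    """Purge, warm up without recording, then do the recorded runs.

    `measure` is a hypothetical callback that performs one recorded
    run against a URL and returns its result.
    """
    page_url = build_page_url(base_url, title)
    # Step 1: purge the article so stale rendered HTML is discarded.
    send(build_purge_request(base_url, title))
    # Step 2: unrecorded load, so the parser re-parses the article.
    send(Request(page_url))
    # Step 3: regular (recorded) runs start here.
    return [measure(page_url) for _ in range(runs)]
```

Calling `prepare_and_run("https://en.m.wikipedia.org", "Facebook", 5, measure)` would issue the purge and the warm-up load once, then call `measure` five times. Passing `send` as a parameter keeps the sequence testable without touching the network.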
I would argue that this is worth doing for production tests as well, since it would make regressions caused by HTML changes correlate in time with deployments, instead of having to wait for the next edit to naturally purge an article.
For regressions caused by HTML changes, this would let us pinpoint the exact commit responsible based on when things regressed on Beta, and the exact deployment if done in production.