With the new Mann Whitney compare setup, we can more easily spot regressions.
However, the problem right now is that you don't know much about the baseline test.
On the compare page, add more data so we can compare the runs on metrics that are not run through the Mann Whitney test:
- number of requests
- sizes
- screenshot
- video
- link to baseline run
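As a rough illustration of the side-by-side view proposed above, here is a minimal sketch that puts the raw counts for the baseline and current run next to each other with a plain difference, and passes the screenshot, video and baseline link straight through. It is not the tool's implementation: `RunSummary`, its field names (`requests`, `transfer_size_bytes`, `screenshot`, `video`, `result_url`), `numeric_rows` and `render` are all hypothetical, and "sizes" is assumed to mean total transfer size in bytes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSummary:
    """Hypothetical per-run data; field names are assumptions, not a real schema."""
    requests: int           # number of requests in the run
    transfer_size_bytes: int  # assumed meaning of "sizes": total transfer size
    screenshot: str         # path or URL to the run's screenshot
    video: Optional[str]    # path or URL to the video, if one was recorded
    result_url: str         # link to the run's own result page


def numeric_rows(baseline: RunSummary, current: RunSummary) -> list[tuple[str, int, int, int]]:
    """Return (metric, baseline, current, current - baseline) for metrics that are
    shown side by side instead of going through the Mann Whitney test."""
    rows = []
    for name in ("requests", "transfer_size_bytes"):
        b = getattr(baseline, name)
        c = getattr(current, name)
        rows.append((name, b, c, c - b))
    return rows


def render(baseline: RunSummary, current: RunSummary) -> str:
    """Build a plain-text comparison block: numbers first, then media and links."""
    lines = [f"{'metric':<22}{'baseline':>12}{'current':>12}{'diff':>10}"]
    for name, b, c, d in numeric_rows(baseline, current):
        lines.append(f"{name:<22}{b:>12}{c:>12}{d:>+10}")
    lines.append(f"screenshot: {baseline.screenshot} | {current.screenshot}")
    lines.append(f"video: {baseline.video or 'n/a'} | {current.video or 'n/a'}")
    lines.append(f"baseline run: {baseline.result_url}")
    return "\n".join(lines)


if __name__ == "__main__":
    base = RunSummary(80, 1_200_000, "base.png", "base.mp4", "https://example.com/baseline")
    cur = RunSummary(95, 1_150_000, "cur.png", None, "https://example.com/current")
    print(render(base, cur))
```

A simple difference is used here on purpose: these values are typically identical or near-identical across iterations of a run, so a statistical test adds little, and seeing the raw numbers next to the screenshot and video is what helps explain a regression flagged by the Mann Whitney test.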