So far, we have been loosely using time or ab -n to detect performance changes. But we need a script that does this more systematically.
Loose set of TODOs:
- Identify a set of titles whose parse times correspond to the p50, mean, p95, and p99 parse times on the production cluster (see the title-selection sketch below)
- I/O from m/w api requests (especially outside the production cluster, e.g. on our dev machines) can introduce noise into this. So, the script should do an initial run with the --record option of the parse.js script; the benchmarking runs should then use the --replay option to read m/w api results from the recorded responses (see the driver sketch after this list).
- In order to eliminate noise from startup costs, the benchmarking script should spawn Parsoid as a server, do a couple of warmup requests to make sure JIT effects are eliminated, and then gather perf data from N runs.
- In order to support the previous item, we may need to add a request header to the api requests so that they return only perf data (time trace + total parse time / html2wt time, with components) as a JSON blob, maybe only when parsoidConfig.devAPI is enabled (a possible shape for that blob is sketched at the end).
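
A minimal sketch of how the title selection could work, assuming we can dump (title, parse time) pairs from production logs into a JSON file; the file name and field names below are made up:

    // Pick representative titles from production parse-time data.
    // Assumes a JSON file of [{ title, parseTimeMs }, ...]; names are hypothetical.
    'use strict';
    const fs = require('fs');

    const samples = JSON.parse(fs.readFileSync('parse-times.json', 'utf8'))
        .sort((a, b) => a.parseTimeMs - b.parseTimeMs);

    function percentile(sorted, p) {
        const idx = Math.min(sorted.length - 1, Math.floor(p / 100 * sorted.length));
        return sorted[idx];
    }

    const mean = samples.reduce((sum, e) => sum + e.parseTimeMs, 0) / samples.length;
    // Title whose parse time is closest to the mean
    const meanTitle = samples.reduce((best, e) =>
        Math.abs(e.parseTimeMs - mean) < Math.abs(best.parseTimeMs - mean) ? e : best);

    const picks = {
        p50: percentile(samples, 50),
        mean: meanTitle,
        p95: percentile(samples, 95),
        p99: percentile(samples, 99),
    };
    Object.keys(picks).forEach((k) => {
        console.log(k, picks[k].title, picks[k].parseTimeMs + 'ms');
    });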
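
And a rough sketch of the driver itself. Only --record, spawning the server, the warmup requests, and the N runs come from the TODOs above; the script paths, the --page flag, the port, the URL format, and the X-Parsoid-Perf header name are all placeholders, and it assumes the spawned server can be pointed at the recorded responses (the --replay wiring is left open):

    'use strict';
    const { execFileSync, spawn } = require('child_process');
    const http = require('http');

    const TITLES = process.argv.slice(2); // titles picked via the sketch above
    const WARMUP = 2;                     // requests discarded to let JIT effects settle
    const RUNS = 10;                      // N timed runs per title

    // 1. Record m/w api responses once per title (network I/O happens only here).
    TITLES.forEach((title) => {
        execFileSync('node', ['bin/parse.js', '--record', '--page', title], { stdio: 'inherit' });
    });

    // 2. Spawn Parsoid as a server so startup cost is paid once, not per run.
    //    Assumes the server replays the recorded api responses.
    const server = spawn('node', ['bin/server.js'], { stdio: 'inherit' });

    function timedRequest(title) {
        return new Promise((resolve, reject) => {
            const start = process.hrtime();
            http.get({
                host: 'localhost',
                port: 8000, // placeholder port
                path: '/localhost/' + encodeURIComponent(title), // placeholder URL format
                headers: { 'X-Parsoid-Perf': '1' }, // hypothetical perf-data header
            }, (res) => {
                res.resume();
                res.on('end', () => {
                    const [s, ns] = process.hrtime(start);
                    resolve(s * 1e3 + ns / 1e6);
                });
            }).on('error', reject);
        });
    }

    async function main() {
        // A real script should poll until the server is actually up.
        await new Promise((resolve) => setTimeout(resolve, 3000));
        for (const title of TITLES) {
            for (let i = 0; i < WARMUP; i++) { await timedRequest(title); }
            const times = [];
            for (let i = 0; i < RUNS; i++) { times.push(await timedRequest(title)); }
            console.log(title, times.map((t) => t.toFixed(1) + 'ms').join(' '));
        }
    }

    main().then(() => server.kill(), (err) => { console.error(err); server.kill(); });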
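
For the last item, the benchmarking script only needs something it can JSON.parse and aggregate; one possible (entirely made-up) shape for the blob, and the aggregation over N runs:

    // Hypothetical perf blob returned when the request header is set and
    // parsoidConfig.devAPI is enabled -- field names and values are illustrative only.
    const sampleBlob = {
        totalTimeMs: 4123,
        timeTrace: {
            'tokenizer': 812,
            'token-transforms': 1490,
            'dom-passes': 1630,
        },
    };

    // Average total and per-component times across N such blobs.
    function aggregate(blobs) {
        const out = { totalTimeMs: 0, timeTrace: {} };
        blobs.forEach((b) => {
            out.totalTimeMs += b.totalTimeMs / blobs.length;
            Object.keys(b.timeTrace).forEach((k) => {
                out.timeTrace[k] = (out.timeTrace[k] || 0) + b.timeTrace[k] / blobs.length;
            });
        });
        return out;
    }

    console.log(aggregate([sampleBlob, sampleBlob]));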