We've offered XHGui for a while now, and it provides a lot of information. However, it also has some shortcomings that we could address by leveraging other tooling we already have in place, specifically php-excimer and FlameGraph.
For this task, I'd like to achieve the following:
1. Able to visualise a debug request as a fairly detailed and standard (bottom-up) flamegraph, and also as a reversed flamegraph.
2. Able to share links to filtered views of said flamegraph.
3. Able to combine multiple requests into one flamegraph.
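As a rough illustration of goals 1 and 3, here is a minimal sketch that assumes each request's trace is available in the collapsed ("folded") stack format that FlameGraph consumes, one `frame;frame;frame count` line per unique stack. The function names and sample data are hypothetical and not part of any existing tool.

```python
from collections import Counter


def parse_collapsed(text):
    """Parse 'frame1;frame2 count' lines into a Counter keyed by stack tuple."""
    stacks = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The sample count is the last space-separated field on the line.
        stack, _, count = line.rpartition(" ")
        stacks[tuple(stack.split(";"))] += int(count)
    return stacks


def merge(*profiles):
    """Combine several requests by summing sample counts per identical stack."""
    merged = Counter()
    for profile in profiles:
        merged.update(profile)
    return merged


def reverse(stacks):
    """Flip each stack so the leaf frame comes first (reversed flamegraph)."""
    flipped = Counter()
    for stack, count in stacks.items():
        flipped[stack[::-1]] += count
    return flipped


def format_collapsed(stacks):
    """Serialise back to collapsed format, sorted for stable output."""
    return "\n".join(f"{';'.join(s)} {c}" for s, c in sorted(stacks.items()))


# Hypothetical traces from two debug requests.
request_1 = "main;A;C 3\nmain;B 1\n"
request_2 = "main;A;C 2\nmain;B;C 4\n"

combined = merge(parse_collapsed(request_1), parse_collapsed(request_2))
print(format_collapsed(combined))
print(format_collapsed(reverse(combined)))
```

Because merging is a plain sum over identical stacks, any number of requests can be combined before the result is rendered.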
This would be an alternative to XHGui, which has a number of drawbacks:
* Non-trivial function call overhead. This makes the "time spent" in any given function heavily skewed towards functions that make many function calls. Of course, some amount of function call overhead is "real", but XHProf adds non-trivial overhead on top of that. Thus, if on a production request the functions "foo" and "bar" both take 100ms to complete, but "bar" internally calls many functions, then in an XHProf report it will appear as if "bar" is much slower than it actually is. (Well, it's true that it was that slow, but only when XHProf is active.)
* No call tree. For something that describes itself as a "hierarchical profiler", XHProf has surprisingly little "hierarchical" information. It only measures data for each "parent-child" function name pair.
* No visualisation. This follows naturally from the previous point.
* It is not meant to aggregate multiple requests, and we currently don't. (In theory, we could buffer data on mwdebug and merge multiple datasets before forwarding to the XHGui database with faked request metadata.)
For example, the following call tree:
xhprof-tideways will report the time flattened as:
It's not possible (or rather it's hard, and involves guesswork) to figure out how much time of "A1" was spent in C.
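To make the ambiguity concrete, here is a small sketch with a made-up call tree (not the one from the example above) that aggregates inclusive time per parent-child name pair, roughly the way a flat XHProf-style report does. All function names and timings are hypothetical.

```python
from collections import defaultdict

# Hypothetical call tree: (function name, self time in ms, children).
TREE = ("main", 0, [
    ("A1", 10, [("B", 5, [("C", 50, [])])]),
    ("A2", 10, [("B", 5, [("C", 200, [])])]),
])


def inclusive(node):
    """Inclusive time of a node: its own time plus that of all descendants."""
    _, self_ms, children = node
    return self_ms + sum(inclusive(child) for child in children)


def flatten(node, pairs=None):
    """Aggregate inclusive time per (parent, child) name pair."""
    if pairs is None:
        pairs = defaultdict(int)
    name, _, children = node
    for child in children:
        pairs[(name, child[0])] += inclusive(child)
        flatten(child, pairs)
    return pairs


pairs = flatten(TREE)
for (parent, child), ms in sorted(pairs.items()):
    print(f"{parent} ==> {child}: {ms} ms")
```

In the flattened output, "B ==> C" is reported as a single 250 ms entry. The fact that only 50 ms of that happened underneath "A1" is no longer recoverable from the pairs alone.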
The interface appears to let you "dig in" to lower calls, but every time you click a function name, you're really navigating sideways, not downward. This means the numbers don't add up, which can be confusing even to people who use the tool regularly and know the tested code well.
* [x] Decide how to collect traces: Excimer.
* [x] Decide where to store traces: Misc DB cluster, same as XHGui, via a new "excimer-ui-server" HTTP API.
* [x] Decide where to generate or store flame graphs: No storage, generate in-browser via self-hosted Speedscope.
* [x] Evaluate retention and possible abuse: Prune after insert in excimer-ui-server. Use an HMAC secret to avoid abuse that modifies existing records. No rate limit at this time.
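The retention and abuse decisions above could be sketched as follows. This is only an illustration under assumptions: it supposes that the public, read-only ID is derived from the ingestion ID with an HMAC, and that pruning keeps a fixed number of most recently written records. The class, its methods, and the count-based retention policy are hypothetical and not taken from excimer-ui-server.

```python
import hashlib
import hmac
from collections import OrderedDict


class ProfileStore:
    """In-memory stand-in for the server-side store (hypothetical)."""

    def __init__(self, secret: bytes, max_records: int = 1000):
        self.secret = secret
        self.max_records = max_records
        self.records = OrderedDict()  # public ID -> payload, oldest first

    def public_id(self, ingestion_id: str) -> str:
        # Assumption: the public ID is an HMAC of the ingestion ID.
        digest = hmac.new(self.secret, ingestion_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:32]

    def ingest(self, ingestion_id: str, payload: str) -> str:
        pid = self.public_id(ingestion_id)
        self.records[pid] = payload
        self.records.move_to_end(pid)
        # Prune after insert: drop the oldest records beyond the limit.
        while len(self.records) > self.max_records:
            self.records.popitem(last=False)
        return pid

    def read(self, pid: str):
        return self.records.get(pid)


store = ProfileStore(b"not-a-real-secret", max_records=2)
pid = store.ingest("ingest-1", "profile A")

# Someone who only knows the public ID cannot overwrite the record:
# submitting it as an ingestion ID maps to a different public ID.
other = store.ingest(pid, "tampered")
print(other != pid, store.read(pid))

# A third insert pushes the store past its limit and prunes the oldest record.
store.ingest("ingest-3", "profile C")
print(store.read(pid))
```

With this shape, a shared link exposes only the public ID, while overwriting a record would require knowing its ingestion ID.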
Next steps from T291015#8576699, @Krinkle wrote:
* [x] client: Rename Client\Profiler to Client\ExcimerClient.
* [x] client: Add "secret" option to prevent abuse, make ingestion IDs different from real/public/read-only IDs.
* [ ] DBA: Request a misc DB for excimer-ui.
* [ ] perf site: Enable PHP on perf.wm.o.
* [ ] perf site: Use Puppet to provision a JSON config file with db credentials, set ENV in apache for excimer-ui-server discovery.
* [ ] perf site: Check in excimer-ui-server with vendor.
* [ ] mediawiki: Add ExcimerClient to wmf-config/lib.
* [ ] mediawiki: Modify wmf-config/Profiler to permit its use when using WikimediaDebug. Trigger without need for query param via XWD attribute.
* [ ] mediawiki: Merge the footer link patch, especially for local dev where you'd trigger by query param without WikimediaDebug.
* [ ] WikimediaDebug: Add frontend option to the browser extension.