Why
Today, to analyze backend performance in production, we provide Arc Lamp, with flamegraphs that accurately visualise the function-call hierarchy of the entire code base.
For the equivalent use case during debugging, we currently have XHGui: https://wikitech.wikimedia.org/wiki/WikimediaDebug#XHGui_profiling (example):
XHGui provides solid information about memory use and function call count, which is useful for specific kinds of debugging. However, it completely lacks information about the function-call hierarchy, and its timing information is not accurate enough to understand speed.
This creates a significant gap in the debugging experience, basically limiting it to performance problems relating to memory usage and database/storage calls, with no story for analyzing speed and latency.
Worse yet, XHGui deceptively presents what looks like hierarchical and timing data, which is inaccurate if interpreted as such.
Goal
For this task, I'd like to achieve the following:
- Able to visualise a debug request as a fairly detailed and standard (bottom-up) flamegraph, and also as a reversed flamegraph.
- Able to share links to filtered views of said flamegraph.
- Able to combine multiple requests into one flamegraph.
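As a rough illustration of the last point, combining requests can be as simple as summing sample counts per identical stack. The sketch below assumes each request's trace is available as "folded" text (one `frame;frame;frame count` line per stack), a format that flamegraph tools commonly accept. The function names `parse_folded` and `merge_folded` are hypothetical and not part of any existing tool.

```python
from collections import Counter


def parse_folded(text):
    """Parse folded-stack text into a Counter of {stack: sample count}."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; frames may contain spaces.
        stack, _, count = line.rpartition(" ")
        counts[stack] += int(count)
    return counts


def merge_folded(profiles):
    """Merge several folded-stack profiles by summing counts per stack."""
    merged = Counter()
    for text in profiles:
        merged.update(parse_folded(text))
    return "\n".join(f"{stack} {n}" for stack, n in sorted(merged.items()))


# Hypothetical traces from two debug requests.
request_a = "main;A1;B;C 3\nmain;A2;B;C 1"
request_b = "main;A1;B;C 2\nmain;D 4"
merged = merge_folded([request_a, request_b])
```

Because the merge keeps full stacks, the combined output can be rendered as a single flamegraph without losing the call hierarchy.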
Background
This would be an alternative to XHGui, which has a number of drawbacks:
- Non-trivial function call overhead. This makes the "time spent" in any given function heavily skewed towards functions that make many function calls. Of course, some amount of function call overhead is "real", but there is non-trivial additional overhead added by XHProf. Thus if on a production request functions "foo" and "bar" both take 100ms to complete, but "bar" internally calls many functions, then in an XHProf report it will appear as if "bar" is much slower than it actually is. (Well, it's true that it was that slow, but only when XHProf is active.)
- No call tree. For something self-describing as a "hierarchical profiler", XHProf has surprisingly little "hierarchical" information. It only measures data for each "parent-child" function name pair.
- It offers no visualisation. Naturally, per previous point.
- It is not meant to aggregate multiple requests, and we currently don't. (In theory we could buffer data on mwdebug and merge multiple datasets before forwarding to the XHGui database with faked request metadata).
For example, the following call tree:
* main
  * A1
    * B
      * C
  * A2
    * B
      * C
xhprof-tideways will report the time flattened as:
* main
* main-A1
* main-A2
* A1-B
* A2-B
* B-C
It's not possible (or rather it's hard, and involves guesswork) to figure out how much time of "A1" was spent in C.
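To make the loss of information concrete, here is a minimal sketch that flattens the example tree above into parent-child pairs and compares that with what full stacks can answer. The sample timings and the helper names `flatten_to_pairs` and `time_in_under` are made up for illustration; this is not how XHProf is implemented internally.

```python
from collections import defaultdict

# Hypothetical samples: (full call stack, time in ms spent at the leaf).
samples = [
    (("main", "A1", "B", "C"), 30),
    (("main", "A2", "B", "C"), 10),
]


def flatten_to_pairs(samples):
    """Aggregate time per (parent, child) pair, discarding the rest of the stack."""
    pairs = defaultdict(int)
    for stack, ms in samples:
        for parent, child in zip(stack, stack[1:]):
            pairs[(parent, child)] += ms
    return dict(pairs)


def time_in_under(samples, ancestor, func):
    """With full stacks, time in `func` somewhere beneath `ancestor` is a direct lookup."""
    total = 0
    for stack, ms in samples:
        if ancestor in stack and func in stack[stack.index(ancestor) + 1:]:
            total += ms
    return total


pairs = flatten_to_pairs(samples)
# The pair data only holds one combined B-C total (40 ms here), with no way
# to tell which share of it came via A1 and which via A2.
# Full stacks answer the question directly.
c_under_a1 = time_in_under(samples, "A1", "C")
```

In the pair view, the two calls into C collapse into a single "B-C" entry, which is why attributing C's time back to A1 requires guesswork.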
The interface appears to let you "dig in" to lower calls, but every time you click a function name, you're really navigating sideways, not downward. Which means numbers don't add up, and this can be confusing even to people who use the tool regularly and know the tested code well.
Implementation
Figure out:
- Decide how to collect traces: Excimer.
- Decide where to store traces: Misc DB cluster, same as XHGui, via a new "excimer-ui-server" HTTP API.
- Decide where to generate or store flame graphs: No storage, generate in-browser via self-hosted Speedscope.
- Evaluate retention and possible abuse: Prune after insert in excimer-ui-server. Use an HMAC secret to avoid abuse that modifies existing records. No rate limit at this time.
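One way the HMAC idea could work, sketched under assumptions: the client sends a private ingestion ID, and the server derives the public, read-only ID from it with a server-side secret. Someone who only knows the public ID then cannot produce the ingestion ID needed to overwrite the record. The scheme, the function names `public_id` and `ingest`, and the in-memory `store` are all hypothetical, and the actual excimer-ui-server design may differ.

```python
import hashlib
import hmac

# Hypothetical in-memory stand-in for the database table.
store = {}


def public_id(server_secret: bytes, ingestion_id: str) -> str:
    """Derive the public (read-only) ID from the private ingestion ID."""
    digest = hmac.new(server_secret, ingestion_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]


def ingest(server_secret: bytes, ingestion_id: str, payload: str) -> str:
    """Store a profile under its derived public ID and return that ID."""
    pid = public_id(server_secret, ingestion_id)
    store[pid] = payload
    return pid


secret = b"example-secret"  # assumption: provisioned via server config
pid = ingest(secret, "client-random-id", "main;A1;B;C 3")
# Presenting the public ID as if it were an ingestion ID lands on a different record.
other = ingest(secret, pid, "tampered")
```

The shared link would carry only the public ID, so viewing a profile never reveals the value needed to write to it.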
Next steps from T291015#8576699, @Krinkle wrote:
- client: Rename Client\Profiler to Client\ExcimerClient.
- client: Add "secret" option to prevent abuse, make ingestion IDs different from real/public/read-only IDs.
- DBA: Request a misc DB for excimer-ui-server. T331956: Create "excimer" misc database
- mediawiki: Package php-excimer 1.1.0 or later and deploy it. T332964: Upgrade php-excimer package from 1.0.4 to 1.1.1
- perf site: Enable PHP on perf.wm.o.
- perf site: Use Puppet to provision a JSON config file with db credentials, set ENV in apache for excimer-ui-server discovery.
- perf site: Check in excimer-ui-server with vendor.
- mediawiki: Add ExcimerClient to wmf-config/lib and modify wmf-config/Profiler to permit its use when using WikimediaDebug. Trigger without need for query param via XWD attribute.
- WikimediaDebug: Add frontend option to the browser extension.