Page MenuHomePhabricator

Set up sampling profiler for PHP 7 (alternative to HHVM Xenon)
Closed, ResolvedPublic

Description

In Wikimedia production, we profile MediaWiki in two ways:

  • Request profiling for debugging – see T206152.
  • Sampling profiles for production - this task.

This is currently collected from all production servers/requests via the internal HHVM sampling profiler ("Xenon") and periodically collected from HHVM's parent process.

Links:

The Xenon sampling profiler is enabled by default for us in production for all requests. There is no run-time toggle. During request shutdown, we ask HHVM for data. The data is internally buffered by HHVM which means it usually doesn't return anything yet, and then at certain intervals it yields a buffer of what it's got since the last flush, which we send to a Redis queue from which we then produce aggregated log files and flame graphs, published at <https://performance.wikimedia.org/.

High-level tasks:

  • Find or create a sampling profiler for PHP 7. – T205059
  • Install the profiler in production.
  • Generalise the arc-lamp ingestion pipeline to support a second queue. – T195312
  • Set up Redis writer in wmf-config for Excimer (use a separate Redis topic).
  • Enable capturing and flushing of PHP 7 samples to this queue.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Krinkle updated the task description. (Show Details)Oct 3 2018, 6:18 PM
Krinkle renamed this task from Find PHP7 alternative for HHVM's Xenon to Set up sampling profiler for PHP 7 (alternative to HHVM Xenon).Oct 3 2018, 6:21 PM

Change 464381 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] profiler: Move flush to a function and prep beta/prod consolidate (beta-only)

https://gerrit.wikimedia.org/r/464381

Change 464382 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] profiler: Use wmfArcLampFlush() in prod

https://gerrit.wikimedia.org/r/464382

Change 464381 merged by jenkins-bot:
[operations/mediawiki-config@master] profiler: Move flush to a function and prep beta/prod consolidate (beta-only)

https://gerrit.wikimedia.org/r/464381

Mentioned in SAL (#wikimedia-operations) [2018-10-09T02:21:10Z] <krinkle@deploy1001> Synchronized wmf-config/arclamp.php: T176916 - Id79baae90: ensure file exists before Ie86e88777c48 (duration: 00m 57s)

Change 464382 merged by jenkins-bot:
[operations/mediawiki-config@master] profiler: Use wmfArcLampFlush() in prod

https://gerrit.wikimedia.org/r/464382

Mentioned in SAL (#wikimedia-operations) [2018-10-09T02:41:05Z] <krinkle@deploy1001> Synchronized wmf-config/profiler.php: T176916 / T206092 - Ie86e88777c48 (duration: 00m 56s)

Krinkle changed the task status from Open to Stalled.Dec 5 2018, 5:22 PM
Krinkle moved this task from Blocked to Next in FY1819Q3 (Jan-Mar) on the Performance-Team board.
Krinkle moved this task from Doing to Blocked on the Performance-Team board.

Mentioned in SAL (#wikimedia-operations) [2019-02-19T08:56:46Z] <_joe_> experimenting with php-fpm configuration on mwdebug1001 for T176916

Joe added a comment.EditedFeb 19 2019, 9:01 AM

I just ran a simple test to verify if profiling to xhgui works in PHP 7, after @Krinkle told me it doesn't.

And in fact it doesn't seem to work. On mwdebug1001 I verified that when using php7, no data is sent to the xhgui server.

So, something doesn't work as expected!

Digging a bit deeper, the error is mine (meh!). Apparently the tideways_xhprof extension has been moved to its own separate repository in preparation for version 5, while we're still using version 4.x; so what profiler.php does (checking for the tideways_xhprof extension) will not work on this version, which is btw the last stable release.

Joe added a comment.Feb 19 2019, 9:33 AM

So basically either we rewrite profiler.php to support the old tideways extension or (which I prefer at this point, tbh) we build just the xhprof component, which seems to have a smaller surface.

Mentioned in SAL (#wikimedia-operations) [2019-02-19T12:29:28Z] <_joe_> creating gerrit repo operations/debs/tideways-xhprof T176916

Mentioned in SAL (#wikimedia-operations) [2019-02-19T15:10:53Z] <_joe_> uploading tideways-xhprof_5.0.0~beta3 to reprepro T176916

Joe added a comment.Feb 19 2019, 3:42 PM

I did manually install php7.2-tideways-xhprof on mwdebug1001 and I now see the following error:

 Fatal error: Uncaught Error: Call to undefined function register_postsend_function() in /srv/mediawiki/wmf-config/profiler.php:359 
Stack trace: #0 /srv/mediawiki/wmf-config/profiler.php(46): wmfSetupTideways(Array) 
#1 /srv/mediawiki/wmf-config/PhpAutoPrepend.php(45): wmfSetupProfiler(Array) #2 {main} thrown in /srv/mediawiki/wmf-config/profiler.php on line 359

which isn't surprising given register_postsend_function is HHVM-specific AIUI.

Change 491518 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/mediawiki-config@master] profile: use register_shutdown_function

https://gerrit.wikimedia.org/r/491518

Change 491533 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::mediawiki::php: install tideways-xhprof, remove tideways

https://gerrit.wikimedia.org/r/491533

Krinkle changed the task status from Stalled to Open.EditedFeb 19 2019, 9:53 PM
Krinkle moved this task from Blocked to Doing on the Performance-Team board.

Next steps are for me:

  • Look at Excimer sampling config in wmf-config and verify it works.
  • Ensure code exists in wmf-config to submit them to Redis under a new topic (e.g. excimer, instead of xenon).
  • Set up separate arc-lamp subscriber and figure out how/where to store the flame graphs on webperf1002 disk.

Change 491533 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::mediawiki::php: install tideways-xhprof, remove tideways

https://gerrit.wikimedia.org/r/491533

Change 491518 merged by jenkins-bot:
[operations/mediawiki-config@master] profile: use register_shutdown_function

https://gerrit.wikimedia.org/r/491518

Joe added a comment.Feb 20 2019, 6:57 AM

On-demand profiling now works on mwdebug1002.

Change 492256 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Add debian-glue to operations/debs/tideways-xhprof

https://gerrit.wikimedia.org/r/492256

Change 492256 merged by jenkins-bot:
[integration/config@master] Add debian-glue to operations/debs/tideways-xhprof

https://gerrit.wikimedia.org/r/492256

Krinkle updated the task description. (Show Details)
Krinkle updated the task description. (Show Details)EditedFeb 27 2019, 3:14 PM

Confirmed that production PHP 7.2 is sending samples to topic excimer on mwlog1001 (Thanks to @Tim, was added as part of 3545a906d6b), and confirmed with redis-cli that I can receive these on webperf1001.

Samples:

redis mwlog1001 excimer
1) "message"
2) "excimer"
3) "/srv/mediawiki/w/api.php;/srv/mediawiki/php-1.33.0-wmf.18/api.php;/srv/mediawiki/php-1.33.0-wmf.18/includes/WebStart.php;/srv/mediawiki/php-1.33.0-wmf.18/includes/Setup.php;MediaWiki\\Session\\SessionManager::getGlobalSession;WebRequest::getSession;MediaWiki\\Session\\SessionManager::getSessionForRequest;MediaWiki\\Session\\SessionManager::getSessionInfoForRequest;CentralAuthSessionProvider::provideSessionInfo;User::isUsableName;Message::plain;Message::toString;Message::fetchMessage;MessageCache::get;MessageCache::getMessageFromFallbackChain;MessageCache::getMessageForLang;MessageCache::getMsgFromNamespace;MessageCache::load;Language::getMessageKeysFor;LocalisationCache::getSubitemList;LocalisationCache::getSubitem;LocalisationCache::loadSubitem;LocalisationCache::loadItem;LCStoreCDB::get 1"

1) "message"
2) "excimer"
3) "/srv/mediawiki/w/api.php;/srv/mediawiki/php-1.33.0-wmf.18/api.php;ApiMain::execute;ApiMain::executeActionWithErrorHandling;ApiMain::executeAction;ApiStashEdit::execute;ApiStashEdit::parseAndStash;WikiPage::prepareContentForEdit;MediaWiki\\Storage\\DerivedPageDataUpdater::getPreparedEdit;MediaWiki\\Storage\\DerivedPageDataUpdater::getCanonicalParserOutput;MediaWiki\\Revision\\RenderedRevision::getRevisionParserOutput;MediaWiki\\Revision\\RevisionRenderer::{closure:/srv/mediawiki/php-1.33.0-wmf.18/includes/Revision/RevisionRenderer.php(141)};MediaWiki\\Revision\\RevisionRenderer::combineSlotOutput;MediaWiki\\Revision\\RenderedRevision::getSlotParserOutput;MediaWiki\\Revision\\RenderedRevision::getSlotParserOutputUncached;AbstractContent::getParserOutput;WikitextContent::fillParserOutput;Parser::parse;Parser::internalParse;Parser::replaceVariables;PPFrame_DOM::expand;Parser::braceSubstitution;PPFrame_DOM::expand;Parser::braceSubstitution;Parser::callParserFunction;ScribuntoHooks::invokeHook;Scribunto_LuaModule::invoke;Scribunto_LuaEngine::executeFunctionChunk;Scribunto_LuaSandboxInterpreter::callFunction;Scribunto_LuaSandboxCallback::__call;Wikibase\\Client\\DataAccess\\Scribunto\\Scribunto_LuaWikibaseLibrary::getEntity;Wikibase\\Client\\DataAccess\\Scribunto\\EntityAccessor::getEntity;Wikibase\\Client\\Serializer\\ClientEntitySerializer::serialize;Wikibase\\Client\\Serializer\\ClientSerializer::injectSerializationWithDataTypes;Wikibase\\Lib\\Serialization\\SerializationModifier::modifyUsingCallback;Wikibase\\Lib\\Serialization\\CallbackFactory::{closure:/srv/mediawiki/php-1.33.0-wmf.18/extensions/Wikibase/lib/includes/Serialization/CallbackFactory.php(68)};Wikibase\\Lib\\PropertyInfoDataTypeLookup::getDataTypeIdForProperty;Wikibase\\Lib\\Store\\CachingPropertyInfoLookup::getPropertyInfo;Wikibase\\Lib\\Store\\CachingPropertyInfoLookup::getAllPropertyInfo;BagOStuff::get;MemcachedBagOStuff::doGet;MemcachedPeclBagOStuff::getWithToken 1"
name redis mwlog xenon (from HHVM, for comparison)
1) "message"
2) "xenon"
3) "load.php;{GET};/srv/mediawiki/php-1.33.0-wmf.18/load.php;/srv/mediawiki/php-1.33.0-wmf.18/includes/WebStart.php;/srv/mediawiki/php-1.33.0-wmf.18/includes/Setup.php;/srv/mediawiki/php-1.33.0-wmf.18/LocalSettings.php;/srv/mediawiki/wmf-config/CommonSettings.php 1"

1) "message"
2) "xenon"
3) "RunSingleJob.php;{POST};JobExecutor::execute;RefreshLinksJob::run;RefreshLinksJob::runForTitle;Wikimedia\\Rdbms\\LBFactory::commitAndWaitForReplication;Wikimedia\\Rdbms\\LBFactory::commitMasterChanges;Wikimedia\\Rdbms\\LBFactory::forEachLBCallMethod;Wikimedia\\Rdbms\\LBFactoryMulti::forEachLB;Closure$Wikimedia\\Rdbms\\LBFactory::forEachLBCallMethod;Wikimedia\\Rdbms\\LoadBalancer::commitMasterChanges;Wikimedia\\Rdbms\\LoadBalancer::forEachOpenMasterConnection;Closure$Wikimedia\\Rdbms\\LoadBalancer::commitMasterChanges;Wikimedia\\Rdbms\\Database::commit;Wikimedia\\Rdbms\\Database::doCommit;Wikimedia\\Rdbms\\Database::query;Wikimedia\\Rdbms\\Database::doProfiledQuery;Wikimedia\\Rdbms\\DatabaseMysqli::doQuery;mysqli::query 1"

One thing I notice is that the virtual frame for GET/POST seems to be missing. Not sure if that's intentional or not. Ignoring for now, as that should be easy to add later on.

Next up: Set up a second instance of arc-lamp that consumes these and writes them to disk in some way (naming format TBD).

Change 493247 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] webperf: Rename /xenon to /arclamp on performance.wikimedia.org

https://gerrit.wikimedia.org/r/493247

Change 493251 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[performance/docroot@master] Use /arclamp instead of /xenon (now a redirect)

https://gerrit.wikimedia.org/r/493251

Change 493247 merged by Dzahn:
[operations/puppet@production] webperf: Rename /xenon to /arclamp on performance.wikimedia.org

https://gerrit.wikimedia.org/r/493247

Change 493251 merged by jenkins-bot:
[performance/docroot@master] Use /arclamp instead of /xenon (now a redirect)

https://gerrit.wikimedia.org/r/493251

Krinkle raised the priority of this task from Normal to High.Mar 13 2019, 1:31 PM
Krinkle removed a project: Patch-For-Review.
Joe added a comment.Mar 18 2019, 4:22 PM

@Krinkle now that I'm back, what needs to be done on the SRE side for this to be considered done?

Change 498932 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] arclamp: Document YAML config file structure and logging logic

https://gerrit.wikimedia.org/r/498932

Change 498933 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] arclamp: Rename YAML file from xenon-log to arclamp-log-xenon

https://gerrit.wikimedia.org/r/498933

Change 498934 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] arclamp: Make Redis subscription channel configurable

https://gerrit.wikimedia.org/r/498934

Change 498942 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] arclamp: Rename xenon-log script to arclamp-log

https://gerrit.wikimedia.org/r/498942

Change 498948 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] arclamp: Rename xenon-grep to arclamp-grep

https://gerrit.wikimedia.org/r/498948

Change 498950 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] arclamp: Rename xenon-log to arclamp-log

https://gerrit.wikimedia.org/r/498950

Change 499011 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] profiler: Remove code for virtually prepended frames in excimer samples

https://gerrit.wikimedia.org/r/499011

Change 498932 merged by Giuseppe Lavagetto:
[operations/puppet@production] arclamp: Document YAML config file structure and logging logic

https://gerrit.wikimedia.org/r/498932

Change 498933 merged by Giuseppe Lavagetto:
[operations/puppet@production] arclamp: Rename YAML file from xenon-log to arclamp-log-xenon

https://gerrit.wikimedia.org/r/498933

Change 498934 merged by Giuseppe Lavagetto:
[operations/puppet@production] arclamp: Make Redis subscription channel configurable

https://gerrit.wikimedia.org/r/498934

Change 498942 merged by Giuseppe Lavagetto:
[operations/puppet@production] arclamp: Rename xenon-log script to arclamp-log in Puppet

https://gerrit.wikimedia.org/r/498942

Change 498948 merged by Giuseppe Lavagetto:
[operations/puppet@production] arclamp: Rename xenon-grep to arclamp-grep

https://gerrit.wikimedia.org/r/498948

Change 498950 merged by Giuseppe Lavagetto:
[operations/puppet@production] arclamp: Rename provisioning of xenon-log to arclamp-log

https://gerrit.wikimedia.org/r/498950

Change 499219 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] arclamp: make arclamp-grep work with excimer logs as well

https://gerrit.wikimedia.org/r/499219

Change 499220 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] arclamp: abstract arclamp::instance out of arclamp

https://gerrit.wikimedia.org/r/499220

Change 499222 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] arclamp: add a second instance for excimer logs

https://gerrit.wikimedia.org/r/499222

Change 499219 merged by Giuseppe Lavagetto:
[operations/puppet@production] arclamp: make arclamp-grep work with excimer logs as well

https://gerrit.wikimedia.org/r/499219

Change 499220 merged by Giuseppe Lavagetto:
[operations/puppet@production] arclamp: abstract arclamp::instance out of arclamp

https://gerrit.wikimedia.org/r/499220

Change 499011 merged by jenkins-bot:
[operations/mediawiki-config@master] profiler: Fix stack frame mangling for Excimer

https://gerrit.wikimedia.org/r/499011

Mentioned in SAL (#wikimedia-operations) [2019-03-27T22:45:06Z] <krinkle@deploy1001> Synchronized wmf-config/profiler.php: I8c7f8c58313d227a6d9959b9f3a1c / T176916 (duration: 00m 59s)

Krinkle added a comment.EditedWed, Mar 27, 10:51 PM

Looking at a few more samples with redis-cli, I noticed that some of these stacks appear to be missing several parents. Consuming them with Arc Lamp would thus lead to broken trees.

1) "message"
2) "excimer"
3) "/srv/mediawiki/w/load.php;ResourceLoader::getCombinedVersion;{closure:/srv/mediawiki/php-1.33.0-wmf.22/includes/resourceloader/ResourceLoader.php(660)};ResourceLoaderModule::getVersionHash;ResourceLoaderModule::getModuleContent;ResourceLoaderModule::buildContent;ResourceLoaderStartUpModule::getScript;ResourceLoaderStartUpModule::getModuleRegistrations;ResourceLoaderModule::getVersionHash;ResourceLoaderModule::getModuleContent;ResourceLoaderModule::buildContent;ResourceLoaderFileModule::getStyles;ResourceLoaderFileModule::readStyleFiles;ResourceLoaderFileModule::readStyleFile;ResourceLoaderFileModule::compileLessFile;Less_Parser::getCss;Less_Tree_Ruleset::compile;Less_Tree_Ruleset::PrepareRuleset;Less_Tree_Ruleset::evalImports;Less_Tree_Import::compile;Less_Tree_Import::ParseImport;Less_Tree_Ruleset::evalImports;Less_Tree_Import::compile;Less_Tree_Import::ParseImport;Less_Parser::parseFile;Less_Parser::GetRules;Less_Parser::parsePrimary;Less_Parser::MatchFuncs;Less_Parser::parseMixinDefinition;Less_Parser::parseMixinArgs;Less_Parser::MatchFuncs 1"

This stack appear to begin at ResourceLoader::getCombinedVersion which isn't possible afaik. The prefix /srv/mediawiki/w/load.php is prepended by the above fix.

The a stack could begin like that is if it was the entrypoint for a destructor or shutdown callback, which isn't the case afaik.

Perhaps the stacks are being trimmed? If that is the case and we cannot avoid that, then perhaps we need to change it to trim from the end rather than the start, as otherwise the frames cannot be used to create a flame graph.

Krinkle reassigned this task from Krinkle to tstarling.Wed, Mar 27, 10:51 PM

Oops, I forgot about this. There is indeed a maximum depth set.

$prodProf->setMaxDepth( 30 );
$prodProf->setFlushCallback(
	function ( $log ) use ( $options ) {
		wmfExcimerFlushCallback( $log, $options );
	}, 
HHVM Xenon stack depth
$ cat 2019-03-26.all.log |  awk -F';' '{print NF-1}' | sort -n > ~/2019_03_26_all.depth.csv
$ cat 2019_03_26_all.depth.csv | awk '{ print int(($1 / 10)) * 10 }' > 2019_03_26_all.depth.round10.csv
$ cat 2019_03_26_all.depth.round10.csv | uniq -c | sed -E -e 's/^ *([0-9]*) */\1'$'\t''/' 
59898	0
135234	10
117379	20
56997	30
31553	40
19231	50
9060	60
4306	70
2318	80
1302	90
980	100
198	110
104	120
32	130
4	140
3	150
1	160
2	170
2	180
3	190
1	200
3	210
1	220

@tstarling Two questions:

  1. Assuming I understood correctly that we trim from the start currently, is this something we can/should change?
  2. How big a stack depth would you be comfortable with?

Change 499222 merged by Giuseppe Lavagetto:
[operations/puppet@production] arclamp: add a second instance for excimer logs

https://gerrit.wikimedia.org/r/499222

Joe updated the task description. (Show Details)Thu, Mar 28, 12:05 PM
Joe added a comment.EditedThu, Mar 28, 12:10 PM

Arclamp is now correctly collecting the stacks sent by excimer. Alas, checking the generated SVGs, it looks like the bad format crept into the genrated flamegraphs, see https://performance.wikimedia.org/arclamp/svgs/hourly/2019-03-28_11.excimer.all.svgz for example.

I think we need to remove the base php version from the stack we send to arclamp-log, as we now have lines like:

/srv/mediawiki/w/load.php;/srv/mediawiki/php-1.33.0-wmf.22/load.php;/srv/mediawiki/php-1.33.0-wmf.22/includes/WebStart.php;/srv/mediawiki/php-1.33.0-wmf.22/includes/Setup.php;/srv/mediawiki/php-1.33.0-wmf.22/LocalSettings.php;/srv/mediawiki/wmf-config/CommonSettings.php;/srv/mediawiki/private/PrivateSettings.php;/srv/mediawiki/private/UALimit.php;AutoLoader::autoload;/srv/mediawiki/php-1.33.0-wmf.22/includes/auth/AbstractPreAuthenticationProvider.php;AutoLoader::autoload 1

which is clearly not the format we want given what was said above, and frankly look confusing to me.

Joe added a comment.Thu, Mar 28, 2:24 PM

Correcting myself:

1 - That format is correct, and indeed the only difference is the full name of the first element (and the absence of the {method} section, which tricked me
2 - The graph linked above seems too different from the one for xenon, but that's due to the low sample size and to the fact almost all the traffic for php7 is coming from web browsing of logged-in beta-testers

So upon further inspection, It seems the SVGs are generated correctly and the differences in graph values can be largely explained that way.

Thanks @Joe. The formatting, publishing, logging and SVG generation pipeline is all working fine it seems. Very nice :)

https://performance.wikimedia.org/arclamp/svgs/daily/2019-03-28.excimer.index.svgz

Looking at this though, it confirms the problem I spotted in T176916#5064425. About 10% of the data is cut off, causing the right-hand side of the graph to be corrupted. These frames are starting at functions like RenderedRevision::getSlotParserOutput and Parser::braceSubstitution without any parent frame. This is due to the depth limit and the trimming of the stack trace to only the top portion, rather than the bottom portion. These frames belong somewhere in the left 80% of the graph, but we don't know where currently.

  1. Assuming I understood correctly that we trim from the start currently, is this something we can/should change?
  2. How big a stack depth would you be comfortable with?

The linked list of stack frames only goes in one direction, from the currently executing function back towards the entry point, so finding the entry point takes O(N) time in the number of stack frames. In the worst case, excimer could end up using more time than the program it is profiling just to find the start frame. That's why it removes the start of the trace when it truncates. It just stops traversing. But the constant factor on this is fairly small. We could have two different depth limits: one to limit linked list traversal and another to limit recording. Also, for truncated traces, perhaps we could add some fake functions at the start and/or end to make it clear what is happening. The depth limit for traversal could be something like 10000.

The current depth limit was just set with set with a desire to avoid having mile-high flame graphs. It could be set to 1000 without breaking anything on Excimer's side.

Krinkle added a comment.EditedThu, Apr 11, 6:39 PM

@tstarling Interesting. I hadn't considered traversal cost. I thought it was mainly about memory, but it works differently than I thought.

  • The flushing to Redis happens mid-process, not post-send. This is different from HHVM and makes sense. On HHVM, the stacks could be extracted and stored in a separate process and flushed from any thread next to end its request. (I don't know if it actually does, though.) On PHP 7 we can't do that, so we're faced with a choice of either risking an indefinite memory build-up during long jobs (which are web requests for us), or discarding older stacks, or sending early and often. The latter makes the most sense and is what we did. It also means we get stacks prior to fatals and timeouts, which is nice.
  • The flushing happens after every sample (setFlushCallback sets max_samples=1). This sounded scary, but we only sample every 60 seconds within a single request, which generally means less than 1 sample and 1 flush per whole request (per grafana backend-timing). Cool.
  • We only traverse the stack during the interrupt for a sample. Not generally throughout a process.
  • Traversal has to start from the current frame and work its way down toward the entry point. Trimming from the bottom as we do now means we simply stop traversal at that point. Trimming from the top would mean we'd actually have to get the whole stack first, and then crop it.

Assuming the above are correct, a few questions:

Is it feasible to always traverse the whole stack?

I know this sounds like a naive ask, and from your description I'm guessing that would be too expensive (which totally makes sense). But it's unclear to me how this compares to related to PHP functions debug_backtrace() and ( new RuntimeException() )->getTraceAsString(). We we use regularly in production code, and appear to do this.

Do those have (significantly) cheaper access to the stack, or are they equally problematic? Perhaps they, too, have an upper limit. If that limit is higher than, say, 250, could we use that as basis for Excimer? I'd be fine with entirely discarding stacks larger than that given they're not meant to exist in prod. Perhaps with a statsd-increment for tracking when that happens with a Grafana alert, just in case :)

Could we comfortably traverse 250 stacks within 10 ms? I would assume yes, at the C level. But, I don't have experience to be sure. If not, what kind of stack size could we comfortably traverse in up to ~10 ms? (10ms being a small random amount of time that seems acceptable to delay sampled request by)

Change 503083 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] profiler: Increase max stack depth for sampling profiler to 250

https://gerrit.wikimedia.org/r/503083

Change 503083 merged by jenkins-bot:
[operations/mediawiki-config@master] profiler: Increase max stack depth for sampling profiler to 250

https://gerrit.wikimedia.org/r/503083

tstarling added a comment.EditedFri, Apr 12, 1:56 AM

Is it feasible to always traverse the whole stack?

I know this sounds like a naive ask, and from your description I'm guessing that would be too expensive (which totally makes sense). But it's unclear to me how this compares to related to PHP functions debug_backtrace() and ( new RuntimeException() )->getTraceAsString(). We we use regularly in production code, and appear to do this.

The cost is indeed similar to debug_backtrace() with $limit=0. But if something goes into infinite recursion, I'd rather Excimer not be the first thing to break. In PHP 7, the recursion depth is not limited by the stack size anymore, only by the memory limit. We have the memory limit set to 500M in production, and locally I can reach a stack depth of 2.6 million with that memory limit. Throwing an exception or calling debug_backtrace() at such a stack depth would not work, on PHP 7.2 it gives an OOM above about 500,000 depth. I got a segault when attempting this on PHP 7.1. But at least exceptions are thrown at known points in the code for knowable reasons.

Could we comfortably traverse 250 stacks within 10 ms? I would assume yes, at the C level. But, I don't have experience to be sure. If not, what kind of stack size could we comfortably traverse in up to ~10 ms? (10ms being a small random amount of time that seems acceptable to delay sampled request by)

I benchmarked it just now at 5000 stack frames per millisecond, so in 10ms you could do 50,000 stack frames. This is the time to add the log entry, calling formatCollapsed() did not add much time to this.

Excimer segfaults for me at a depth of 52369, due to exhaustion of the C stack. PHP is using an internal stack, but Excimer is traversing it using recursion.

Change 503652 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] profiler: Discard incomplete stacks and monitor via statsd

https://gerrit.wikimedia.org/r/503652

Change 503652 merged by jenkins-bot:
[operations/mediawiki-config@master] profiler: Discard incomplete stacks and monitor via statsd

https://gerrit.wikimedia.org/r/503652

Krinkle closed this task as Resolved.Tue, Apr 16, 12:33 AM
Krinkle removed a project: Patch-For-Review.