Page MenuHomePhabricator

Reduce performance impact of HookRunner.php loading 500+ interfaces
Closed, ResolvedPublic

Description

With buster, file loading/stat-ing syscalls are slower, so AutoLoader is where a lot of regression is showing up (see detailed explanation in T273312).

We should look into the impact caused by HookRunner loading 500+ interfaces and if there are any straightforward ways to migitate/reduce that (e.g. putting all the interfaces in one/two files)

Notes from IRC:

21:51:59 <legoktm> I started looking into autoloader perf, and it seems like most of the classes we're loading in a simple load.php request are just the hook interfaces: https://phabricator.wikimedia.org/P14211 (74%)
21:55:29 <legoktm> this was on mostly vanilla MW, but still 500+ interfaces does seem like a lot
22:40:31 <ori> https://github.com/wikimedia/mediawiki/blob/master/includes/HookContainer/HookRunner.php#L29-L571 huh
22:50:33 <ori> core hooks are split between an ApiHookRunner for API hooks and a HookRunner for everything else
22:52:23 <ori> is there anything to gain by breaking up HookRunner into new groupings?
22:58:40 <legoktm> ori: maybe, I was thinking of a "SetupHookRunner" for early stuff like the hook in MediaWikiServices, SetupAfterCache, WebRequest::getIP, but stopped looking after seeing it go into LanguageFactory/Language
00:00:00 - {Day changed to Friday, February 5, 2021}
03:17:49 <TimStarling> I think part of the rationale for the whole project was that we don't care about non-opcache installs
03:18:07 <TimStarling> if it's slow with opcache enabled then that is a concern
03:19:11 <TimStarling> there's things we could do to optimise HookRunner but it would be fairly ugly

Event Timeline

Something I thought of is the following:

  1. Introduce a new abstract class HookRunnerBase which implements all of the Hook interfaces. Because it's abstract, do not provide any implementations in that class (leave it empty)
  2. Have HookRunner extend HookRunnerBase (and not directly implement any interfaces)

This extra layer of indirection changes nothing by default; HookRunner still needs to properly implement the interfaces to avoid errors, we get doc comments in IDEs that support them, etc.

HOWEVER a production server can swap out the definition of HookRunnerBase with an empty abstract class that doesn't implement any interfaces (or implements only the subset of hook interfaces that are in widespread use, or whatever). Consider the following snippet of code that could be placed into LocalSettings.php:

abstract class HookRunnerBase { }

Because the class is now defined in LocalSettings.php, the autoloader will not attempt to fetch the regular HookRunnerBase (that has all the interface implementations). The only thing that breaks would be attempting to cast HookRunner into one of the hook interfaces. As far as I can tell, we don't do that so it should be safe.

Of course, asking people to define arbitrary classes in LocalSettings is probably not a good idea, so we can instead introduce some new configuration variable for it, such as $wgDebugHookRunnerInterfaces. This would default to true in DefaultSettings.php. Setting it to false in LocalSettings.php could trigger the following code added to Setup.php (immediately after all extensions are loaded):

if ( !$wgDebugHookRunnerInterfaces ) {
    abstract class HookRunnerBase { }
}

Achieving the same goal, but via a configuration variable. I'm working on a test patch for this so we can benchmark before/after to see if loading these interfaces has any noticeable perf impact.

The above doesn't actually work; while we aren't casting to Hook interfaces anywhere, a handful of constructors take the hook interface as strongly-typed arguments (e.g. WikiPageFactory). This causes fatals when HookRunner doesn't implement the hook interface classes, and I do not believe giving up type safety in that regard is worthwhile.

The above doesn't actually work; while we aren't casting to Hook interfaces anywhere, a handful of constructors take the hook interface as strongly-typed arguments (e.g. WikiPageFactory). This causes fatals when HookRunner doesn't implement the hook interface classes, and I do not believe giving up type safety in that regard is worthwhile.

Based on https://codesearch.wmcloud.org/core/?q=use%20MediaWiki(.*%3F)%3FHook%3B&i=nope&files=&excludeFiles=&repos= it might be that WikiPageFactory is the only place doing this? (PageEditStash actually type hints against HookContainer despite the doc comments indicating otherwise).

I'm not sure what extra safety you get by typing against the Hook interface rather than the Container/Runner besides you could theoretically pass in a standalone implementation of the Hook rather than needing a proper Container. In any case, given that it's only used in one place in core, I think it should be straightforward to comment out, at least for the purpose of this investigation.

Change 662777 had a related patch set uploaded (by Skizzerz; owner: Skizzerz):
[mediawiki/core@master] [DNM] HookRunner perf improvements

https://gerrit.wikimedia.org/r/662777

Change 663069 had a related patch set uploaded (by Legoktm; owner: Skizzerz):
[mediawiki/core@wmf/1.36.0-wmf.30] [DNM] HookRunner perf improvements

https://gerrit.wikimedia.org/r/663069

I cherry-picked the patch and manually deployed it to mwdebug1003. The request I tested was https://kk.wikipedia.org/w/load.php?debug=false&lang=kk&modules=ext.3d.styles%7Cext.cite.styles%7Cext.uls.interlanguage%7Cext.visualEditor.desktopArticleTarget.noscript%7Cext.wikimediaBadges%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.page.gallery.styles%7Cmediawiki.skinning.interface%7Cmediawiki.toc.styles%7Cskins.vector.styles%7Cwikibase.client.init&only=styles&skin=vector (I moved kkwiki to wmf.30 to avoid having to backport the patch to wmf.27)

My first runs (I accidentally cut off part of the URL in these):

This brings the Autoload::autoload() for load.php down from 13067ms to 4323ms... overall request went from 56,997ms to 39,903ms so some other noise probably but that seems promising.

I re-ran it with the full URL:

functionwith hooksno interfaces
Autoloader::autoload274 calls, 6,144ms274 calls, 4,345ms
Autoloader::autoload@1627 calls, 6,239ms102 calls, 1,148ms
Autoloader::autoload@29 calls, 120ms9 calls, 111ms
Autoloader::autoload@34 calls, 40ms4 calls, 30ms

Next I'm going to depool a buster appserver, and run our more sophisticated/thorough benchmark script against wmf.30 with the hooks, and then again with no interfaces to get better and more accurate numbers.

Mentioned in SAL (#wikimedia-operations) [2021-02-10T08:41:37Z] <legoktm> depooling mw1404.eqiad.wmnet for perf benchmarking (T274041)

It's not really essential for HookRunner to implement any interfaces. I think the simplest solution is to remove all the interfaces from HookRunner. Duplicate the doc comments, and have a test that ensures that the HookRunner method signatures match the interface signatures. @Pchelolo suggests generating HookRunner.php the way we generate autoload.php, which I guess would be nice to have. The interfaces will still exist for the convenience of extension handlers, but they will only need to be loaded when the handlers are actually called.

Part of the rationale for HookRunner implementing interfaces was that PHPStorm was going to show links to the interface declarations. But it usually doesn't -- PHPStorm limits the number of interfaces it will load for a given class, and we're well over that limit. So duplicating the doc comments is the best way to show documentation for developers writing hook callers in PHPStorm.

Next I'm going to depool a buster appserver, and run our more sophisticated/thorough benchmark script against wmf.30 with the hooks, and then again with no interfaces to get better and more accurate numbers.

Data and images available from https://people.wikimedia.org/~legoktm/T274041/ - IMO they're not really worth digging into more because unsurprisingly it's faster when you don't load 500+ more files.

For the most part this showed that this disproportionately affects load.php, probably because ResourceLoader usually doesn't load as many PHP classes as index.php and api.php do. And I suppose those hook interfaces will get loaded eventually when the first hook from an extension is called, but load.php doesn't call anywhere as many hooks.

... The interfaces will still exist for the convenience of extension handlers, but they will only need to be loaded when the handlers are actually called.

I worry that some mega-extensions like CentralAuth or Wikibase could end up loading a lot of interfaces whenever the first hook needs to be called. A convention of grouping related handlers (all API, all OutputPage, etc.) might mitigate that.

Change 663069 abandoned by Legoktm:
[mediawiki/core@wmf/1.36.0-wmf.30] [DNM] HookRunner perf improvements

Reason:

https://gerrit.wikimedia.org/r/663069

Ideas from technical planning meeting:

  • Make a separate hook runner for use within load.php?
  • Preload? T240775 Should be easy after k8s deployment.
  • Optimise PSR-4 code
  • Concatenate files
  • Have abstract classes representing hook modules, multiple hooks per class

I'd be in remiss if I don't beat my usual drum: php-opcache.

As I understand it, after T253673 it is the plan by ServiceOps (/cc @Joe) to disable the runtime mtime revalidation feature in opcache as soon as possible. That change wasn't previously motivated by performance gains (we very briefly touched on the subject from that angle with SRE, but afaik we just figured opcache's mtime checks hit Linux's disk caches mostly and would not amount to much). However, with the latest Debian upgrade and the increased cost for syscalls, I'm guessing that at least now it does amount to something.

Would you expect that to also solve this adequately, or do we know of other other significant costs with loading these classes as separate files – if the mtime revalidations are turned off?

Krinkle renamed this task from Investigate performance impact of HookContainer loading 500+ interfaces to Investigate performance impact of HookRunner.php loading 500+ interfaces.May 10 2021, 5:55 PM
Krinkle renamed this task from Investigate performance impact of HookRunner.php loading 500+ interfaces to Reduce performance impact of HookRunner.php loading 500+ interfaces.Jan 27 2022, 6:25 PM

Ideas from technical planning meeting:

  • Make a separate hook runner for use within load.php?

I've been pondering over this in the back of my mind over the past week, and I keep going in cirlces. In particular, as I'm trying to not break static analysis and IDE benefits. My circle goes like this:

  • Perhaps we could have two HookRunner's by the same name, and decide which one we use at runtime in load.php (and possibly configurable more generally). We would need to make Phan not see the "fast" variant. This feels messy, also complicates the autoloader, and would likely be incompatible with preloading in the future.
  • I'm drawn momentarily to giving them different names and extending one from the other (one with the methods, the other adding the interace implements; thus no code duplication or code generation needed). This raises the issue of type hints. It seems we'd inevitably end up typing against the weaker one (the parent class), and thus again break Phan analysis and other developer tooling.
  • I then just give it ago without worrying about the tooling and just get the fast one working. This exposed two problems. Firstly, a small issue, someone decided to type against an individual hook interface in WikiPageFactory. Afaik nothing else does this in core, and the pattern isn't documneted and doesn't seem scalable. I think we can drop this without losing much value (it wasn't discussed in the Hook container RFC afaik). Secondly, HookRunner is documented to encourage, and widely used as, new HookRunner(..). With instantiations local and distributed all over (as per T254271). In my proof-of-concept patch I'm changing these to HookRunnner::create where a constant dictates the implementation (akin to MW_NO_SESSION). I'm not in love with this though.
  • This then runs into various native type hints for HookRunner. Naming either the weaker or the stronger one as this doesn't solve the problem. I've removed the type hints in order to unblock the proof-of-concept.

One solution might be to use a union type. We can type it as HookRunner|FastHookRunner or some such and that should result in Phan effectively enforcing the stricter one of the two. It doesn't matter if for load.php we only ever one and not the other, Phan will adhere to what is typed, and likely won't be able to figure out the wiring code for this anyway, so doesn't need to "believe" both possibilities. Note that we'd have to use @param, as native unions start in PHP 8.0.

Before (2 warmups; browser cache disabled)
# /w/load.php?debug=2&lang=en&modules=ext.eventLogging&forceprofile=1
100.00% 1592.570      1 - main()
 90.70% 1444.454    138 - AutoLoader::autoload # <!---
..
 84.40% 1344.056      2 - Hooks::runner
 83.50% 1329.840    579 - AutoLoader::autoload@1 # <!---
 18.01%  286.814    729 - AutoLoader::find       # <!---
..
  7.62% 121.286      1 - wfLoadMain
  5.18% 82.535      1 - MediaWiki\MediaWikiServices::getResourceLoader
After
100.00% 177.256      1 - main()
 61.59% 109.176      1 - wfLoadMain
..
 24.78% 43.930    138 - AutoLoader::autoload  # <!---

Change 777483 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] [WIP] HookRunner: Implement a fast HookRunner for load.php

https://gerrit.wikimedia.org/r/777483

84.40% 1344.056      2 - Hooks::runner

I don't believe these numbers. In load.php wall time according to Excimer, Autoloader::autoload() with HookRunner in the stack is only 3%, not 84%. The median service time in Grafana is around 50ms implying that HookRunner setup takes about 1.5ms per request. This looks like the call count multiplied by the profiler overhead rather than a measure of production impact.

The reason I'm questioning the numbers is because there needs to be a benchmark demonstrating that the patch is beneficial for performance. Adding HookRunner::create() may add a microsecond or two per call compared to new HookRunner, so a benchmark needs to show that that overhead is offset by reduced time spent in Autoloader::autoload(). The autoloader is O(1) in the size of the request whereas HookRunner::create() may be called once per row or once per link, so ideally we would have a measurement of Autoloader::autoload() under realistic conditions, and a tight loop benchmark of HookRunner::create() for comparison.

This comment was removed by Krinkle.

If phan is a concern, you could still get it to work for a HookRunner without interfaces. Just add @method annotations. The implementation could look something like this (possibly with better names):

/**
 * @method RetType onSomeEvent( T $x, S $y, ... )
  (then other 500 lines like the above)
 */
interface IHookRunner { // This is the interface to use in typehints etc.
}
class FastHookRunner implements IHookRunner {
 // Hook methods here
}
class SlowHookRunner extends FastHookRunner implements /* All the hook interfaces here */ {
}

Phan reads the @method annotations from the base interface, and so does PHPStorm.

Obviously, it would be incredibly stupid if we didn't do anything to enforce that the comments stay up to date. I guess we could have a structure test for that, it shouldn't be too hard to write one. The up-do-date info about the hook methods would be pulled from SlowHookRunner; the test would then tokenize the IHookRunner file, look for the comment immediately before the interface declaration, and match that against the list of known methods. It would also be possible to decide which implementation should be used, depending on whether speed is a concern.

I recently profiled a development instance of our MediaWiki deployment with PHP 8.0 using perf to test out PHP 8.0's newfound support for perf.map files / JIT dumping by issuing requests to load.php (without parameters). I observed that ~16% of request time was spent in AutoLoader::find, the majority of it in an access(2) syscall rather than AutoLoader::find itself[1]:

Screenshot 2022-10-31 at 23.05.35.png (1×3 px, 482 KB)

Looking at AutoLoader::find, the only potential source for an access(2) syscall seems to be the call to file_exists in the PSR-4 autoloading implementation. Since PHP's stat cache is apparently request scoped, this file_exists call will cause 1 syscall per every class autoloaded via the PSR-4 autoloader - and hook interfaces rely on the PSR-4 autoloader.

I then created a maintenance script to create a basic classmap autoloader for core PSR-4 roots & extensions' PSR-4 roots. It uses composer's classmap generator library just for simplicity's sake (it is arguably not necessary):

<?php
/**
 * Copyright 2022 Fandom, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
use Composer\ClassMapGenerator\ClassMapGenerator;

require_once __DIR__ . '/../Maintenance.php';

/**
 * Generate a basic classmap autoloader for MediaWiki core and installed extensions.
 */
class GenerateClassmapAutoloader extends Maintenance {
	public function execute() {
		global $IP, $wgAutoloadClasses;

		if ( !getenv( 'ENABLE_ALL_EXTENSIONS' ) ) {
			$this->fatalError(
				"ENABLE_ALL_EXTENSIONS=1 env var must be set or the output will be incomplete, aborting\n"
			);
		}

		$autoloader = <<<'PHP'
<?php
global $wgAutoloadClasses;
$wgAutoloadClasses += [

PHP;

		$lines = [];

		foreach ( AutoLoader::$psr4Namespaces as $prefix => $path ) {
			// Don't crash on potential stale PSR-4 base paths in extensions
			if ( is_readable( $path ) && is_dir( $path ) ) {
				foreach ( ClassMapGenerator::createMap( $path ) as $className => $classPath ) {
					if ( str_starts_with( $className, $prefix ) ) {
						if ( !isset( $lines[$className] ) && !isset( $wgAutoloadClasses[$className] ) ) {
							$lines[$className] = "\t'$className' => '$classPath'";
						}
					}
				}
			}
		}

		$autoloader .= implode( ",\n", $lines ) . "\n];\n";

		file_put_contents( "$IP/autoload-classmap.php", $autoloader );
	}
}

$maintClass = GenerateClassmapAutoloader::class;
require_once RUN_MAINTENANCE_IF_MAIN;

I included the generated classmap file via LocalSettings.php. This apparently reduced the share of AutoLoader::find in flamegraphs for the same load.php test workload from ~16% to ~2.3%.[2]
Since the test system was using Docker, file system access in this context was likely costlier than it would be in a real-life workload. I therefore attempted to replicate the results using two Kubernetes deployments of MediaWiki - one with the generated classmap and one without. I benchmarked load.php on these deployments for a duration of 120s per run with a varying request rate.

At 40 RPS, the difference in observed response times appears to line up with the ~14% difference in flamegraphs:

BaselineClassmap autoloader
Requests [total, rate, throughput] 4800, 40.01, 40.00
Duration [total, attack, wait] 2m0s, 2m0s, 28.764ms
Latencies [min, mean, 50, 90, 95, 99, max] 19.855ms, 26.237ms, 25.541ms, 29.205ms, 31.127ms, 36.026ms, 336.334ms
Bytes In [total, mean] 940800, 196.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:4800
Requests [total, rate, throughput] 4800, 40.01, 40.00
Duration [total, attack, wait] 2m0s, 2m0s, 20.72ms
Latencies [min, mean, 50, 90, 95, 99, max] 17.618ms, 22.736ms, 22.073ms, 25.75ms, 27.3ms, 31.178ms, 373.772ms
Bytes In [total, mean] 940800, 196.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:4800

At 220 RPS, the classmap autoloader deployment appears to have noticeably lower response time:

BaselineClassmap autoloader
Requests [total, rate, throughput] 26400, 220.01, 219.96
Duration [total, attack, wait] 2m0s, 2m0s, 24.722ms
Latencies [min, mean, 50, 90, 95, 99, max] 18.91ms, 36.948ms, 27.869ms, 57.363ms, 100.272ms, 170.963ms, 540.75ms
Bytes In [total, mean] 5174400, 196.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:26400
Requests [total, rate, throughput] 26400, 220.01, 219.97
Duration [total, attack, wait] 2m0s, 2m0s, 19.557ms
Latencies [min, mean, 50, 90, 95, 99, max] 15.865ms, 20.625ms, 19.999ms, 23.57ms, 24.909ms, 27.886ms, 456.656ms
Bytes In [total, mean] 5174400, 196.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:26400

Based on this, it seems that it may be worth to further explore the use of a generated classmap for PSR-4 roots in MediaWiki. One question that came to my mind was whether extensions should be covered by it - it's doable (as seen in the attached example script), but requires initializing the MediaWiki stack and loading configuration to discover installed extensions, and may cause issues with extensions that use class_exist as a facility to determine whether another extension is enabled or not. In contrast, generating classmaps for PSR-4 roots in core could be done with relatively low effort, possibly by augmenting the existing generateLocalAutoload.php / AutoloadGenerator facilities.

What do you think?


Based on this, it seems that it may be worth to further explore the use of a generated classmap for PSR-4 roots in MediaWiki. One question that came to my mind was whether extensions should be covered by it - it's doable (as seen in the attached example script), but requires initializing the MediaWiki stack and loading configuration to discover installed extensions, and may cause issues with extensions that use class_exist as a facility to determine whether another extension is enabled or not. In contrast, generating classmaps for PSR-4 roots in core could be done with relatively low effort, possibly by augmenting the existing generateLocalAutoload.php / AutoloadGenerator facilities.

What do you think?

I think losing the use of class_exists for extension set-up seems fine; people are meant to use ExtensionRegistry::getInstance()->isLoaded() for a few years now, though there are a handful of things still using the old style in non-test code (e.g. CentralNotice looking for MobileFrontend and CentralAuth looking for AntiSpoof); migrating those from first blush looks very do-able.

Based on this, it seems that it may be worth to further explore the use of a generated classmap for PSR-4 roots in MediaWiki. One question that came to my mind was whether extensions should be covered by it - it's doable (as seen in the attached example script), but requires initializing the MediaWiki stack and loading configuration to discover installed extensions, and may cause issues with extensions that use class_exist as a facility to determine whether another extension is enabled or not. In contrast, generating classmaps for PSR-4 roots in core could be done with relatively low effort, possibly by augmenting the existing generateLocalAutoload.php / AutoloadGenerator facilities.

What do you think?

I think losing the use of class_exists for extension set-up seems fine; people are meant to use ExtensionRegistry::getInstance()->isLoaded() for a few years now, though there are a handful of things still using the old style in non-test code (e.g. CentralNotice looking for MobileFrontend and CentralAuth looking for AntiSpoof); migrating those from first blush looks very do-able.

Thanks James :) Yeah, that migration path is there. Maybe we could offer a separate script to dump extension PSR-4 roots, and integrate the functionality for core into generateLocalAutoload.php. Then consumers like Wikimedia get classmaps for core PSR-4 roots out of the box without having to integrate another build step.

This is all assuming of course that my measurements were correct in the first place :P

Looking at AutoLoader::find, the only potential source for an access(2) syscall seems to be the call to file_exists in the PSR-4 autoloading implementation. Since PHP's stat cache is apparently request scoped, this file_exists call will cause 1 syscall per every class autoloaded via the PSR-4 autoloader - and hook interfaces rely on the PSR-4 autoloader.

Macro much-syscall:

What do you think?

We can do two things:

  • code search for class_exists and fix everything and then simply switch
  • start with core classes

Either way is fine with me.

Change 853972 had a related patch set uploaded (by TK-999; author: TK-999):

[mediawiki/core@master] Include core PSR-4 classes in the generated classmap

https://gerrit.wikimedia.org/r/853972

It seems that including PSR-4 classes from core in the autoload.php classmap is quite doable. There is one test failure related to the Shellbox class alias defined in Result.php, it complains that the alias should be in the same file as the class being aliased. That would always have been a problem for this particular alias, the test just never checked it before because it was skipping every PSR-4 file.

I decided to simply // NO_AUTOLOAD that file for now, which effectively preserves the status quo.

[…]
One question that came to my mind was whether extensions should be covered by it - it's doable (as seen in the attached example script), but requires initializing the MediaWiki stack and loading configuration to discover installed extensions, and may cause issues with extensions that use class_exist as a facility to determine whether another extension is enabled or not. In contrast, generating classmaps for PSR-4 roots in core could be done with relatively low effort, possibly by augmenting the existing generateLocalAutoload.php / AutoloadGenerator facilities.

I think the class_exist caveat on its own could be a relatively high but acceptable cost in terms of long-term confusion as something we could fix and then enforce through a code sniff in favour of ExtensionRegistry->isLoaded.

However, I think it's worth avoiding for two other reasons:

  1. Confidence that extensions cannot come to life on wikis where they are not installed. E.g. I would not want to have to think about and reason what will happen in these cases especially with missing settings for it and what the extension defaults would allow people to access and/or what untested semi-failures would look like (eg. non-nullable config is implicitly null by being unset, and what "interesting" behaviour that might cause).
  2. Confidence that unserialize can't materialize classes not intentionally offered to the autoloader, especially through indirect means such as Memcached/APCU keys being read.
  3. Confidence that classes are consistently available in Setup.php after a certain point, e.g. not creating a source of "but it works works on [prod/local]".

In addition to these benefits of extension subsetting, I think the complexity could be brought down somewhat if we approach it statically. (Or perhaps this is what Máté had in mind already, but in different words?)

The current draft iterates AutoLoader::$psr4Namespaces. Even if we did go for the all-extensions-all-the-time approach, this would only work if we first load all extensions in the MW CLI context. Actually loading all extensions seems unsafe/scary (for which wiki context? what about hooks/callbacks?). Inventing a new semi-loading concept would add imho significant complexity to an initialisation phase that has already gotten quite complex after the Config and ServiceWiring refactors and restrictions of past 1-2 years.

Instead, approaching it statically, perhaps we could iterate extensions/*/extension.json. We already do this for the installer as used by WMF CI to auto-discover extensions. A slight variation would be to leverage a list like extensions-list. We could use that as-is for this purpose. (An explicit list is more strict and helps detecting problems early and produce deterministic output. As I understand it, with this mode enabled, we would disable dynamic look ups so it must be perfect or else we get confusing errors that we might not catch until very close to production traffic.). The iterator would do the same as Máté's current script, scanning the directories with Composer's scanner with no actual loading or execution. Except, instead of appending to a flat list, it could perhaps remember the "name" key of extension.json when writing out the class map. Then, on the other end when it is loaded, it could perhaps be read by ExtensionRegistry when enabling an extension as alternative. E.g. if a key exists in the classmap cache, append that array. A few array merges should be fine to take at load time, and becomes invisible presumably in the face of the I/O we're saving. To communicate the cache between LocalSettings/wmf-config and ExtensionRegistry I would perhaps go with a constant dedicated to this purpose. e.g. const MW_CLASSMAP_CACHE = [ 'core' => .., 'Cite' => .. ];

I might be missing something super obvious but the biggest bottleneck of performance here is because of core's classes and interfaces (most notably and the motivation behind this ticket: HookRunner). Why not simply getting this out of the door for core only and then check flamegraph to see if it's still a big issue (and then discuss on how to build class map for extensions if that's actually still an issue).

For sure, core first is fine. I didn't mean to suggest otherwise!

Change 853972 merged by jenkins-bot:

[mediawiki/core@master] Include core PSR-4 classes in the generated classmap

https://gerrit.wikimedia.org/r/853972

Change 854077 had a related patch set uploaded (by Ladsgroup; author: TK-999):

[mediawiki/core@wmf/1.40.0-wmf.8] Include core PSR-4 classes in the generated classmap

https://gerrit.wikimedia.org/r/854077

Change 854077 merged by jenkins-bot:

[mediawiki/core@wmf/1.40.0-wmf.8] Include core PSR-4 classes in the generated classmap

https://gerrit.wikimedia.org/r/854077

Mentioned in SAL (#wikimedia-operations) [2022-11-08T12:37:22Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:854077|Include core PSR-4 classes in the generated classmap (T274041)]]

Mentioned in SAL (#wikimedia-operations) [2022-11-08T12:37:42Z] <ladsgroup@deploy1002> ladsgroup and ladsgroup: Backport for [[gerrit:854077|Include core PSR-4 classes in the generated classmap (T274041)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-11-08T12:42:51Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:854077|Include core PSR-4 classes in the generated classmap (T274041)]] (duration: 05m 29s)

We have a twenty percent increase (5% of total reqs) in requests being served under 50ms:
https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red?orgId=1&viewPanel=57&from=1667908004102&to=1667911604102

image.png (968×1 px, 127 KB)

And 10% increase (4-5% of total reqs) in requests being served under 100ms:

image.png (968×1 px, 129 KB)

The current draft iterates AutoLoader::$psr4Namespaces. Even if we did go for the all-extensions-all-the-time approach, this would only work if we first load all extensions in the MW CLI context. Actually loading all extensions seems unsafe/scary (for which wiki context? what about hooks/callbacks?). Inventing a new semi-loading concept would add imho significant complexity to an initialisation phase that has already gotten quite complex after the Config and ServiceWiring refactors and restrictions of past 1-2 years.

Yes, I have no strong feelings on this. I just went with whatever seemed the simplest for the sake of the example/demo implementation. We don't even need to use the composer classmap generator either but can reuse AutoLoadGenerator's if we want to avoid the hassle of adding a new dependency.

A few array merges should be fine to take at load time, and becomes invisible presumably in the face of the I/O we're saving.

Likely so. However, I remember that the cost of AutoloadClasses merges may have been cited in the context of T278278 as one reason to migrate extensions to namespaces instead. I could not find anything concrete on that though, and it's possible I'm conflating the cost of array_merge with the cost of storing a large AutoloadClasses mapping in APCu—the latter of which we would avoid here.

To communicate the cache between LocalSettings/wmf-config and ExtensionRegistry I would perhaps go with a constant dedicated to this purpose. e.g. const MW_CLASSMAP_CACHE = [ 'core' => .., 'Cite' => .. ];

Either a global variable or a global constant seem fine here to me.

Change 954630 had a related patch set uploaded (by Reedy; author: TK-999):

[mediawiki/core@REL1_39] Include core PSR-4 classes in the generated classmap

https://gerrit.wikimedia.org/r/954630

Change 954630 merged by jenkins-bot:

[mediawiki/core@REL1_39] Include core PSR-4 classes in the generated classmap

https://gerrit.wikimedia.org/r/954630

Change 777483 abandoned by Krinkle:

[mediawiki/core@master] [WIP] HookRunner: Implement a fast HookRunner for load.php

Reason:

No significant redunction left after the PSR-4 classmap change.

https://gerrit.wikimedia.org/r/777483