
Performance review of CommunityConfiguration
Open, Medium, Public

Description

(Please provide the context of the performance review, and describe how the feature or service works at a high level technically and from a user point of view, or link to documentation describing that.)

Community configuration is a system that allows on-wiki admins to modify a certain subset of MediaWiki configuration. This is done by storing configuration as JSON on wiki pages in the MediaWiki namespace and reading it from there (subject to caching; see below). A more detailed explanation of the project is available at https://www.mediawiki.org/wiki/Community_configuration. A more technical introduction is available as a slidedeck from Hackathon 2023.

Note CommunityConfiguration is intended as a platform for other Wikimedia developers. It does not function on its own – for it to be actually usable, a client extension needs to exist as well. The Growth team is currently converting GrowthExperiments to CommunityConfiguration; there is also an example available.

Preview environment

CommunityConfiguration is enabled at eswiki beta. Administrators can modify the configuration values via https://es.wikipedia.beta.wmflabs.org/wiki/Special:CommunityConfiguration; those values are then used in the following context:

  • in page load context (e.g. the Help panel); this is only used if all three checkboxes under "Newcomer editor features" are checked in Special:Preferences; newly-created accounts are automatically provisioned in this way,
  • at Special:Homepage
  • at Special:MentorDashboard
  • in maintenance jobs (e.g. extensions/GrowthExperiments/maintenance/refreshPraiseworthyMentees.php)

For testing purposes, configuration can be loaded within the command line:

urbanecm@deployment-deploy03:~$ mwscript shell.php eswiki
Psy Shell v0.12.0 (PHP 7.4.33 — cli) by Justin Hileman
# this call is guaranteed to access Community configuration. If the requested key is not available, it might error out, but it will not fall back to non-CommunityConfiguration sources
> \MediaWiki\MediaWikiServices::getInstance()->get('CommunityConfiguration.ProviderFactory')->newProvider('Mentorship')->get('GEMentorshipEnabled')
= false

# when an unrecognized variable is requested, this call falls back to GlobalVarConfig
> \MediaWiki\MediaWikiServices::getInstance()->get('CommunityConfiguration.WikiPageConfigReader')->get('GEMentorshipEnabled')
= false

>
Which code to review

The main code is available as mediawiki/extensions/CommunityConfiguration in Wikimedia Gerrit.

GrowthExperiments includes client code that integrates with CommunityConfiguration (see source), but I'd expect any performance concerns related to CommunityConfiguration to exist within the CommunityConfiguration repo.

Performance assessment
  • What work has been done to ensure the best possible performance of the feature?

We cache the on-wiki JSON pages in Memcached as well as in APCU, lowering the cost-to-load as much as possible.
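The layered lookup can be sketched like this (a hedged illustration only; the key names and the fetchConfigFromWikiPage() helper are hypothetical, not the extension's actual wiring — the point is the APCu-then-Memcached fallthrough):

```php
use MediaWiki\MediaWikiServices;

// Sketch of a layered cache lookup for the on-wiki JSON configuration.
function loadConfig( string $providerId ): array {
	$services = MediaWikiServices::getInstance();
	$srvCache = $services->getLocalServerObjectCache(); // APCu
	$wanCache = $services->getMainWANObjectCache();     // Memcached

	return $srvCache->getWithSetCallback(
		$srvCache->makeKey( 'communityconfiguration', $providerId ),
		$srvCache::TTL_MINUTE, // short TTL: APCu cannot be purged remotely
		static function () use ( $wanCache, $providerId ) {
			return $wanCache->getWithSetCallback(
				$wanCache->makeKey( 'communityconfiguration', $providerId ),
				$wanCache::TTL_DAY, // longer TTL: can be purged on page edit
				static function () use ( $providerId ) {
					// Full miss: read and decode the MediaWiki-namespace
					// JSON page. (Hypothetical helper.)
					return fetchConfigFromWikiPage( $providerId );
				}
			);
		}
	);
}
```

The short APCu TTL bounds staleness on web servers that missed a purge, while the WAN layer absorbs most of the read load.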

  • What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?

The extension switches the source for certain configuration from PHP globals to on-wiki JSON pages. This can cause the site to generally load more slowly, as on-wiki pages are slower to read. Considering the on-wiki JSON pages storage backend, it can also increase the number of SQL queries performed in a page load context.

  • Are there potential optimisations that haven't been performed yet?

No.

  • Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the MediaWiki Platform Team for advice: mediawiki-platform-team@wikimedia.org.

We measure the page load time of Special:Homepage (the consumer of the previous version of Community configuration, which was part of the GrowthExperiments extension). That load time was not negatively affected by the introduction of Community configuration. That said, we do not have dedicated performance measurements in the CommunityConfiguration extension itself (it is also not entirely clear what we would be measuring).

Event Timeline

@KStoller-WMF I went ahead and submitted the performance review request for us, providing an answer to all questions that were a part of the form. I'm also posting an answer to the question that was not in the form, but you told me in Slack MediaWiki Engineering asked it:

What part of MW core is affected by this change?

None. CommunityConfiguration is a new extension; this review is requested per the Writing an extension for deployment guidance, as performance risks are expected given that the extension alters how configuration is handled.

It is possible we might have to make a slight change to MediaWiki Core, but those should be considered (and if required, subjected to a performance review) separately. CommunityConfiguration is an independent extension which provides a configuration service (in addition to mechanisms already provided by MediaWiki Core, such as GlobalVarConfig).

@Urbanecm_WMF From the code documentation, I understand IConfigurationProvider to be the main entrypoint. However, I could not find an example of this in the extension's tests. Could you add a high-level integration test to demonstrate how it would be used? That would make it easier to check which parts will be used in the critical path and which will not.

Change #1018735 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/CommunityConfiguration@master] Add high-level integration tests for entrypoints

https://gerrit.wikimedia.org/r/1018735

@Urbanecm_WMF From the code documentation, I understand IConfigurationProvider to be the main entrypoint.

Indeed. There is also another significant entrypoint, WikiPageConfigReader (it nearly-immediately wires into an IConfigurationProvider, so it might not matter in your context). This class would be important when working with actual MediaWiki configuration.

However, I could not find an example of this in the extension's tests. Could you add a high-level integration test to demonstrate how it would be used? That would make it easier to check which parts will be used in the critical path and which will not.

Absolutely. I uploaded https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CommunityConfiguration/+/1018735/, which adds high-level integration tests for both of the main entrypoints. Hopefully this clarifies the idea we have here. If there is something else we can do to make the performance review easier, please let me know :).

Change #1018735 merged by jenkins-bot:

[mediawiki/extensions/CommunityConfiguration@master] [tests] Add high-level integration tests for entrypoints

https://gerrit.wikimedia.org/r/1018735


The tests were just merged. Please let me know if there is anything else we need to clarify further.

@Krinkle - Is there any more clarity or info you need from the Growth team to help support this review?

@Krinkle - Is there any more clarity or info you need from the Growth team to help support this review?

My apologies for not posting sooner. I thought I had done so already. Here goes.

Set up

Based on the two integration tests in the repo, I used this minimal local setup:

LocalSettings.php
$wgCommunityConfigurationProviders = [
	'foo' => [
		'store' => [
			'type' => 'wikipage',
			'args' => [ 'MediaWiki:Foo.json' ],
		],
		'validator' => [
			'type' => 'jsonschema',
			'args' => [ FooSchema::class ]
		],
		'type' => 'mw-config',
	],
];

use MediaWiki\Extension\CommunityConfiguration\Schema\JsonSchema;
use MediaWiki\MediaWikiServices;
$wgExtensionFunctions[] = function () {
	class FooSchema extends JsonSchema {
		public const FooThing = [
			JsonSchema::TYPE => JsonSchema::TYPE_STRING,
			JsonSchema::DEFAULT => 'initial',
		];
	}

	class SpecialFooConfig extends SpecialPage {
		public function __construct() {
			parent::__construct( 'FooConfig' );
		}
		public function execute( $par ) {
			$this->getOutput()->setPageTitle( 'Foo Config' );
			$this->getOutput()->addHtml(
				Html::element( 'p', [], 'The value of FooThing is:' )
				. Html::element( 'pre', [], $this->getFooThing() )
			);
		}
		private function getFooThing() {
			$reader = MediaWikiServices::getInstance()->get( 'CommunityConfiguration.WikiPageConfigReader' );
			return $reader->get( 'FooThing' );
		}
	}
};
$wgSpecialPages['FooConfig'] = 'SpecialFooConfig';
$wgHooks['LocalisationCacheRecache'][] = function ( $localisationCache, $code, &$allData ) {
	$allData['specialPageAliases']['FooConfig'] = ['FooConfig'];
};

My wiki runs Quickstart via composer server, with PHP 8.2 on a Mac. I have these cache settings:

$wgMainCacheType = CACHE_ACCEL;
$wgParserCacheType = CACHE_DB;
$wgSessionCacheType = CACHE_DB;

Slightly off-topic, I found the WikiPageConfigReader service easy to use. I like that it's a ready-to-use service, where each key is directly available, with mutual exclusivity ensured at the last mile to avoid mistakes. I believe an earlier version involved more boilerplate for the common case, or required indirection through a factory (I'm not worried about runtime overhead per se, but specifically about human cognition). This was a breeze! Everything it needs to know is declared in the provider, and the built-in reader takes it from there. Great!

Backend cost

Two scenarios:

  • Load Special:FooConfig, to measure incurred backend costs, reflective of consumption by extensions in hooks, special pages, API modules, ResourceLoader package callbacks, etc.
  • Submit edit form at MediaWiki:Foo.json, to approximate moving parts added to editing, representative of manual edits and API edits.

I recommend Excimer when profiling locally, e.g. the Speedscope flame graph recipe which you can drop into LocalSettings. Set forceprofile=1 on any request, and open the last logs/speedscope.json file via https://www.speedscope.app/ or via our copy at https://performance.wikimedia.org/excimer/speedscope/. It uses a 1ms interval by default. I sometimes change this to 0.0001s (0.1ms precision) to get a more complete picture.
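A minimal version of such a LocalSettings.php recipe might look like the following (a sketch adapted from the wikitech flame-graph recipe; the log path is an assumption, adjust to your setup):

```php
// Capture a sampling profile of any request carrying forceprofile=1 and
// write it in Speedscope's JSON format on shutdown.
if ( isset( $_GET['forceprofile'] ) || isset( $_POST['forceprofile'] ) ) {
	$profiler = new ExcimerProfiler();
	$profiler->setEventType( EXCIMER_REAL ); // wall-clock time
	$profiler->setPeriod( 0.001 ); // 1ms; lower to 0.0001 for more detail
	$profiler->start();
	register_shutdown_function( static function () use ( $profiler ) {
		$profiler->stop();
		file_put_contents(
			__DIR__ . '/logs/speedscope.json',
			json_encode( $profiler->getLog()->getSpeedscopeData() )
		);
	} );
}
```

Open the resulting logs/speedscope.json in https://www.speedscope.app/ to browse the flame graph.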

Backend: Config reader

I'll start with config reading, which I expect to be the most sensitive. This is the cost we would pay during most requests, anywhere this feature is consumed or integrated by an extension.

I've left out one warm-up request (letting opcache autoload and compile class files), after which I emptied php-apcu.

Screenshot 2024-04-24.png (1×2 px, 462 KB)

The left side shows service wiring for ConfigurationProviderFactory. I reviewed both the above profiles, and the calls in the codebase itself, looking for things that are (or could later become) expensive or cause congestion. You currently read site config as-is, and only allocate service objects, which is exactly what you'd want for hot service paths. There is no mismatch between config and business logic that I can see, e.g. a translation gap, or post-processing that could be deferred but isn't. Nice!

I did notice a tiny redundancy in how NullLogger is set up, only to be replaced again later. This doesn't add any measurable overhead today, but might bite you later on. In core and extensions we usually accept a PSR LoggerInterface as an injected constructor parameter. In library code, it is indeed more common to lean on the PSR LoggerAwareInterface instead, as that makes for a more stable/extendible constructor signature (or hide the constructor entirely, if your library exposes objects by other means). But for most non-library code, I find this can lead to missed debug messages if the constructor ends up growing and/or calling other functions, as there'd inevitably be a blind spot between the two points, which can be quite confusing. The pattern may also inspire redundant construction and replacement with classes that aren't as cheap as NullLogger. Something to consider, perhaps.
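To illustrate the difference (a hypothetical ConfigStore class, not taken from the extension), constructor injection closes the blind-spot window entirely:

```php
use Psr\Log\LoggerInterface;
use Psr\Log\NullLogger;

// Hypothetical service: inject the logger up front so there is no window
// between construction and a later setLogger() call in which debug
// messages would silently go to a NullLogger.
class ConfigStore {
	private LoggerInterface $logger;

	public function __construct( ?LoggerInterface $logger = null ) {
		$this->logger = $logger ?? new NullLogger();
		// Logging done during (or right after) construction now reaches
		// the real logger, instead of a placeholder awaiting replacement.
		$this->logger->debug( 'ConfigStore constructed' );
	}
}
```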

After service wiring, we see the first BagOStuff cache miss (in blue) for getVariableToProviderMap. This re-purposes MediaWiki core's MediaWiki\Settings\Source\ReflectionSchemaSource in order to convert JsonSchema into a validator. I'd note that core's ReflectionSchemaSource is not really written for production use in the critical path. (That's why MediaWiki core doesn't use it for MainConfigSchema; it's only called from a maintenance script.) But your schemas are much smaller than core's, so the cache miss should be small enough to absorb in real time on demand. It's good that this is also cached, but remember to "Design for cache miss" (ref https://wikitech.wikimedia.org/wiki/MediaWiki_Engineering/Guides/Backend_performance_practices).

I note that it currently uses a combined key for all data, but fragmented by wiki. I wonder if this could use a global cache key instead of a per-wiki one? I suspect not, because the installed extensions may vary between wikis (plus train deploy / multiversion). But that affects your current logic as well. It seems you might be missing a mechanism to invalidate this data after updating code, toggling extensions, or deploying other changes. Perhaps there's a way to detect stale values post-cache, and/or incorporate it into the cache key.
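One way to make that cached map sensitive to deploys (a hedged sketch, not the extension's code; the key name and version string are made up) is to fold deploy-varying inputs into the cache key itself, so new code or a changed provider set naturally produces a fresh key:

```php
// Fold a version constant (bumped on breaking changes to the cached
// structure) plus a digest of the registered provider ids into the key,
// so toggling extensions or deploying new code misses the old entry
// instead of serving stale data.
$key = $cache->makeKey(
	'communityconfiguration-variable-map',
	'v2', // bump manually when the cached structure changes
	md5( implode( '|', array_keys( $wgCommunityConfigurationProviders ) ) )
);
```

Stale entries under old keys then simply age out via their TTL.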

After that, we see the biggest chunk, which is fetching the Foo config value from the wikipage. I have not yet compared this with my recommended approach, but I believe there is room for improvement here. This currently uses core's RevisionStore->getRevisionByTitle which unconditionally makes a database query. If your needs are satisfied by RevisionStore->getKnownCurrentRevision I would recommend trying that instead, as that allows for core to take care of several potential problems automatically, including caching (and cache invalidation). You may also want to call JsonContent->getData instead of parsing the JSON inline. For a good example of fetching revision text in the critical path, I recommend looking at MessageCache.php.
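The suggested fetch path could look roughly like this (a hedged sketch of the recommendation, not the extension's current code; error handling is minimal):

```php
use MediaWiki\MediaWikiServices;
use MediaWiki\Revision\SlotRecord;
use MediaWiki\Title\Title;

// getKnownCurrentRevision() goes through core's revision caching,
// unlike getRevisionByTitle(), which always queries the database.
function fetchJsonConfig( string $pageName ) {
	$title = Title::newFromText( $pageName );
	if ( !$title ) {
		return null;
	}
	$store = MediaWikiServices::getInstance()->getRevisionStore();
	$revision = $store->getKnownCurrentRevision( $title );
	if ( !$revision ) {
		return null;
	}
	$content = $revision->getContent( SlotRecord::MAIN );
	if ( !$content instanceof JsonContent ) {
		return null;
	}
	// JsonContent::getData() returns a Status wrapping the decoded value,
	// avoiding ad-hoc inline json_decode() calls.
	$status = $content->getData();
	return $status->isGood() ? $status->getValue() : null;
}
```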

I noticed the CommunityConfiguration Loader.php mentions working around the WANCache's in-process cache to ease testing, and it instead re-creates its own in-process cache. It is by design that WANCache doesn't clear in-process cache, as this avoids consumers forgetting to handle cache requirements correctly. If a caller needs to use the latest value, it needs to bypass the cache rather than rely on cache being emptied or updated instantly, which can't happen in practice across multiple DB replicas, web servers, and data centres. If your test is intending to simulate two consecutive requests, we usually re-instantiate the service (i.e. CC Loader) and WANCache class between test steps (but use the same HashBagOStuff as backend). That should naturally clear any in-class cache and create a more representative scenario to test the various code interactions. Hope that helps!
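The two-request test pattern can be sketched as follows (hypothetical sketch; "Loader" stands in for CommunityConfiguration's Loader, and the constructor arguments are elided):

```php
// Simulate two consecutive requests: share the storage backend, but
// rebuild WANObjectCache and the service under test between steps, so
// only genuinely cross-request state survives.
$backend = new HashBagOStuff();

$newWanCache = static fn () => new WANObjectCache( [ 'cache' => $backend ] );

// "Request 1": populates the shared backend.
$loader = new Loader( $newWanCache() /* …other dependencies… */ );
$first = $loader->load();

// "Request 2": fresh in-process and in-class caches, same persisted data.
$loader = new Loader( $newWanCache() /* …other dependencies… */ );
$second = $loader->load();
```

This removes the need for the service to work around WANCache's in-process cache at all.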

Frontend

Two scenarios:

You can profile these with Excimer in the background. I tend to capture Ajax requests by temporarily changing if ( forceprofile ) to if ( 1 ) in LocalSettings. For the frontend metrics, refer to https://wikitech.wikimedia.org/wiki/MediaWiki_Engineering/Guides/Measure_frontend_performance.

Configuration page UI

The UI has been talked about in various back channels and direct messages already, and I understand the team is open to transitioning to HTMLForm once DST has added the required Codex features to HTMLForm. For transparency and future reference, I'll document my observation here on-task as well.

Special:CommunityConfiguration/Mentorship renders a blank page, with a visible delay before a form appears. From the experience of an end-user, once it does appear, it looks and feels like a standard MediaWiki HTMLForm, which I imagine was the goal.

Screenshot 2024-04-06 2.png (1×2 px, 99 KB)

I can see considerable amounts of effort and custom code went into meeting that goal — investing significantly more time into a custom solution, and creating significantly more maintenance burden for yourself, yet with an outcome that is notably less performant, and less accessible/compatible, than what MediaWiki provides by default. When on a slow connection, or when JS times out, or has failed for any reason, or is disrupted by a browser extension, or when in a Grade C browser (read more under Principles & General Approach) — there is only a curious blank page. By comparison, HTMLForm provides zero-delay rendering (i.e. server-side), automatic validation, progressively-enhanced hydration for typeahead, title/user prefix search, multiple choice, conditional hiding, etc. via an API that is generally liked by developers (declarative, static validation by Phan in CI, little to no custom code to maintain or test).

This is not a critique of the team's ability. It is very hard to compete with a platform that many teams have improved and contributed to over two decades, specifically optimised for rapid and easy mass-adoption, and to deliver the best possible performance.

I'm raising this concern aware of, and respecting, your priorities (slower performance is accepted at the moment, is my understanding). Instead, I share this for two other reasons:

  1. To counter the narrative that "more performant = more effort".

We've designed the platform such that the solution that takes the least effort naturally gets you (close to) the best possible infrastructure performance (i.e. keep the site up, secure, and fast), the best audience reach and equity, browser compatibility, and accessibility. The end result is that today, ~100% of special pages in production that render a form are built with this. Including, or perhaps especially, those that were built with tight deadlines and resource constraints.

  2. To create an opportunity to improve the platform to meet your needs.

WMF cannot afford to maintain as many products as it does, if each one was custom built. WMF also cannot meet the expectations around performance and user experience of modern audiences, and the audience equity/reach that we aim for, if these responsibilities rest fully on individual team resourcing priorities. This makes it all the more important to build features on a stable platform such that maintenance costs are kept low.

The "return on investment" of platform improvements degrades a bit each time a team works around MediaWiki core. This may be absolutely necessary or justified in certain cases (VisualEditor comes to mind). But when this happens, I think it's important to do so with intention, and to recognise whether the product has an inherent need to, or whether there's an issue that we should understand better. The performance review is currently a natural place for these disconnects to be discovered and understood. Our shared goal is precisely so that individual teams don't have to pay these costs again, and thus empower you and others to develop similar interfaces more quickly in the future.

Based on offline conversations, I understand that using MediaWiki core's HTMLForm was considered, but as result of conflicting signals from various sides, the current compromise was made. If the current client-side custom approach is preferred long-term, I would normally ask to manage and communicate the risks as follows:

  1. Include a one-line message in the blank placeholder indicating to end-users that the form is loading (hidden via client-nojs).
  2. Include a one-line message in the blank placeholder stating that the page requires JavaScript (hidden via client-js).
  3. Document in this task the trade-off behind your choice in relation to this feature. For example, if the custom solution offers a better UX (describe briefly what's better, to help us improve the platform and allow other teams to avoid the same cost in the future). Or perhaps your solution is expected to require less maintenance upkeep (describe which maintenance costs you expect to avoid). Or perhaps it was expected to take less time to develop, i.e. quicker to meet your product requirements (describe which requirements were not met by platform capabilities, and/or which capabilities were believed to require more time to adopt).
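Points 1 and 2 could be rendered together like this (a sketch with hypothetical message keys; MediaWiki toggles the root html element's class between client-nojs and client-js, and CSS keyed off those classes hides each message in the opposite state):

```php
// Example hiding rule (in the page's styles, not shown in full here):
//   .client-js .cc-nojs-msg, .client-nojs .cc-loading-msg { display: none; }
$out->addHTML(
	// Shown only when JavaScript is unavailable or has not run.
	Html::element( 'p', [ 'class' => 'cc-nojs-msg' ],
		$this->msg( 'communityconfiguration-editor-nojs' )->text()
	) .
	// Shown only while the JS form is still loading.
	Html::element( 'p', [ 'class' => 'cc-loading-msg' ],
		$this->msg( 'communityconfiguration-editor-loading' )->text()
	)
);
```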

WMF thrives when individual capabilities can be built quickly with little to no custom code, and thus little to no on-going maintenance, testing, and updating by leveraging stable components. We build, share, and re-use. Creating a special page with an HTMLForm is generally the quickest way to deliver a capability (minutes to hours, not days/weeks/months), with an output that is close to optimal for all involved.

Misc observations

  • The landing page of Special:CommunityConfiguration renders fast and is immediately interactive. Very nice!
  • [UX] When logged-out, or logged-in as a non-admin, https://es.wikipedia.beta.wmflabs.org/wiki/Especial:CommunityConfiguration presents a link to https://es.wikipedia.beta.wmflabs.org/wiki/Especial:CommunityConfiguration/Mentorship, which in turn presents a form that appears to be (and is) editable. When submitted, it even asks for an edit summary. Only after all that does it fail with a permission error. HTMLForm provides a disabled/read-only mode, which you could enable after a "quick" permission check in MediaWiki, rendered above the form, so that users with insufficient permissions find out earlier. You might want to do something similar.
  • [Bug] Dismiss tooltip displays broken message key: ⧼communityconfiguration-editor-message-dismiss-button-label⧽.
  • [Bug] Permission error displays unprocessed wikitext to users: [https:… …] and [ $1 …].

Screenshot 2024-04-06 3.png (633×2 px, 94 KB)
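The early permission check mentioned above could be sketched like this (a hypothetical illustration: the field names, message key, and the editinterface right are assumptions, not the extension's actual requirements):

```php
// Check permissions up front and render the form read-only for users who
// cannot save, instead of failing after submission.
$canEdit = $this->getAuthority()->isAllowed( 'editinterface' );

$fields = [
	'GEMentorshipEnabled' => [
		'type' => 'check',
		'label-message' => 'some-label-key', // hypothetical key
		// Per-field 'disabled' is supported by HTMLForm descriptors.
		'disabled' => !$canEdit,
	],
];

$form = HTMLForm::factory( 'ooui', $fields, $this->getContext() );
if ( !$canEdit ) {
	// Hide the submit button entirely for read-only viewers.
	$form->suppressDefaultSubmit();
}
$form->prepareForm()->displayForm( false );
```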

Change #1028832 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/CommunityConfiguration@master] Editor: add loading state and nojs fallback

https://gerrit.wikimedia.org/r/1028832

Krinkle triaged this task as Medium priority.May 13 2024, 2:09 PM

Thank you for the in-depth review! Just a quick note from me right now: I do not have access to the two files/images referenced. They only show up to me as {F49648003 height=400} and {F49648061 height=150}.

If the config is loaded on every page view, I would recommend caching it for a short period of time, say one minute. Loading the contents of a page on every page view strikes me as asking for trouble.

...if you cache the *validated* config, you can probably keep validating on read.

Though I'd still consider avoiding that, as discussed in the meeting today.