Page MenuHomePhabricator

Speed up Language creation
Closed, ResolvedPublic5 Estimated Story Points

Description

Currently, creating a Language on an empty l10n cache is very slow (unless manualRecache is set).

> $t1 = microtime( true ); mws()->getLanguageFactory()->getLanguage( 'he' ); $t2 = microtime( true ); echo ( $t2 - $t1 ) . PHP_EOL;
1.6950080394745

This is because in order to construct a Language, the LanguageFactory needs the language’s fallback chain (e.g. creating the arz language must fall back to the LanguageAr class in order to get its normalize() implementation), and the language fallback chain comes from the l10n cache; as soon as the LanguageFactory accesses it, LocalisationCache gets busy fully populating the l10n cache for the language, including loading all the message files. I think we can substantially speed this up by not requiring the l10n cache to be built until necessary (the fallback chain, in particular, should not require loading any extension files).

Event Timeline

Change 940158 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] POC: Create Language objects without initializing l10n cache

https://gerrit.wikimedia.org/r/940158

Change 940158 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] POC: Create Language objects without initializing l10n cache

https://gerrit.wikimedia.org/r/940158

This is a first proof of concept, where @tstarling left some very helpful comments; I’ll now try to implement this in several more mergeable changes and attach them here.

Change 940328 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] WIP: LocalisationCache: Introduce $coreOnlyKeys

https://gerrit.wikimedia.org/r/940328

Pulling this into our sprint since I plan to work on it for T340832: [STORY] MUL - While reading, make Wikibase label fallbacks or the `mul` language code default clearly visible. (But it’s not really a sprint goal in itself, so if it takes too long I might have to step back. But let’s see how far I get.)

Change 940354 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] LocalisationCache: Extract loadCoreData()

https://gerrit.wikimedia.org/r/940354

Change 940355 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] LocalisationCache: Load only core data if possible

https://gerrit.wikimedia.org/r/940355

karapayneWMDE set the point value for this task to 5.Jul 21 2023, 1:53 PM

Change 940878 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] LocalisationCache: Reset loaded(Sub)Items when overwriting data

https://gerrit.wikimedia.org/r/940878

Change 940158 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/core@master] POC: Create Language objects without initializing l10n cache

Reason:

superseded by the other changes attached to T342418, especially I7ec2d87c0f

https://gerrit.wikimedia.org/r/940158

I was wondering if we can use this task to make mw.language objects less expensive in Lua (compare T85461#3100878 noting that they’re already fairly cheap when the l10n cache is prebuilt), but I don’t think so – formatDate presumably relies on the dateFormats data, which the changes attached here put in the NON_CORE_ONLY_KEYS, not the CORE_ONLY_KEYS.

Change 941864 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] LocalisationCache: Extract readPluralFilesAndRegisterDeps()

https://gerrit.wikimedia.org/r/941864

Change 942612 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] LocalisationCache: Don’t throw away data in initLanguage()

https://gerrit.wikimedia.org/r/942612

Change 942614 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] LocalisationCache: Make isMergeableKey() static

https://gerrit.wikimedia.org/r/942614

Change 942670 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] LocalisationCache: Expand tests

https://gerrit.wikimedia.org/r/942670

Change 942614 merged by jenkins-bot:

[mediawiki/core@master] LocalisationCache: Make isMergeableKey() static

https://gerrit.wikimedia.org/r/942614

Change 942670 merged by jenkins-bot:

[mediawiki/core@master] LocalisationCache: Expand tests

https://gerrit.wikimedia.org/r/942670

Change 940328 merged by jenkins-bot:

[mediawiki/core@master] LocalisationCache: Add CORE_ONLY_KEYS, ALL_EXCEPT_CORE_ONLY_KEYS

https://gerrit.wikimedia.org/r/940328

Change 942612 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/core@master] LocalisationCache: Don’t throw away data in initLanguage()

Reason:

squashed into Ib1c620d4f2

https://gerrit.wikimedia.org/r/942612

Change 940878 merged by jenkins-bot:

[mediawiki/core@master] LocalisationCache: Reset loaded(Sub)Items when overwriting data

https://gerrit.wikimedia.org/r/940878

Change 940355 merged by jenkins-bot:

[mediawiki/core@master] LocalisationCache: Load only core data if possible

https://gerrit.wikimedia.org/r/940355

Change 940354 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/core@master] LocalisationCache: Extract loadCoreData()

Reason:

squashed into I7ec2d87c0f

https://gerrit.wikimedia.org/r/940354

Change 941864 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/core@master] LocalisationCache: Extract readPluralFilesAndRegisterDeps()

Reason:

squashed into I7ec2d87c0f

https://gerrit.wikimedia.org/r/941864

I think this is resolved – now the l10n cache is (still slowly ^^) rebuilt when needed instead:

> $t1 = microtime( true ); $he = mws()->getLanguageFactory()->getLanguage( 'he' ); $t2 = microtime( true ); echo ( $t2 - $t1 ) . PHP_EOL;
0.0015640258789062
> $t1 = microtime( true ); echo $he->formatNum( 0 ) . PHP_EOL; $t2 = microtime( true ); echo ( $t2 - $t1 ) . PHP_EOL;
0
0.0021600723266602
> $t1 = microtime( true ); echo $he->formatDuration( 0 ) . PHP_EOL; $t2 = microtime( true ); echo ( $t2 - $t1 ) . PHP_EOL;
0 שניות
3.209841966629

Let’s hope no issues are discovered during the train rollout.

3.209841966629

By the way, that duration is twice as long as in the original task description because it’s rebuilding both en and he. In the task description, creating the he language only required rebuilding he. (But if you then called formatDuration() on the returned Language, the old code also ended up rebuilding en, taking another ~1½ s to do so.) So AFAICT it isn’t the case that the changes here made the l10n cache twice as slow to build 😅

Change 944853 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[mediawiki/core@master] Revert "LocalisationCache: Load only core data if possible"

https://gerrit.wikimedia.org/r/944853

Change 944853 merged by jenkins-bot:

[mediawiki/core@master] Revert "LocalisationCache: Load only core data if possible"

https://gerrit.wikimedia.org/r/944853

Change 944854 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[mediawiki/core@wmf/1.41.0-wmf.20] Revert "LocalisationCache: Load only core data if possible"

https://gerrit.wikimedia.org/r/944854

Change 944854 merged by jenkins-bot:

[mediawiki/core@wmf/1.41.0-wmf.20] Revert "LocalisationCache: Load only core data if possible"

https://gerrit.wikimedia.org/r/944854

Mentioned in SAL (#wikimedia-operations) [2023-08-02T20:46:38Z] <dancy@deploy1002> Started scap: Backport for [[gerrit:944854|Revert "LocalisationCache: Load only core data if possible" (T342418 T343375)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-02T20:48:13Z] <dancy@deploy1002> dancy: Backport for [[gerrit:944854|Revert "LocalisationCache: Load only core data if possible" (T342418 T343375)]] synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-02T20:55:26Z] <dancy@deploy1002> Finished scap: Backport for [[gerrit:944854|Revert "LocalisationCache: Load only core data if possible" (T342418 T343375)]] (duration: 08m 47s)

Reopening, we’ll have to try this again. (First put together a test case to cover the issue in T343375, then find a way to implement faster language creation without breaking that test.)

Same goes for T343343 if it turns out to be related – try to make a test first that covers it.

I cannot reproduce the error in T343375 directly, but I can reproduce a very related seeming issue:
In my German client-only dev-wiki the contentmodel of Modul:Sandkasten (note the missing "e", it is the German name) changes from "Scribunto" to "Wikitext" depending on whether https://gerrit.wikimedia.org/r/940355 is applied or not, however it should always be "Scribunto".

Okay, I’m pretty sure T343343 (date formats) and T343375 (namespace names) have the same cause – these are exactly the two preloaded l10n cache keys:

private const PRELOADED_KEYS = [ 'dateFormats', 'namespaceNames' ];

I think I know how to fix it (replace $this->data[$code] += $preload; with a mergeItem() loop), but I’m still working on tests.

Change 945763 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] LocalisationCache: Add tests for preload behavior

https://gerrit.wikimedia.org/r/945763

Change 945764 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] LocalisationCache: Correctly merge preload data

https://gerrit.wikimedia.org/r/945764

Change 945765 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] LocalisationCache: Load only core data if possible (v2)

https://gerrit.wikimedia.org/r/945765

Probably worth mentioning here that there’s no train next week, so we’ll have more time than usual to test this on Beta.

Change 945763 merged by jenkins-bot:

[mediawiki/core@master] LocalisationCache: Add tests for preload behavior

https://gerrit.wikimedia.org/r/945763

Change 945764 merged by jenkins-bot:

[mediawiki/core@master] LocalisationCache: Correctly merge preload data

https://gerrit.wikimedia.org/r/945764

Change 945765 merged by jenkins-bot:

[mediawiki/core@master] LocalisationCache: Load only core data if possible (v2)

https://gerrit.wikimedia.org/r/945765

Just came here to say that I tried running LanguageConverterFactoryTest (which is super slow, see T50217#9030084) before and after the changes in this ticket. It went from 12-13 seconds to 1 second, which is AMAZING! Thank you!

I’ve played around a bit on beta dewiki and not noticed any namespace- or language-related breakage so far.

As far as I’m aware, there were no issues related to this change during the wmf.22 train, so I think we can call this done \o/