Page MenuHomePhabricator

Analyse if Equivset should use static array file instead of ".ser" file (serialised php text)
Closed, ResolvedPublic

Description

In the spirit of T212460 and T127780 some big data structures are converted from serialized files to use php static array to be cached in opcache.

Maybe this is also an option for Equivset

At the moment a json file is shipped directly with Equivset, but no php file.
It should be easy to adjust the generate command to provide such a file from the base file.

The usage of equivset in AntiSpoof for account creation makes it called only once.
For TitleBlacklist there is typically one call via AntiSpoof (cached in WAN)
In AbuseFilter there are multiple calls

AntiSpoof and AbuseFilter creating both one instance which loads the data from the file system

Event Timeline

Change 908548 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/libs/Equivset@master] Add equivset data as static array file

https://gerrit.wikimedia.org/r/908548

Krinkle triaged this task as Medium priority.May 1 2023, 2:25 PM
Krinkle moved this task from Limbo to Perf recommendation on the Performance-Team (Radar) board.
Krinkle awarded a token.

Change 908548 merged by jenkins-bot:

[mediawiki/libs/Equivset@master] Add equivset data as static array file

https://gerrit.wikimedia.org/r/908548

Change 914015 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/libs/Equivset@master] Equivset: Optimise load() method to avoid uncached I/O stats

https://gerrit.wikimedia.org/r/914015

Change 914015 merged by jenkins-bot:

[mediawiki/libs/Equivset@master] Equivset: Optimise load() method to avoid uncached I/O stats

https://gerrit.wikimedia.org/r/914015

The static array file is now created and support for reading it was also added.

The next step is to decide how to release it and when to enable it.
One way could be to build a new release with the file to allow further testing on wmf production and enable it as default in a next release.
Directly enable it and release that without fallback strategy.

Krinkle subscribed.

The next step is to decide how to release it and when to enable it.

  • One way could be to build a new release with the file to allow further testing on wmf production and enable it as default in a next release.
  • Directly enable it and release that without fallback strategy.

@Umherirrender

If release as opt-in, we will need a patch in MW code to opt-in, and that patch is the way we can rollback. If a fix is needed, we will need a second release to migitate it, and eventually a third release that enables it by default.

If we release as default-on, we can rollback by reverting the composer.json/vendor update in an emergency. If a fix is needed, we will need a second release. (No third is needed.)

I think the default-on is simpler, but carries the increased risk to third party users of the composer package. I think that's an acceptable risk. But, if you're interested in doing this, I'm fine either way, you decide!

This code is included in MediaWiki-Vendor as of rMWVDf417c582f64d: Upgrading wikimedia/equivset (1.4.3 => 1.5.1).

For obvious reasons we don't to be attempting to hard-code the path to dist/equivset.php (because the vendor could be in MW core or in the extension) in the AntiSpoof extension itself... And there's (currently) no way to make this happen in WMF config/similar.

So either using some sort of config var/param into Equivset (which is potentially exposing implementation details), or changing the default, which seems more reasonable...

Change 962206 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/libs/Equivset@master] Use PHP Static Data Array by default

https://gerrit.wikimedia.org/r/962206

There are some basic tests in AntiSpoof, but I'm not sure they're all encompassing to test "is this going to work fully"

Change 962206 merged by jenkins-bot:

[mediawiki/libs/Equivset@master] Use PHP Static Data Array by default

https://gerrit.wikimedia.org/r/962206

Change 1001274 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/libs/Equivset@master] Remove equivset data as serialized file

https://gerrit.wikimedia.org/r/1001274

When looking at a new flamegraph for index.php it shows the following percent (via Lua via TitleBlacklist in "showMissingArticle"):

Wikimedia\Equivset\Equivset::normalize (888 samples, 0.18%) - https://performance.wikimedia.org/arclamp/svgs/daily/2024-02-10.excimer.index.svgz?x=401.9&y=53

Two weeks ago:

Wikimedia\Equivset\Equivset::normalize (1,577 samples, 0.27%) - https://performance.wikimedia.org/arclamp/svgs/daily/2024-01-27.excimer.index.svgz?x=417.8&y=581

For action=edit the percent looking (used via TitleBlacklist): - https://performance.wikimedia.org/arclamp/svgs/daily/2024-02-10.excimer.all.fn-EditAction.svgz

Wikimedia\Equivset\Equivset::normalize (5 samples, 0.13%) - https://performance.wikimedia.org/arclamp/svgs/daily/2024-02-10.excimer.all.fn-EditAction.svgz?x=490.6&y=901
Wikimedia\Equivset\Equivset::normalize (12 samples, 0.30%) - https://performance.wikimedia.org/arclamp/svgs/daily/2024-02-10.excimer.all.fn-EditAction.svgz?x=970.1&y=901
Wikimedia\Equivset\Equivset::normalize (90 samples, 2.27%) - https://performance.wikimedia.org/arclamp/svgs/daily/2024-02-10.excimer.all.fn-EditAction.svgz?x=318.6&y=917

Two weeks ago: https://performance.wikimedia.org/arclamp/svgs/daily/2024-01-27.excimer.all.fn-EditAction.svgz

Wikimedia\Equivset\Equivset::normalize (13 samples, 0.28%) - https://performance.wikimedia.org/arclamp/svgs/daily/2024-01-27.excimer.all.fn-EditAction.svgz?x=977.3&y=789
Wikimedia\Equivset\Equivset::all (24 samples, 0.52%) - https://performance.wikimedia.org/arclamp/svgs/daily/2024-01-27.excimer.all.fn-EditAction.svgz?x=448.7&y=757
Wikimedia\Equivset\Equivset::load (24 samples, 0.52%) - shown in link for ::all
Wikimedia\Equivset\Equivset::normalize (29 samples, 0.63%) - shown in link for ::all
Wikimedia\Equivset\Equivset::normalize (17 samples, 0.37%) - https://performance.wikimedia.org/arclamp/svgs/daily/2024-01-27.excimer.all.fn-EditAction.svgz?x=956.0&y=725
Wikimedia\Equivset\Equivset::normalize (13 samples, 0.28%) - https://performance.wikimedia.org/arclamp/svgs/daily/2024-01-27.excimer.all.fn-EditAction.svgz?x=945.8&y=741
Wikimedia\Equivset\Equivset::normalize (174 samples, 3.78%) - https://performance.wikimedia.org/arclamp/svgs/daily/2024-01-27.excimer.all.fn-EditAction.svgz?x=272.4&y=805

That means the loading of the file is no longer shown in the flamegrah (Equivset::load or Equivset::all lazy calling ::load), that sounds good to me at that was the main motivation.
TitleBlacklist is using a cache for the AntiSpoof normalization on edit, which includes the Equivset call from above.

The all flamegraph does not show the call at all - https://performance.wikimedia.org/arclamp/svgs/daily/2024-02-10.excimer.all.svgz
Two weeks ago there is at least an entry shown via ShowMissingArticle message parsing - https://performance.wikimedia.org/arclamp/svgs/daily/2024-01-27.excimer.all.svgz?x=674.0&y=885

Wikimedia\Equivset\Equivset::normalize (1,577 samples, 0.09%)

Here is the same trace in new flamegraph - https://performance.wikimedia.org/arclamp/svgs/daily/2024-02-10.excimer.all.svgz?x=690.4&y=293

Change 1001274 merged by jenkins-bot:

[mediawiki/libs/Equivset@master] Remove equivset data as serialized file

https://gerrit.wikimedia.org/r/1001274