Page MenuHomePhabricator

Use php-hrtime monotonic clock instead of microtime for perf measure in MW
Open, MediumPublic

Description

Problem

We currently use microtime(true) quite a lot. This internally uses gettimeofday() in C, which isn't monotonic and subject to clock drift etc.

Impact

The metrics we collect for WANObjectCache, ResourceLoader, Backend Save Timing etc. are all affected by this and thus cannot be trusted. Their mean average might give a rough sense of how long something takes under "boring" circumstances, but it's not a useful way to set alerts because we can't trust the outliers.

For backend metrics it's important that we monitor the p99 and max, not the mean average (why). However, due to clock drift, the metrics that fire these alerts are difficult to trust (e.g. T242024 and T233059).

Examples:

Requirements

PHP 7.3+ introduces hrtime() for this purpose, which internally uses clock_gettime(CLOCK_MONOTONIC) in C. This is comparable to performance.now() in web browsers, and process.hrtime() in Node.js.

We'll need to provide a way to get access to the monotonic timer in within the MediaWiki run-time.

Proposal

It looks like it will probably take at least a year before we run MediaWiki on PHP 7.3 by default at WMF. Both for that reason and for third parties, I suggest we invest in supporting a way on PHP 7.2 to get monotonic timers.

Assuming this can't be done in pure PHP, it will require a PHP extension (either from Pecl or to developer our own).

We would then conditionally define hrtime() if absent to use that instead (for older PHP versions), with ultimate falllback to microtime(true) if the user is neither PHP 7.3 nor the extension in question.

Details

Related Changes in Gerrit:
SubjectRepoBranchLines +/-
mediawiki/libs/IPUtilsmaster+31 -17
mediawiki/extensions/Wikibasemaster+3 -8
mediawiki/coremaster+60 -30
mediawiki/extensions/Wikibasemaster+6 -10
mediawiki/extensions/ConfirmEditmaster+12 -7
mediawiki/coremaster+26 -31
mediawiki/extensions/Wikibasemaster+2 -17
mediawiki/libs/Timestampmaster+89 -34
mediawiki/libs/Timestampmaster+3 -61
mediawiki/coremaster+1 -1
mediawiki/coremaster+203 -197
mediawiki/coremaster+3 -3
mediawiki/coremaster+5 -7
mediawiki/extensions/CheckUsermaster+6 -42
mediawiki/libs/Timestampmaster+17 -14
mediawiki/extensions/CheckUsermaster+2 -1
mediawiki/coremaster+3 -2
Show related patches Customize query in gerrit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Changelog:

It looks like no nominal changes have been made to the code at all since its first commit, so I believe the following one commit would cleanly apply and suffice:

  1. 83497327e7: Implement high resolution monotonic timer function hrtime

The following seem optional for us, but might be good to include for consistency (if they are compatible with PHP 7.2):

  1. a795bd8265: hrtime: Optimize zend_hash_real_init
  2. 19d31433dc: hrtime: Enable on OpenBSD
  3. b9dfa12853: hrtime: Wrap in extern "c" {} macro for use in extensions
  4. 4a55794b98 FIXUP hrtime: Wrap in extern "c" {} macro for use in extensions

To avoid potential misunderstandings; I currently don't have time to handle/test such a backport myself, my earlier comment from above referred to carrying an existing, backported patch forward across our remaining 7.2.x package life cycle. But if anyone from Performance Team tests such a patch e.g. in Beta I'm happy to include it in the next round of PHP updates.

I'm happy to backport/test this. Can you point me to where our PHP is built from? I found https://gerrit.wikimedia.org/r/admin/repos/operations/debs/pkg-php/php,branches but it doesn't look current.

We do local rebuild of the packages of the sury.org repo. I think the easiest way to obtain the Debian source package if you run "apt-get source php7.2" on a random app server.

Are we going to try to get sury.org to "upstream" our patch? Or maintain the patch somewhere else? (Gerrit? Attachment to this task? Somewhere else?)

The sury.org repos stick very close to the upstream releases, that won't happen realistically, especially for a feature backport.

We don't use Gerrit for your backports, it's mostly just some local commands/scripts on our build host. If you have a patch for 7.2, simply attach to this task and I'll take it from there, ok?

jijiki added a project: User-jijiki.

@dpifke please ping me to roll out the package when you are ready

For the MW side, we have these two fairly popular polyfills for third-parties:

They both more or less do the same, in that they do "nothing" when the function is natively available, and otherwise fallback to calling microtime() once to obtain a start time, and then on hrtime calls subtract it from the then-current microtime.

Of the two, I like polyfill-hrtime slightly more as it also picks up the php-hrtime extension if present (from Debian or PECL), which MediaWiki developers could install locally whilst working with PHP 7.2, which seems to be a common choice in our dev environments, so as to match prod, but we might not have the patched package from prod in our dev environment.

The reason we'd need something like this is so as to allow code to consistently call hrtime rather than having inline checks all over our code base.

For the MW side, we have these two fairly popular polyfills for third-parties:
<snip>

I agree that we need a polyfill, but I don't think we need it for third-parties. Release branches already require PHP 7.3, which has hrtime built-in. So the polyfill would just be needed for devs still running 7.2 who aren't using our theoretical patched 7.2+hrtime packages.

Ah, I forgot about that. Agreed yeah, it's just for CI and dev then.

Looks like we already bundle and deploy symfony/polyfill-php73 in production for symfony/console, which in turn is bundled for psy/psysh, which powers the maintenance/shell.php script.

They're mutually exclusive so let's just stick with this one then. Note that mass adoption in the critical path should wait until this patch is applied to production.

Krinkle renamed this task from Use monotonic clock instead of microtime() for perf measures in MW PHP to Use php-hrtime monotonic clock instead of microtime for perf measure in MW.Jun 17 2022, 4:46 PM

Re-opening for original purpose. The php-hrtime blocked is effectively resolved but we still need to go through and switch MW code to use hrtime instead of time/microtime for callers that measure duration instead of timestamp.

Untagging serviceops since php-hrtime is part of PHP 7.4 by default and no longer requires server provision changes.

Tagging MediaWiki-libs-Stats since it includes recently added code that makes use of microtime() for time duration purposes, rather than timestamps, thus appropiate for adopting hrtime() instead.

Change 965568 had a related patch set uploaded (by Cwhite; author: Cwhite):

[mediawiki/core@master] Stats: convert TimingMetric usage of microtime() to hrtime()

https://gerrit.wikimedia.org/r/965568

colewhite changed the task status from Open to In Progress.Oct 12 2023, 8:11 PM
colewhite claimed this task.

Change 965568 merged by jenkins-bot:

[mediawiki/core@master] Stats: convert TimingMetric usage of microtime() to hrtime()

https://gerrit.wikimedia.org/r/965568

colewhite changed the task status from In Progress to Open.Jan 19 2024, 9:13 PM
colewhite removed colewhite as the assignee of this task.
colewhite subscribed.

StatsLib now uses hrtime() when using start() and stop().

Unassigning myself because I am not currently working on migrating other uses.

Change #1062104 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/libs/Timestamp@master] Implement fake-able hrtime(), deprecate microtime()

https://gerrit.wikimedia.org/r/1062104

Change #1070637 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/CheckUser@master] Temporarily silence deprecation of ConvertibleTimestamp::microtime

https://gerrit.wikimedia.org/r/1070637

Change #1070637 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Temporarily silence deprecation of ConvertibleTimestamp::microtime

https://gerrit.wikimedia.org/r/1070637

Change #1062104 merged by jenkins-bot:

[mediawiki/libs/Timestamp@master] Implement fake-able hrtime(), deprecate microtime()

https://gerrit.wikimedia.org/r/1062104

Change #1075662 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/libs/Timestamp@master] Release v4.2.0

https://gerrit.wikimedia.org/r/1075662

Change #1075662 merged by jenkins-bot:

[mediawiki/libs/Timestamp@master] Release v4.2.0

https://gerrit.wikimedia.org/r/1075662

Change #1093954 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/CheckUser@master] Replace deprecated ConvertibleTimestamp::microtime with hrtime

https://gerrit.wikimedia.org/r/1093954

Change #1093954 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Replace deprecated ConvertibleTimestamp::microtime with hrtime

https://gerrit.wikimedia.org/r/1093954

I didn't know about this task beforehand, but https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/1138519 fixed this in Parsoid because we needed more precise/accurate profiles than what microtime gave us.

Change #1139144 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] ResourceLoader: Switch perf measurement from microtime to hrtime

https://gerrit.wikimedia.org/r/1139144

Change #1139145 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] filebackend: Use hrtime instead of microtime for latency measure

https://gerrit.wikimedia.org/r/1139145

Change #1139146 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] rdbms: Remove timestamps from ChronologyProtector

https://gerrit.wikimedia.org/r/1139146

Change #1139144 merged by jenkins-bot:

[mediawiki/core@master] ResourceLoader: Switch perf measurement from microtime to hrtime

https://gerrit.wikimedia.org/r/1139144

Change #1139145 merged by jenkins-bot:

[mediawiki/core@master] filebackend: Use hrtime instead of microtime for latency measure

https://gerrit.wikimedia.org/r/1139145

Change #1139146 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Remove timestamps from ChronologyProtector

https://gerrit.wikimedia.org/r/1139146

Change #1144663 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] ParserOutput: Use hrtime() instead of microtime() for parser limit report

https://gerrit.wikimedia.org/r/1144663

Change #1144663 merged by jenkins-bot:

[mediawiki/core@master] ParserOutput: Use hrtime() instead of microtime() for parser limit report

https://gerrit.wikimedia.org/r/1144663

Change #1160235 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/libs/Timestamp@master] [BREAKING CHANGE] Drop ConvertibleTimestamp::microtime(), deprecated in v4.2.0

https://gerrit.wikimedia.org/r/1160235

Change #1160235 merged by jenkins-bot:

[mediawiki/libs/Timestamp@master] [BREAKING CHANGE] Drop ConvertibleTimestamp::microtime(), deprecated in v4.2.0

https://gerrit.wikimedia.org/r/1160235

Change #1192226 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] Storage,actions,api,specialpage: Adopt TimingMetric::start and ::stop

https://gerrit.wikimedia.org/r/1192226

Change #1192286 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/ConfirmEdit@master] Adopt Stats\TimingMetric::start and ::stop

https://gerrit.wikimedia.org/r/1192286

Change #1192296 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] Stats: Encourage start() or observeNanoseconds()

https://gerrit.wikimedia.org/r/1192296

Change #1192612 had a related patch set uploaded (by Cwhite; author: Cwhite):

[mediawiki/extensions/Wikibase@master] fix StatslibTimeRecordingEntityParserOutputGenerator timing usage

https://gerrit.wikimedia.org/r/1192612

Change #1192612 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] fix StatslibTimeRecordingEntityParserOutputGenerator timing usage

https://gerrit.wikimedia.org/r/1192612

Change #1192226 merged by jenkins-bot:

[mediawiki/core@master] Storage,actions,api,specialpage,diff: Adopt TimingMetric::(start/stop)

https://gerrit.wikimedia.org/r/1192226

Change #1192946 had a related patch set uploaded (by Cwhite; author: Cwhite):

[mediawiki/extensions/Wikibase@master] fix StatsFactory timing usage in EditEntity and View

https://gerrit.wikimedia.org/r/1192946

Change #1192979 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/Wikibase@master] EditEntity: Update timer in StatslibSaveTimeRecordingEntityStore

https://gerrit.wikimedia.org/r/1192979

Change #1192286 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@master] Adopt Stats\TimingMetric::start and ::stop

https://gerrit.wikimedia.org/r/1192286

Change #1192979 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] EditEntity: Update timer in StatslibSaveTimeRecordingEntityStore

https://gerrit.wikimedia.org/r/1192979

Change #1192946 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] fix StatsFactory timing usage in EditEntity and View

https://gerrit.wikimedia.org/r/1192946

Change #1227383 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/libs/IPUtils@master] benchmarks: Improve benchmark

https://gerrit.wikimedia.org/r/1227383

Change #1227383 merged by jenkins-bot:

[mediawiki/libs/IPUtils@master] benchmarks: Improve benchmark

https://gerrit.wikimedia.org/r/1227383