Page MenuHomePhabricator

Some requests fail with UIDGenerator error "Process clock is outdated or drifted"
Closed, ResolvedPublicPRODUCTION ERROR

Description

See for example https://integration.wikimedia.org/ci/job/mediawiki-extensions-zend/9816/consoleFull .

00:12:02.398 There were 2 failures:
00:12:02.398 
00:12:02.399 1) Warning
00:12:02.422 The data provider specified for Flow\Tests\UrlGeneratorTest::testWatchTopicLink is invalid.
00:12:02.422 Process clock is outdated or drifted.
00:12:02.422 
00:12:02.422 
00:12:02.423 2) Warning
00:12:02.423 The data provider specified for Flow\Tests\UrlGeneratorTest::testUnwatchTopicLink is invalid.
00:12:02.423 Process clock is outdated or drifted.

This error is from https://git.wikimedia.org/blob/mediawiki%2Fcore.git/7cf070f9cbb71b2bb3fcba5b7c632970ab6bf933/includes%2Futils%2FUIDGenerator.php and happens when two clocks drift apart by 2 seconds.

Event Timeline

hashar set Security to None.
hashar subscribed.

Apparently not a CI issue, seems something is flappy in either Flow test or the UIDGenerator.

The UUID generator was originally implemented by @aaron (cced) and introduced by https://gerrit.wikimedia.org/r/#/q/I989626dfb7976d17f5931a34fe040f41cf14dfe3,n,z

The drift detection has been added by Patchset 12 https://gerrit.wikimedia.org/r/#/c/37052/11..12/includes/UIDGenerator.php which Aaron commented as being "Robustness rewrite (partly as a result of testing)".

@Mattflaschen can you reach out to @aaron ?

Apparently not a CI issue

Based on what? How do we know there isn't clock drift on the CI machines (that's what it's checking for).

Read the code? A PHP bug causes the microtime() function to drift and @aaron made the uuidgenerator to abort the process. There is nothing CI can do about that code.

EBernhardson subscribed.

Unless this happens again we wont be looking into it further.

Krinkle renamed this task from "Process clock is outdated or drifted" is causing continuous integration failures to Some requests fail with UIDGenerator error "Process clock is outdated or drifted".Sep 20 2018, 1:38 AM
Krinkle raised the priority of this task from Lowest to High.

This has started to affect not just CI, but also random page views, API requests, and JobQueue jobs.

I don't know for how long, but quite likely related to the introduction of EventBus for the JobQueue, given that prior to that, UID was not used for the job queue.

Impact: Jobs may be lost before insertion.

Error

Sample Request ID: W6L0GwrAEDUAAJYVR4IAAAAB

message
[{exception_id}] {exception_url}   RuntimeException from line 482 of /srv/mediawiki/php-1.32.0-wmf.22/includes/utils/UIDGenerator.php: Process clock is outdated or drifted.
trace
#0 /srv/mediawiki/php-1.32.0-wmf.22/includes/utils/UIDGenerator.php(222): UIDGenerator->getTimeAndDelay(string, integer, integer, integer)
#1 /srv/mediawiki/php-1.32.0-wmf.22/extensions/EventBus/includes/EventBus.php(445): UIDGenerator::newUUIDv1()
#2 /srv/mediawiki/php-1.32.0-wmf.22/extensions/EventBus/includes/EventBus.php(382): EventBus::newId()
#3 /srv/mediawiki/php-1.32.0-wmf.22/extensions/EventBus/includes/JobQueueEventBus.php(43): EventBus::createEvent(string, string, array, string)
#4 /srv/mediawiki/php-1.32.0-wmf.22/extensions/EventBus/includes/JobQueueEventBus.php(133): JobQueueEventBus->createJobEvent(JobSpecification)
#5 /srv/mediawiki/php-1.32.0-wmf.22/includes/jobqueue/JobQueue.php(332): JobQueueEventBus->doBatchPush(array, integer)
#6 /srv/mediawiki/php-1.32.0-wmf.22/includes/jobqueue/JobQueue.php(302): JobQueue->batchPush(array, integer)
#7 /srv/mediawiki/php-1.32.0-wmf.22/includes/jobqueue/JobQueueGroup.php(154): JobQueue->push(array)
#8 /srv/mediawiki/php-1.32.0-wmf.22/includes/jobqueue/JobQueueGroup.php(218): JobQueueGroup->push(array)
#9 /srv/mediawiki/php-1.32.0-wmf.22/includes/deferred/MWCallableUpdate.php(34): JobQueueGroup::pushLazyJobs()
#10 /srv/mediawiki/php-1.32.0-wmf.22/includes/deferred/DeferredUpdates.php(268): MWCallableUpdate->doUpdate()
#11 /srv/mediawiki/php-1.32.0-wmf.22/includes/deferred/DeferredUpdates.php(214): DeferredUpdates::runUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti, string, integer)
#12 /srv/mediawiki/php-1.32.0-wmf.22/includes/deferred/DeferredUpdates.php(134): DeferredUpdates::execute(array, string, integer)
#13 /srv/mediawiki/php-1.32.0-wmf.22/includes/MediaWiki.php(914): DeferredUpdates::doUpdates(string)
#14 /srv/mediawiki/php-1.32.0-wmf.22/includes/MediaWiki.php(734): MediaWiki->restInPeace(string, boolean)
#15 [internal function]: Closure$MediaWiki::doPostOutputShutdown()

Source: https://github.com/wikimedia/mediawiki/blob/8477d4a6f5949fcb875fb0d21800866bc548fc05/includes/utils/UIDGenerator.php#L478-L483

Notes

Logstash has recorded that 31 request processes got aborted due to this error (in the last 30 days). Most are from the callsite in EventBus when inserting jobs.

(Still seen on 1.32.0-wmf.23)

mobrovac subscribed.

Time drifting is a known occurrence. From what I see, we have two options:

  • allow it to happen and risk duplicate v1 UUIDs: the collision rate would vary with the drift, but in practice that means that some request IDs might end up with the same UUID
  • use PECL's uuid library which is a wrapper around libuuid: the library does not suffer from the time drifting problem, but requires extra packages to be installed and PECL enabled.

I would vote for the latter.

Change 464454 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@wmf/1.32.0-wmf.24] Make UID clock drift error have more details.

https://gerrit.wikimedia.org/r/464454

Change 464454 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.24] Make UID clock drift error have more details.

https://gerrit.wikimedia.org/r/464454

Mentioned in SAL (#wikimedia-operations) [2018-10-03T23:23:27Z] <catrope@deploy1001> Synchronized php-1.32.0-wmf.24/includes/utils/UIDGenerator.php: Make UID clock drift error have more details (T94522) (duration: 00m 58s)

Change 464462 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] UIDGenerator: Make internal millitime() non-static

https://gerrit.wikimedia.org/r/464462

Change 464463 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] [WIP] UIDGenerator

https://gerrit.wikimedia.org/r/464463

Change 464462 merged by jenkins-bot:
[mediawiki/core@master] UIDGenerator: Misc clean up

https://gerrit.wikimedia.org/r/464462

Change 464463 merged by jenkins-bot:
[mediawiki/core@master] UIDGenerator: Remove the clock skew problem

https://gerrit.wikimedia.org/r/464463

Change 465533 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.32.0-wmf.24] UIDGenerator: Misc clean up

https://gerrit.wikimedia.org/r/465533

Change 465533 merged by Krinkle:
[mediawiki/core@wmf/1.32.0-wmf.24] UIDGenerator: Misc clean up

https://gerrit.wikimedia.org/r/465533

Change 465535 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.32.0-wmf.24] UIDGenerator: Remove the clock skew problem

https://gerrit.wikimedia.org/r/465535

Change 465535 merged by Krinkle:
[mediawiki/core@wmf/1.32.0-wmf.24] UIDGenerator: Remove the clock skew problem

https://gerrit.wikimedia.org/r/465535

Mentioned in SAL (#wikimedia-operations) [2018-10-10T00:14:24Z] <krinkle@deploy1001> Synchronized php-1.32.0-wmf.24/tests/phpunit/includes/utils/: T94522 - I2a0c51bea58 (duration: 01m 02s)

Mentioned in SAL (#wikimedia-operations) [2018-10-10T00:19:44Z] <krinkle@deploy1001> Synchronized php-1.32.0-wmf.24/includes/utils/UIDGenerator.php: T94522 - I2a0c51bea58 (duration: 00m 56s)

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:11 PM