
jobrunner loops every 100ms against a result cached for 1 second
Closed, Declined, Public

Description

From P5240 when running the redisJobRunnerService with --verbose:

2017-04-11T14:18:34+0000 DEBUG: Redis cmd: hGetAll ["jobqueue:aggregator:h-ready-queues:v2"]
2017-04-11T14:18:34+0000 DEBUG: No jobs available...
2017-04-11T14:18:34+0000 DEBUG: No jobs available...
2017-04-11T14:18:34+0000 DEBUG: No jobs available...
2017-04-11T14:18:35+0000 DEBUG: No jobs available...
2017-04-11T14:18:35+0000 DEBUG: No jobs available...
2017-04-11T14:18:35+0000 DEBUG: No jobs available...
2017-04-11T14:18:35+0000 DEBUG: No jobs available...
2017-04-11T14:18:35+0000 DEBUG: No jobs available...
2017-04-11T14:18:35+0000 DEBUG: No jobs available...
2017-04-11T14:18:35+0000 DEBUG: No jobs available...
2017-04-11T14:18:35+0000 DEBUG: Redis cmd: hGetAll ["jobqueue:aggregator:h-ready-queues:v2"]

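The log shows that we only query Redis for the ready-queue map once per second, while the loop itself runs every 100ms. The iterations in between just re-read the same cached result.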

The loop with a 100ms sleep:

$pending =& $this->getReadyQueueMap();
if ( !count( $pending ) ) {
    $this->debug( "No jobs available..." );
    $this->incrStats( "idle.$host", 1 );
    usleep( 100000 ); // no jobs
    continue;
}

The result of getReadyQueueMap() is cached for 1 second (AGGR_CACHE_TTL_SEC):

class RedisJobRunnerService extends RedisJobService {

    const AGGR_CACHE_TTL_SEC = 1;

    /**
     * @return array Cached map of (job type => domain => UNIX timestamp)
     */
    private function &getReadyQueueMap() {
        if ( $age <= self::AGGR_CACHE_TTL_SEC ) {
            return $pendingDBs; // process cache hit
        }
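To make the interaction between the loop and the cache concrete, here is a rough, self-contained simulation. It is not the jobrunner code: the class `ReadyQueueCacheSketch`, the function `simulateIdleLoop()`, the integer millisecond clock and the assumed 1ms of per-iteration overhead on top of the 100ms sleep are all made up for illustration.

```php
<?php
// Hypothetical sketch, not the jobrunner implementation: a process cache
// with a 1 second TTL, polled by an idle loop on a simulated clock.
final class ReadyQueueCacheSketch {
    const TTL_MS = 1000; // stands in for AGGR_CACHE_TTL_SEC = 1

    /** @var array|null Cached map, null until the first fetch */
    private $map = null;
    /** @var int Simulated time of the last backend fetch, in ms */
    private $fetchedAtMs = 0;
    /** @var int Number of simulated Redis round trips */
    public $fetches = 0;

    public function get( int $nowMs ): array {
        if ( $this->map !== null && ( $nowMs - $this->fetchedAtMs ) <= self::TTL_MS ) {
            return $this->map; // process cache hit
        }
        $this->fetches++;            // stands in for the hGetAll call
        $this->fetchedAtMs = $nowMs;
        $this->map = [];             // pretend no queue is ready
        return $this->map;
    }
}

// Run an idle loop for $durationMs of simulated time, waking every $periodMs.
function simulateIdleLoop( int $durationMs, int $periodMs ): array {
    $cache = new ReadyQueueCacheSketch();
    $iterations = 0;
    for ( $nowMs = 0; $nowMs < $durationMs; $nowMs += $periodMs ) {
        $cache->get( $nowMs );
        $iterations++;
    }
    return [ 'iterations' => $iterations, 'fetches' => $cache->fetches ];
}

// 100ms sleep plus an assumed 1ms of overhead per iteration, over 3 seconds.
$result = simulateIdleLoop( 3000, 101 );
printf( "%d iterations, %d fetches\n", $result['iterations'], $result['fetches'] );
// prints "30 iterations, 3 fetches"
```

Under these assumptions only one iteration in ten reaches Redis, which matches the ratio of "No jobs available..." lines to hGetAll commands in the log above.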

Originally we waited five seconds between iterations. That got lowered to 100ms, but the cache TTL was not updated to match. Commit 19c880d1d62197a52fa66b09b03f4863a5fe5605

So I guess we should just sleep for a second or, even better, check whether Redis has a way for us to wait for new events.
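One possible reading of "just sleep for a second" is to sleep until the cached map can actually expire, rather than for a fixed interval. The helper below is only a sketch of that idea: `idleSleepMicros()` and its parameters are hypothetical and do not exist in jobrunner, and this is not necessarily what the proposed patch does.

```php
<?php
// Hypothetical helper, not part of jobrunner: how long an idle runner could
// sleep so that it wakes up roughly when the cached map expires.
function idleSleepMicros( float $cacheAgeSec, float $ttlSec = 1.0, int $minMicros = 100000 ): int {
    // Time left before the cache entry is considered stale, in microseconds.
    $remainingMicros = (int)ceil( ( $ttlSec - $cacheAgeSec ) * 1000000 );
    // Never sleep less than the current 100ms, even if the cache is already stale.
    return max( $minMicros, $remainingMicros );
}

printf( "%d\n", idleSleepMicros( 0.25 ) ); // prints "750000"
```

In the idle branch this would replace the fixed `usleep( 100000 )` with something like `usleep( idleSleepMicros( $age ) )`, assuming the age of the cached map is available at that point in the loop.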

Event Timeline

Change 347743 had a related patch set uploaded (by Hashar):
[mediawiki/services/jobrunner@master] Stop looping over the same cached result

https://gerrit.wikimedia.org/r/347743

hashar triaged this task as Medium priority.
hashar lowered the priority of this task from Medium to Low. Jun 19 2017, 12:45 PM
hashar moved this task from Backlog to In-progress on the Release-Engineering-Team (Kanban) board.

Change 347743 abandoned by Hashar:
Stop looping over the same cached result

https://gerrit.wikimedia.org/r/347743