Create phonos Jobs to handle mass file generation
Closed, ResolvedPublic3 Estimated Story Points

Description

When we release Phonos to a new wiki and the "meta" template for IPA changes to include the <phonos> tag, the htmlCacheUpdate job will re-parse all the pages where the template is transcluded and trigger hundreds if not thousands of calls to the IPA engine.
We need a way to rate-limit those calls to comply with Google's rate limit.

Useful links

Acceptance Criteria

  • Since we will be using Google in production, job execution should be throttled to a sensible rate based on the rate limit we'll be getting from Google. Right now it is 1000 requests per minute
  • A job should be scheduled when Phonos is triggered from a CLI command and not from an HTTP request - T318979
  • Jobs should not be duplicated in the queue
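For the last criterion, MediaWiki's job queue can discard duplicates automatically when a job opts in via the removeDuplicates flag on the Job base class. A minimal sketch of how the Phonos job could opt in (the class and job-type names follow the patch below; the constructor details are an assumption, not the merged implementation):

```php
<?php

use Job;

class PhonosIPAFilePersistJob extends Job {
	public function __construct( array $params ) {
		parent::__construct( 'phonosIPAFilePersist', $params );
		// Tell the queue that jobs with identical parameters are
		// duplicates, so only one of them needs to be kept.
		$this->removeDuplicates = true;
	}

	public function run() {
		// Generate and persist the audio file for $this->params['ipa'].
		return true;
	}
}
```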

Event Timeline

Change 837184 had a related patch set uploaded (by Dmaza; author: Dmaza):

[mediawiki/extensions/Phonos@master] Create Job to handle mass file generation

https://gerrit.wikimedia.org/r/837184

For testing purposes, you can use php maintenance/shell.php to add jobs (see code below) and php maintenance/runJobs.php to execute them.

Add a single job

use MediaWiki\MediaWikiServices;

$jobParams = [ 'ipa' => 'həˈləʊ', 'text' => 'hello', 'lang' => 'en'];
$job = new MediaWiki\Extension\Phonos\Job\PhonosIPAFilePersistJob($jobParams);
$jobQueue = MediaWikiServices::getInstance()->get('JobQueueGroup');
$jobQueue->push($job);

In order to test the throttling, you need to set $wgJobBackoffThrottling['phonosIPAFilePersist'] to something like 1/5 (one job every 5 seconds) and then add unique jobs in a loop, like so.

Add multiple unique jobs

use MediaWiki\MediaWikiServices; 

$jobQueue = MediaWikiServices::getInstance()->get('JobQueueGroup');

for ($i=0; $i<6; $i++) {
	$char = chr(rand(97,122));
	$jobParams = [ 'ipa' => 'həˈləʊ'.$char, 'text' => 'hello'.$char, 'lang' => 'en'];
	$job = new MediaWiki\Extension\Phonos\Job\PhonosIPAFilePersistJob($jobParams);
	$jobQueue->push($job);
}
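The throttling setting mentioned above is ordinary site configuration; a sketch for LocalSettings.php (the array key must match the job's command name):

```php
// $wgJobBackoffThrottling values are job executions per second;
// fractional values are allowed, so 1/5 means one job every 5 seconds.
$wgJobBackoffThrottling['phonosIPAFilePersist'] = 1 / 5;
```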

Change 837184 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Create Job to handle mass file generation

https://gerrit.wikimedia.org/r/837184

On my local docker, running:

drw@blackbird:~/wikimedia/srv/lone$ docker-compose exec mediawiki php maintenance/shell.php
PHP Notice:  Writing to directory /.config/psysh is not allowed. in /var/www/html/w/vendor/psy/psysh/src/ConfigPaths.php on line 362

Notice: Writing to directory /.config/psysh is not allowed. in /var/www/html/w/vendor/psy/psysh/src/ConfigPaths.php on line 362
Psy Shell v0.11.8 (PHP 7.4.32 — cli) by Justin Hileman
>>> use MediaWiki\MediaWikiServices; 
>>> 
>>> $jobParams = [ 'ipa' => 'həˈləʊo', 'text' => 'helluwrioruiweruiorquirwuroiwurweiruiwosqerty', 'lang' => 'en'];
=> [
     "ipa" => "həˈləʊo",
     "text" => "helluwrioruiweruiorquirwuroiwurweiruiwosqerty",
     "lang" => "en",
   ]

>>> 
>>> $job = new MediaWiki\Extension\Phonos\Job\PhonosIPAFilePersistJob($jobParams);
=> MediaWiki\Extension\Phonos\Job\PhonosIPAFilePersistJob {#4668
     +command: "phonosIPAFilePersist",
     +params: [
       "ipa" => "həˈləʊo",
       "text" => "helluwrioruiweruiorquirwuroiwurweiruiwosqerty",
       "lang" => "en",
       "requestId" => "21eb3e0dbc23b776d9b8aa35",
     ],
     +metadata: [],
   }

>>> 
>>> $jobQueue = MediaWikiServices::getInstance()->get('JobQueueGroup');
=> JobQueueGroup {#4651}

>>> 
>>> $jobQueue->push($job);
=> null

>>>

In the debug logs I see:

...
[runJobs] phonosIPAFilePersist Special: ipa=həˈləʊo text=helluwrioruiweruiorquirwuroiwurweiruiwosqerty lang=en requestId=21eb3e0dbc23b776d9b8aa35 namespace=-1 title= (id=27428,timestamp=20221012151800) STARTING
[phonos] FileBackendStore::ingestFreshFileStats: File mwstore://phonos-backend/phonos-render/s/a/saqg6ards327xet8u1jrhypa7lhwrir.mp3 does not exist
[exec] Creating base path /tmp/shellbox-d91ec4c1c7b14e55
[exec] Executing: /bin/bash '/var/www/html/w/vendor/wikimedia/shellbox/src/Command/limit.sh' ''\''/usr/bin/espeak'\'' '\''--stdin'\'' '\''-m'\'' '\''--stdout'\''' 'SB_INCLUDE_STDERR=;SB_CPU_LIMIT=180; SB_CGROUP='\'''\''; SB_MEM_LIMIT=1073741824; SB_FILE_SIZE_LIMIT=104857600; SB_WALL_CLOCK_LIMIT=180; SB_USE_LOG_PIPE=yes'
[exec] Possibly missing executable file: /bin/bash '/var/www/html/w/vendor/wikimedia/shellbox/src/Command/limit.sh' ''\''/usr/bin/espeak'\'' '\''--stdin'\'' '\''-m'\'' '\''--stdout'\''' 'SB_INCLUDE_STDERR=;SB_CPU_LIMIT=180; SB_CGROUP='\'''\''; SB_MEM_LIMIT=1073741824; SB_FILE_SIZE_LIMIT=104857600; SB_WALL_CLOCK_LIMIT=180; SB_USE_LOG_PIPE=yes'
[exec] Error running /bin/bash '/var/www/html/w/vendor/wikimedia/shellbox/src/Command/limit.sh' ''\''/usr/bin/espeak'\'' '\''--stdin'\'' '\''-m'\'' '\''--stdout'\''' 'SB_INCLUDE_STDERR=;SB_CPU_LIMIT=180; SB_CGROUP='\'''\''; SB_MEM_LIMIT=1073741824; SB_FILE_SIZE_LIMIT=104857600; SB_WALL_CLOCK_LIMIT=180; SB_USE_LOG_PIPE=yes': /bin/bash: /usr/bin/espeak: No such file or directory

#0 /var/www/html/w/vendor/wikimedia/shellbox/src/Command/LocalBoxedExecutor.php(41): Shellbox\Command\UnboxedExecutor->execute()
#1 /var/www/html/w/vendor/wikimedia/shellbox/src/Command/BoxedExecutor.php(20): Shellbox\Command\LocalBoxedExecutor->executeValid()
#2 /var/www/html/w/vendor/wikimedia/shellbox/src/Command/BoxedCommand.php(183): Shellbox\Command\BoxedExecutor->execute()
#3 /var/www/html/w/extensions/Phonos/includes/Engine/EspeakEngine.php(62): Shellbox\Command\BoxedCommand->execute()
#4 /var/www/html/w/extensions/Phonos/includes/Job/PhonosIPAFilePersistJob.php(56): MediaWiki\Extension\Phonos\Engine\EspeakEngine->getAudioData()
#5 /var/www/html/w/includes/jobqueue/JobRunner.php(384): MediaWiki\Extension\Phonos\Job\PhonosIPAFilePersistJob->run()
#6 /var/www/html/w/includes/jobqueue/JobRunner.php(345): JobRunner->doExecuteJob()
#7 /var/www/html/w/includes/jobqueue/JobRunner.php(249): JobRunner->executeJob()
#8 /var/www/html/w/maintenance/runJobs.php(98): JobRunner->run()
#9 /var/www/html/w/maintenance/includes/MaintenanceRunner.php(309): RunJobs->execute()
#10 /var/www/html/w/maintenance/doMaintenance.php(85): MediaWiki\Maintenance\MaintenanceRunner->run()
#11 /var/www/html/w/maintenance/runJobs.php(136): require_once(string)
#12 {main}
[exec] Removed directory "/tmp/shellbox-d91ec4c1c7b14e55"
[DBConnection] MWExceptionHandler::rollbackPrimaryChanges: acknowledged server-side transaction loss on unknown
[runJobs] phonosIPAFilePersist Special: ipa=həˈləʊo text=helluwrioruiweruiorquirwuroiwurweiruiwosqerty lang=en requestId=21eb3e0dbc23b776d9b8aa35 namespace=-1 title= (id=27428,timestamp=20221012151800) t=33 error=MediaWiki\Extension\Phonos\Exception\PhonosException: phonos-engine-error
...

and no file is generated in images/phonos-render.

Running similar commands on beta also does not generate any new files in Swift. I cannot get any useful logs from beta, I am afraid.

Locally and on beta, I also ran php maintenance/runJobs.php, which returned "Job queue is empty".

> Running similar commands on beta also does not generate any new files in Swift. I cannot get any useful logs from beta, I am afraid.

OK, it is now magically working on beta.

OK, I can get it working locally with Google, but not with espeak or larynx.

The error in T318086#8311947 was with espeak.

With larynx, I get:

[phonos] FileBackendStore::ingestFreshFileStats: File mwstore://phonos-backend/phonos-render/l/u/luz3k9deb32xq67zznyb4wl0qw6q5vl.mp3 does not exist
[exec] Creating base path /tmp/shellbox-ba1ed58b3da82ebf
[exec] Executing: /bin/bash '/var/www/html/w/vendor/wikimedia/shellbox/src/Command/limit.sh' ''\''/usr/bin/lame'\'' '\''-'\'' '\''-'\''' 'SB_INCLUDE_STDERR=;SB_CPU_LIMIT=180; SB_CGROUP='\'''\''; SB_MEM_LIMIT=1073741824; SB_FILE_SIZE_LIMIT=104857600; SB_WALL_CLOCK_LIMIT=180; SB_USE_LOG_PIPE=yes'
[exec] Possibly missing executable file: /bin/bash '/var/www/html/w/vendor/wikimedia/shellbox/src/Command/limit.sh' ''\''/usr/bin/lame'\'' '\''-'\'' '\''-'\''' 'SB_INCLUDE_STDERR=;SB_CPU_LIMIT=180; SB_CGROUP='\'''\''; SB_MEM_LIMIT=1073741824; SB_FILE_SIZE_LIMIT=104857600; SB_WALL_CLOCK_LIMIT=180; SB_USE_LOG_PIPE=yes'
[exec] Error running /bin/bash '/var/www/html/w/vendor/wikimedia/shellbox/src/Command/limit.sh' ''\''/usr/bin/lame'\'' '\''-'\'' '\''-'\''' 'SB_INCLUDE_STDERR=;SB_CPU_LIMIT=180; SB_CGROUP='\'''\''; SB_MEM_LIMIT=1073741824; SB_FILE_SIZE_LIMIT=104857600; SB_WALL_CLOCK_LIMIT=180; SB_USE_LOG_PIPE=yes': /bin/bash: /usr/bin/lame: No such file or directory
...

(and I checked that lame is definitely installed)

OK, I have worked out why it wasn't working. I had espeak and lame installed in the mediawiki docker image but not in the mediawiki-jobrunner image, and obviously it is the latter environment in which the jobs are run.

> OK, I have worked out why it wasn't working. I had espeak and lame installed in the mediawiki docker image but not in the mediawiki-jobrunner image, and obviously it is the latter environment in which the jobs are run.

Awesome. Thank you @dom_walden

I have used the commands from T318086#8283453 to create jobs locally and on beta.

On beta I have submitted up to 1000 jobs at a time, all of which generated a file in the Swift backend. I checked that they are all of non-zero size and I listened to a few of them to check they were pronouncing the correct words.

> • Since we will be using Google in production, job execution should be throttled to a sensible rate based on the rate limit we'll be getting from Google. Right now it is 1000 requests per minute

I guess we need to set $wgJobBackoffThrottling['phonosIPAFilePersist'] = 1000; (or less) on production.

I have checked that the value of $wgJobBackoffThrottling['phonosIPAFilePersist'] affects how regularly jobs are run.
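One caveat worth noting: $wgJobBackoffThrottling is expressed in job executions per second, not per minute, so matching a 1000-requests-per-minute quota would mean a fractional per-second value rather than 1000 (which would permit 1000 jobs per second). A sketch:

```php
// 1000 requests per minute ≈ 16.6 job executions per second.
$wgJobBackoffThrottling['phonosIPAFilePersist'] = 1000 / 60;
```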

> • A job should be scheduled when Phonos is triggered from a CLI command and not from an HTTP request - T318979

I will test this as part of that ticket.

> • Jobs should not be duplicated in the queue

The job queue appears to handle the removal of duplicates automatically. However, I believe that what counts as a duplicate, as far as the job queue is concerned, is based on the ipa, text and lang parameters we pass to it.

This is different from Phonos' definition of a duplicate, in that the latter also includes the engine and the "cache version". If we were to change the engine and/or increment the cache version and rerun the jobs, they may be counted as duplicates of jobs already in the job queue. I don't think this is likely to happen, as the job queue is regularly cleared, allowing us to rerun previously completed jobs.

Also, there does not appear to be a limit on the number of duplicate jobs you can run if those jobs fail to complete (e.g. they encounter an error). So you can keep retrying failing jobs (and a failing job is automatically retried 2 times anyway). I assume this is how the job queue normally works.

**Test environments:**

> I guess we need to set $wgJobBackoffThrottling['phonosIPAFilePersist'] = 1000; (or less) on production.

We had our limit increased to 10k, but we should probably set it to 7k, perhaps? We need to leave room for regular edits, and this limit is shared between all wikis.

> The job queue appears to handle the removal of duplicates automatically. However, I believe that what counts as a duplicate, as far as the job queue is concerned, is based on the ipa, text and lang parameters we pass to it.
>
> This is different from Phonos' definition of a duplicate, in that the latter also includes the engine and the "cache version". If we were to change the engine and/or increment the cache version and rerun the jobs, they may be counted as duplicates of jobs already in the job queue. I don't think this is likely to happen, as the job queue is regularly cleared, allowing us to rerun previously completed jobs.

Fair point but like you said, unlikely

> Also, there does not appear to be a limit on the number of duplicate jobs you can run if those jobs fail to complete (e.g. they encounter an error). So you can keep retrying failing jobs (and a failing job is automatically retried 2 times anyway). I assume this is how the job queue normally works.

That is correct. IIRC the default is 3; not sure if that's any different in production.
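For reference, the attempt ceiling comes from the queue's maxTries setting (default 3), which can be overridden per queue type in $wgJobTypeConf. A sketch, assuming the default DB-backed queue; the other keys shown are the stock defaults:

```php
$wgJobTypeConf['default'] = [
	'class' => 'JobQueueDB',
	'order' => 'random',
	'claimTTL' => 3600,
	// Abandon a job after 5 failed attempts instead of the default 3.
	'maxTries' => 5,
];
```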