Page MenuHomePhabricator

Investigate PHP max execution timeout on translatewiki.net
Closed, ResolvedPublic4 Estimated Story Points

Description

We're seeing a lot of logspam about max execution timeouts on Slack from translatewiki.net:

[01-Aug-2023 06:22:00 UTC] PHP Fatal error:  Maximum execution time of 9 seconds exceeded in ParserFunctions.php:222 Parser.php:3413 Parser.php:3098 PPFrame_Hash.php:274 Parser.php:3010 PPFrame_Hash.php:274 ParserFunctions.php:236 Parser.php:3413 Parser.php:3098 PPFrame_Hash.php:274 ParserFunctions.php:236 Parser.php:3413 Parser.php:3098 PPFrame_Hash.php:274 Parser.php:3289 PPFrame_Hash.php:274 Pa... (5 lines skipped)
[01-Aug-2023 06:15:28 UTC] PHP Fatal error:  Maximum execution time of 9 seconds exceeded in TransactionProfiler.php:333 TransactionManager.php:615 Database.php:919 Database.php:809 Database.php:730 Database.php:1433 DBConnRef.php:119 DBConnRef.php:346 Utilities.php:109 Utilities.php:79 GettextFormat.php:576 GettextFormat.php:494 GettextFormat.php:403 SimpleFormat.php:220 ExportTranslationsSpecial... (5 lines skipped)
[01-Aug-2023 05:45:03 UTC] PHP Fatal error:  Maximum execution time of 9 seconds exceeded in LoadBalancer.php:974 LoadBalancer.php:848 LoadBalancer.php:840 DBConnRef.php:103 DBConnRef.php:117 DBConnRef.php:346 RevisionStore.php:2191 RevisionStore.php:2061 Utilities.php:113 Utilities.php:79 GettextFormat.php:576 GettextFormat.php:494 GettextFormat.php:403 SimpleFormat.php:220 ExportTranslationsSpec... (11 lines skipped)
[31-Jul-2023 15:40:56 UTC] PHP Fatal error:  Maximum execution time of 9 seconds exceeded in SMW_DI_Property.php:98 SMW_ExportController.php:355 SMW_ExportController.php:244 SMW_ExportController.php:441 SMW_SpecialOWLExport.php:198 SMW_SpecialOWLExport.php:52 SpecialPage.php:701 SpecialPageFactory.php:1564 MediaWiki.php:344 MediaWiki.php:948 MediaWiki.php:597 index.php:50 index.php:46 /srv/mediawi... (5 lines skipped)

Event Timeline

abi_ set the point value for this task to 4.Aug 1 2023, 8:57 AM

I was investigating one case, Special:ExportTranslations. It looks like someone is exporting all groups regularly via the Action API in po format for language ca (see also T299493: Provide regular dumps of translations from translatewiki.net).

I looked at profiling to see if there are any obvious regression or low hanging fruit. I observed following:

  • In MessageCollection::loadTranslations, we recreate $titleConds on subsequent calls even though it's not needed. Impact of this is minimal.
  • In MessageCollection::load* the $keys parameter is essentially unused and should be removed to clarify the code
  • GettextFormat::formatDocumentation loads message documention invididually for each message. This makes this format very slow. Documentation loading should be batch loaded via MessageCollection (or removed, but then off-line translators would not see message documentation, which is bad).

image.png (421×1 px, 98 KB)

Change 949996 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[translatewiki@master] Enable MySQL slow query logging

https://gerrit.wikimedia.org/r/949996

Change 954894 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] GettextFormat: Use MessageCollection to build message documentation

https://gerrit.wikimedia.org/r/954894

Change 954894 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] GettextFormat: Use MessageCollection to build message documentation

https://gerrit.wikimedia.org/r/954894

Abijeet's patch fixed the timeouts for message group exports.

Additionally, the page https://translatewiki.net/wiki/ISO_15924 was timing out. I made a bunch of simplifications to that page and the templates it uses. It might have broken some functionality, but it was hard to say from the hard to read undocumented code. In any case, the page is probably more useful if it loads at all.

We also had ElasticSearch timing out. While separate from the PHP time outs, it seems to behave better now after restarting.

With these fixes, we haven't observed any more recurring timeouts.

I profiled the new timeout that happened today:

PHP Fatal error:  Maximum execution time of 9 seconds exceeded in FormOptions.php:189 FormOptions.php:418 TranslationStatsGraphOptions.php:51 TranslationStatsBase.php:32 TranslatePerLanguageStats.php:189 TranslatePerLanguageStats.php:77 TranslationStatsDataProvider.php:117 QueryTranslationStatsActionApi.php:35 ApiMain.php:1929 ApiMain.php:906 ApiMain.php:877 api.php:95 api.php:48 /srv/mediawiki/tags/2023-09-18_14:36:20/includes/Html/FormOptions.php on line 160

The request that caused this was:

/w/api.php?action=translationstats&format=json&count=users&days=365&scale=weeks&formatversion=2

So I created a small script to mimic what this request does:

twn:~$ cat debug.php 
$graphOpts = new MediaWiki\Extension\Translate\Statistics\TranslationStatsGraphOptions();
$graphOpts->bindArray( [ 'days' => 365, 'count' => 'users', 'scale' => 'weeks' ] );
$language = MediaWiki\MediaWikiServices::getInstance()->getLanguageFactory()->getLanguage( 'en' );
$output = MediaWiki\Extension\Translate\Services::getInstance()->getTranslationStatsDataProvider()->getGraphData( $graphOpts, $language );

I ran this with xdebug to profile:

twn:~$ bash xdebug.sh /srv/mediawiki/workdir/maintenance/eval.php < debug.php

This produced large profile file:

-rw-r--r--    1 nike     users    2.7G Sep 19 12:33 cachegrind.out.2022839

I copied this file to my computer to open it using kcachegrind:

developer@jadewokki:~$ scp twn:cachegrind.out.2022839 .
cachegrind.out.2022839     100% 2666MB  31.0MB/s   01:26

64% amount of time is spend in TranslatePerLanguageStats->formatTimestamp:

image.png (955×1 px, 115 KB)

Change 958912 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[mediawiki/extensions/Translate@master] Statistics: Optimize the weekly format with simple cache

https://gerrit.wikimedia.org/r/958912

Change 958912 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] Statistics: Optimize the weekly format with simple cache

https://gerrit.wikimedia.org/r/958912

Nikerabbit claimed this task.
Nikerabbit set Final Story Points to 8.