
Timeout in testApiMethods calling mediawiki_messages
Open, Low, Public

Description

Two failures in a row, both on ar.wiktionary:

https://travis-ci.org/wikimedia/pywikibot-core/jobs/78662177#L4606
https://travis-ci.org/wikimedia/pywikibot-core/jobs/78674956#L4609

The next build passed, so the problem may have been heavy load.

Event Timeline

jayvdb raised the priority of this task to Needs Triage.
jayvdb updated the task description.
jayvdb added projects: Pywikibot-tests, Pywikibot.
jayvdb subscribed.

Another one, also on ar.wiktionary.
https://travis-ci.org/jayvdb/pywikibot-core/jobs/78687606#L4632

The test is calling mysite.mediawiki_messages('*'),

which invokes https://ar.wiktionary.org/w/api.php?action=query&meta=allmessages&ammessages=*&amlang=ar

which is not the nicest query to run: it fetches about 5.55 MB and takes around 50 seconds to a minute for me.

The slowness of this query seems to be specific to Arabic Wiktionary.

I guess it could be a site-specific problem, e.g. lots of custom messages in the MediaWiki: namespace?

IMO pywikibot-core could avoid this problem by rewriting the unit test so it doesn't test '*', and even deprecating '*' as a valid way to download all messages. A caller should always specify which messages they need.

Change 236245 had a related patch set uploaded (by John Vandenberg):
Deprecate fetching all mediawiki_messages using *

https://gerrit.wikimedia.org/r/236245

Change 236245 merged by jenkins-bot:
Deprecate fetching all mediawiki_messages using *

https://gerrit.wikimedia.org/r/236245

jayvdb triaged this task as Low priority. Sep 7 2015, 9:31 AM
jayvdb removed a project: Patch-For-Review.
jayvdb set Security to None.
Anomie added subscribers: ori, Anomie.

After digging deep into this issue, I found that the slowness for Arabic-language wikis compared to other languages is due to LanguageAr::normalize() being relatively slow, multiplied by almost 100000 calls.
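
To make the hot path concrete, here is a minimal sketch of what that amounts to (illustrative only, not the actual MediaWiki source; $allMessageKeys is a hypothetical stand-in for the roughly 100000 strings passing through LanguageAr::normalize() during this query):

$pairs = unserialize( file_get_contents( "serialized/normalize-ar.ser" ) );
foreach ( $allMessageKeys as $key ) {
    // With plain strtr, PHP reprocesses the pair array on every call,
    // so the per-call setup cost is paid ~100000 times.
    $normalized = strtr( $key, $pairs );
}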

Digging a little deeper: on PHP 5.5.9-1ubuntu4.11, using FastStringSearch instead of strtr for ReplacementArray is a fair bit faster with the replacement array from serialized/normalize-ar.ser. I can't compare the speeds under HHVM due to T101418. rMWbdb17a79a4bc, which landed shortly before this task was opened, is almost certainly the immediate cause.

@ori: This sounds like something you'd want to look into. My most-reduced test cases are:

// Load the serialized Arabic normalization replacement pairs.
$data = unserialize( file_get_contents( "/srv/mediawiki/php-1.26wmf21/serialized/normalize-ar.ser" ) );
// Prepare the FSS search structure once, outside the loop.
$fss = fss_prep_replace( $data );
for ( $i = 0; $i < 1000; $i++ ) {
    fss_exec_replace( $fss, "foo" );
}

versus

$data = unserialize( file_get_contents( "/srv/mediawiki/php-1.26wmf21/serialized/normalize-ar.ser" ) );
for ( $i = 0; $i < 1000; $i++ ) {
    // strtr gets the raw pair array each time, with no reusable prep step.
    strtr( "foo", $data );
}

The former takes about 0.034 seconds while the latter takes 0.422 seconds when run with time php5 on mw1017. Increasing the number of iterations to 100000 (roughly the number of calls the API query here makes), FSS goes to 0.074s while strtr jumps to over 20s. HHVM's behavior with strtr is in line with Zend PHP's.

The advantage for FSS seems to be mainly due to the ability to do the fss_prep_replace() once, where strtr (presumably) has to do the equivalent on every call. Moving fss_prep_replace() inside the loop brings the FSS version up to around 18s for 100000 iterations.
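
For reference, a minimal sketch of that prep-inside-the-loop variant (illustrative; the path and workload mirror the snippets above):

$data = unserialize( file_get_contents( "/srv/mediawiki/php-1.26wmf21/serialized/normalize-ar.ser" ) );
for ( $i = 0; $i < 100000; $i++ ) {
    // Rebuilding the FSS structure on every iteration discards FSS's main
    // advantage, which is why this lands near strtr's ~20s rather than 0.074s.
    $fss = fss_prep_replace( $data );
    fss_exec_replace( $fss, "foo" );
}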