Two failures in a row, both on ar.wiktionary
https://travis-ci.org/wikimedia/pywikibot-core/jobs/78662177#L4606
https://travis-ci.org/wikimedia/pywikibot-core/jobs/78674956#L4609
The next build passed, so the problem may be heavy load.
Subject | Repo | Branch | Lines +/-
---|---|---|---
Deprecate fetching all mediawiki_messages using * | pywikibot/core | master | +37 -12
Another one, also on ar.wiktionary.
https://travis-ci.org/jayvdb/pywikibot-core/jobs/78687606#L4632
The test calls mysite.mediawiki_messages('*'),
which invokes https://ar.wiktionary.org/w/api.php?action=query&meta=allmessages&ammessages=*&amlang=ar
which is not the nicest query to run: it fetches 5.55 MB and takes around 50 seconds to a minute for me.
The slowness of this query seems to be specific to Arabic Wiktionary.
I guess it could be a site-specific problem, e.g. lots of custom messages in the MediaWiki: namespace?
IMO pywikibot-core could avoid this problem by rewriting the unit test so it doesn't test '*', and even by deprecating '*' as a valid way to download all messages. A caller should always specify which messages they need.
Even https://ar.wiktionary.org/w/api.php?action=query&meta=allmessages&ammessages=*&amlang=ar&amnocontent=1 takes 30s,
whereas https://en.wiktionary.org/w/api.php?action=query&meta=allmessages&ammessages=*&amlang=ar&amnocontent=1 takes 7s
and https://en.wiktionary.org/w/api.php?action=query&meta=allmessages&ammessages=*&amlang=en&amnocontent=1 takes 5s.
Change 236245 had a related patch set uploaded (by John Vandenberg):
Deprecate fetching all mediawiki_messages using *
Change 236245 merged by jenkins-bot:
Deprecate fetching all mediawiki_messages using *
After digging deep into this issue, I found that the slowness on Arabic-language wikis compared to other languages is due to LanguageAr::normalize() being relatively slow, multiplied across almost 100000 calls.
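As a rough sketch of the hot path (not the actual LanguageAr::normalize() implementation, which goes through ReplacementArray; the message list here is a placeholder), each call applies the same large replacement table to one message string:

```php
$pairs = unserialize( file_get_contents(
	"/srv/mediawiki/php-1.26wmf21/serialized/normalize-ar.ser"
) );
$messages = [ /* the ~100000 message strings fetched by the API query */ ];
foreach ( $messages as $key => $text ) {
	// strtr receives the full replacement table on each call, so any
	// per-call setup cost is multiplied by the number of messages.
	$messages[$key] = strtr( $text, $pairs );
}
```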
Digging a little deeper: on PHP 5.5.9-1ubuntu4.11, using FastStringSearch (FSS) instead of strtr for ReplacementArray is a fair bit faster with the replacement array from serialized/normalize-ar.ser. I can't compare the speeds on HHVM due to T101418. rMWbdb17a79a4bc landing shortly before this task was opened is almost certainly the immediate cause.
@ori: This sounds like something you'd want to look into. My most-reduced test cases are:
$data = unserialize( file_get_contents( "/srv/mediawiki/php-1.26wmf21/serialized/normalize-ar.ser" ) );
$fss = fss_prep_replace( $data );
for ( $i = 0; $i < 1000; $i++ ) {
	fss_exec_replace( $fss, "foo" );
}
versus
$data = unserialize( file_get_contents( "/srv/mediawiki/php-1.26wmf21/serialized/normalize-ar.ser" ) );
for ( $i = 0; $i < 1000; $i++ ) {
	strtr( "foo", $data );
}
The former takes about 0.034 seconds while the latter takes 0.422 seconds when run with time php5 on mw1017. Increasing the number of iterations to 100000 (roughly the number of normalize calls the API query here triggers), FSS goes to 0.074s while strtr jumps to over 20s. HHVM's behavior with strtr is in line with Zend PHP's.
The advantage for FSS seems to come mainly from being able to do the fss_prep_replace() once, where strtr (presumably) has to do the equivalent work on every call. Moving that prep step inside the loop brings the FSS version up to around 18s for 100000 iterations.
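For reference, the prep-inside-the-loop variant looks like this (same data as the test cases above; only the placement of fss_prep_replace() changes):

```php
$data = unserialize( file_get_contents( "/srv/mediawiki/php-1.26wmf21/serialized/normalize-ar.ser" ) );
for ( $i = 0; $i < 100000; $i++ ) {
	// Rebuilding the FSS search structure on every iteration throws
	// away its one-time-preparation advantage; this runs in roughly
	// 18s, close to plain strtr.
	$fss = fss_prep_replace( $data );
	fss_exec_replace( $fss, "foo" );
}
```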