Page MenuHomePhabricator

"Elastica: missing curl_init_pooled method" due to mwscript job running with PHP 5 on terbium
Closed, ResolvedPublicPRODUCTION ERROR

Description

Terbium is still on HHVM 3.6.5. Upgrade was probably delayed to let long running script finish their job. I am unsure of how we can find a window to upgrade this host, but it would make sense to align it.

Currently, a number of WARNING are raised by CirrusSearch which expect HHVM >= 3.9.0, while not blocking, this is noise...

Update:

HHVM on terbium has been upgraded to 3.12.1. But it has a bunch of maintenance script runs by people or via a cron.

Some jobs refers to php5 which is the Zend flavor, that is the case for the updateSuggestedIndex.php script and that triggers the error message.

There was an ongoing extensions/CirrusSearch/maintenance/updateSuggesterIndex.php job that is run via mwscript and that script hardcodes php5:

if [ "`whoami`" != "$MEDIAWIKI_WEB_USER" ] ; then
	sudo -u "$MEDIAWIKI_WEB_USER" php5 "$MEDIAWIKI_DEPLOYMENT_DIR_DIR_USE/multiversion/MWScript.php" "$@"
else
	php5 "$MEDIAWIKI_DEPLOYMENT_DIR_DIR_USE/multiversion/MWScript.php" "$@"
fi

From Logstash the messages are currently emitted for uzwiki and uzwikitionary. And on terbium there are extensions/CirrusSearch/maintenance/updateSuggesterIndex.php runs under php5 for both wikis.

mwscript got hardcoded to Zend via 8f8e7dbdd834066504e59edfc4881bb98f76072a / https://gerrit.wikimedia.org/r/#/c/267816/

Was due to some code not being compatible with HHVM a couple months ago that prevent deployment.

Event Timeline

I don't see why upgrading hhvm would impact running processes. The repo schema is going to be changed, so there is not really any possibility that different versions running at the same time interact in any poisonous way.

Joe triaged this task as Medium priority.
hashar reassigned this task from Joe to Gehel.
hashar subscribed.

From T133522, HHVM hasn't been restarted and is stuck to some old version. Seems @Gehel is going to take care of it

Terbium was still running HHVM < 3.9.0 until recently (T132751). A quick grep on fluorine (grep curl_init /a/mw-log/CirrusSearch.log | grep -v terbium) indicates that the only error recently seen were coming from terbium. It seems that HHVM has not been restarted.

hashar renamed this task from upgrade HHVM to 3.12.1 on terbium to upgrade HHVM to 3.12.1 on terbium (Elastica: missing curl_init_pooled method.).Apr 26 2016, 7:57 AM

Mentioned in SAL [2016-04-26T18:41:28Z] <gehel> restarting HHVM on terbium to enable upgrade to 3.12.1 (T132751)

I can still see the same error message in the logs. Need to investigate...

HHVM is upgraded to 3.12.1:

gehel@terbium:~$ hhvm --version
HipHop VM 3.12.1 (rel)
Compiler: 3.12.1+dfsg-1
Repo schema: 8aa242a1c26f122330b9188732cdd84948fd7706

Curl pools are configured:

gehel@terbium:~$ ack-grep curl /etc/hhvm/
/etc/hhvm/server.ini
1:curl.namedPools = cirrus-eqiad,cirrus-codfw
2:curl.namedPools.cirrus-codfw.size = 20
3:curl.namedPools.cirrus-eqiad.size = 20

/etc/hhvm/fcgi.ini
1:curl.namedPools = cirrus-eqiad,cirrus-codfw

/etc/hhvm/php.ini
1:curl.namedPools = cirrus-eqiad,cirrus-codfw

HHVM service restarted on April 26:

gehel@terbium:~$ ps axu | grep hhvm
gehel     2574  0.0  0.0  11864   920 pts/4    S+   09:22   0:00 grep hhvm
www-data 19096  0.0  0.2 1687868 92984 ?       Ssl  Apr26   0:14 /usr/bin/hhvm --config /etc/hhvm/server.ini --mode server
www-data 19123  0.0  0.1 367920 44176 ?        Ss   Apr26   0:00 /usr/bin/hhvm --config /etc/hhvm/server.ini --mode server
www-data 19126  0.0  0.1 367920 44176 ?        Ss   Apr26   0:00 /usr/bin/hhvm --config /etc/hhvm/server.ini --mode server
www-data 19127  0.0  0.1 367920 44176 ?        Ss   Apr26   0:00 /usr/bin/hhvm --config /etc/hhvm/server.ini --mode server
www-data 19129  0.0  0.1 367920 44176 ?        Ss   Apr26   0:00 /usr/bin/hhvm --config /etc/hhvm/server.ini --mode server
www-data 19131  0.0  0.1 367920 44176 ?        Ss   Apr26   0:00 /usr/bin/hhvm --config /etc/hhvm/server.ini --mode server

Error still seen in logs:

gehel@fluorine:~$ grep curl_init /a/mw-log/CirrusSearch.log | tail -n 3
2016-04-28 09:24:32 [1ee2bade5eb3b2e4697f1c27] terbium srwiktionary 1.27.0-wmf.22 CirrusSearch WARNING: Elastica connection pool cannot init: missing curl_init_pooled method. Are you using hhvm >= 3.9.0?  
2016-04-28 09:24:32 [ac5a3305bbef434f248b895a] terbium shwiktionary 1.27.0-wmf.22 CirrusSearch WARNING: Elastica connection pool cannot init: missing curl_init_pooled method. Are you using hhvm >= 3.9.0?  
2016-04-28 09:24:32 [33c7aa9a693c869fdf11a59b] terbium srwiki 1.27.0-wmf.21 CirrusSearch WARNING: Elastica connection pool cannot init: missing curl_init_pooled method. Are you using hhvm >= 3.9.0?

But only seen for terbium:

gehel@fluorine:~$ grep curl_init /a/mw-log/CirrusSearch.log | grep -v terbium

There is probably another reason than HHVM version for not having curl_init_pooled available. But I need some help in discovering it... @EBernhardson, could you have a look?

terbium has a bunch of maintenance script runs by people or via a cron.

Some refers to /usr/bin/php which is hhvm. If they got started before the HHVM version upgrade, I guess they are still running the old hhvm version.

Other jobs refers to php5 which is the Zend flavor, that is the case for the updateSuggestedIndex.php script and maybe the error messages are triggered when the job runs under Zend.

There is an ongoing extensions/CirrusSearch/maintenance/updateSuggesterIndex.php job that is run via mwscript and that script hardcodes php5:

if [ "`whoami`" != "$MEDIAWIKI_WEB_USER" ] ; then
	sudo -u "$MEDIAWIKI_WEB_USER" php5 "$MEDIAWIKI_DEPLOYMENT_DIR_DIR_USE/multiversion/MWScript.php" "$@"
else
	php5 "$MEDIAWIKI_DEPLOYMENT_DIR_DIR_USE/multiversion/MWScript.php" "$@"
fi

From Logstash the messages are currently emitted for uzwiki and uzwikitionary. And on terbium there are extensions/CirrusSearch/maintenance/updateSuggesterIndex.php runs under php5 for both wikis.

mwscript got hardcoded to Zend via 8f8e7dbdd834066504e59edfc4881bb98f76072a / https://gerrit.wikimedia.org/r/#/c/267816/

Was due to some code not being compatible with HHVM a couple months ago that prevent deployment.

Gehel removed Gehel as the assignee of this task.Apr 28 2016, 11:17 AM
Gehel added a subscriber: dcausse.

Up for grab, this is probably something @EBernhardson or @dcausse can have a look into.

I'm unsure about changing everyone's cli scripts to hhvm. Mostly I'm not sure if there will be any fallout, but seems like something that has to happen some day.

As a stop gap, i could change our cron job to call hhvm & MWScript.php directly insteadof using the mwscript wrapper, but that means adding some small bit of duplication and chance of breaking if mwscript needs to change in the future.

As I see it there isn't an actual issue. We log warnings, but we still get the job done. We could just filter out those logs for the moment, this is asynchronous work, so no need to optimize for milliseconds (the only real issue is that we generate a lot of noise). We still need to migrate all script to HHVM at some point (just for coherence), but that's a different task.

Not sure how our logging framework works and if we can disable those logs just on terbium...

hashar renamed this task from upgrade HHVM to 3.12.1 on terbium (Elastica: missing curl_init_pooled method.) to "Elastica: missing curl_init_pooled method" due to mwscript job running with PHP 5 on terbium.Apr 28 2016, 8:08 PM
hashar updated the task description. (Show Details)

I have clarified the task description, it is no more about upgrading HHVM on terbium but more about switching mwscript to HHVM.

Change 286485 had a related patch set uploaded (by EBernhardson):
cirrus: Only use curl pools on hhvm

https://gerrit.wikimedia.org/r/286485

Change 286485 merged by jenkins-bot:
cirrus: Only use curl pools on hhvm

https://gerrit.wikimedia.org/r/286485

The last event in logstash is at 2016-05-03T10:15:02.000Z and came from terbium which is apparently when https://gerrit.wikimedia.org/r/#/c/286485/ got proposed in Gerrit though it did not get deployed. That is a mystery to me :-}

Anyway the log spam no more shows up \O/

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:11 PM