Page MenuHomePhabricator

Fatal error: "Object does not support method calls" (from MemcachedPeclBagOStuff)
Closed, DeclinedPublicPRODUCTION ERROR

Description

Error

MediaWiki version: 1.35.0-wmf.5

message
Fatal Error: Object does not support method calls
from /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MemcachedPeclBagOStuff.php:320
exception.trace
#0 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MediumSpecificBagOStuff.php(557): MemcachedPeclBagOStuff->doGetMulti(array, integer)
#1 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/wancache/WANObjectCache.php(434): MediumSpecificBagOStuff->getMulti(array)
#2 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/wancache/WANObjectCache.php(359): WANObjectCache->getMulti(array, array, array, integer)
#3 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/wancache/WANObjectCache.php(1330): WANObjectCache->get(string, integer, array, integer)
#4 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/wancache/WANObjectCache.php(1278): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array)
#5 /srv/mediawiki/php-1.35.0-wmf.5/extensions/CentralAuth/includes/CentralAuthUser.php(557): WANObjectCache->getWithSetCallback(string, integer, Closure, array)
#6 /srv/mediawiki/php-1.35.0-wmf.5/extensions/CentralAuth/includes/CentralAuthUser.php(413): CentralAuthUser->loadFromCache()
#7 /srv/mediawiki/php-1.35.0-wmf.5/extensions/CentralAuth/includes/CentralAuthUser.php(588): CentralAuthUser->loadState()
#8 /srv/mediawiki/php-1.35.0-wmf.5/extensions/CentralAuth/includes/CentralAuthUser.php(685): CentralAuthUser->getId()
#9 /srv/mediawiki/php-1.35.0-wmf.5/extensions/CentralAuth/includes/CentralAuthHooks.php(916): CentralAuthUser->exists()
#10 /srv/mediawiki/php-1.35.0-wmf.5/includes/Hooks.php(174): CentralAuthHooks::onGetUserBlock(User, string, NULL)
#11 /srv/mediawiki/php-1.35.0-wmf.5/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#12 /srv/mediawiki/php-1.35.0-wmf.5/includes/block/BlockManager.php(167): Hooks::run(string, array)
#13 /srv/mediawiki/php-1.35.0-wmf.5/includes/user/User.php(1802): MediaWiki\Block\BlockManager->getUserBlock(User, WebRequest, boolean)
#14 /srv/mediawiki/php-1.35.0-wmf.5/includes/user/User.php(2120): User->getBlockedStatus(boolean)
#15 /srv/mediawiki/php-1.35.0-wmf.5/includes/block/BlockManager.php(470): User->getBlock()
#16 /srv/mediawiki/php-1.35.0-wmf.5/includes/MediaWiki.php(751): MediaWiki\Block\BlockManager->trackBlockWithCookie(User, WebResponse)
#17 /srv/mediawiki/php-1.35.0-wmf.5/includes/api/ApiMain.php(565): MediaWiki::preOutputCommit(DerivativeContext)
#18 /srv/mediawiki/php-1.35.0-wmf.5/includes/api/ApiMain.php(510): ApiMain->executeActionWithErrorHandling()
#19 /srv/mediawiki/php-1.35.0-wmf.5/api.php(83): ApiMain->execute()
#20 /srv/mediawiki/w/api.php(3): require(string)
#21 {main}

Impact

Notes

The line of code in question:

MemcachedPeclBagOStuff.php
	protected function doGetMulti( array $keys, $flags = 0 ) {
		// …
		$result = $this->acquireSyncClient()->getMulti( $keys ) ?: [];
		//                                  ^^^^

Event Timeline

This is intermittent and affects various different unrelated URLs. Probably a race condition of some kind that affects all Memcached interaction, not anything specific to Search or API.

The line of code in question:

	protected function doGetMulti( array $keys, $flags = 0 ) {
		// …
		$result = $this->acquireSyncClient()->getMulti( $keys ) ?: [];
		//                                  ^^^^
Krinkle renamed this task from Object does not support method calls to Fatal error: "Object does not support method calls" (from MemcachedPeclBagOStuff).Dec 3 2019, 10:32 PM

(Following from IRC). Feel free to pass back to me after initial investigation if you have other work (I can patch it).

Looking at the version of PHP 7 we're running, this error shows up a few times. The condition for it is always the same:

zend_object *orig_obj = obj;

if (UNEXPECTED(obj->handlers->get_method == NULL)) {
	zend_throw_error(NULL, "Object does not support method calls");
	zval_ptr_dtor_nogc(free_op2);
	zval_ptr_dtor_nogc(free_op1);
	HANDLE_EXCEPTION();
}

This explains what get_method does: https://wiki.php.net/internals/engine/objects#get_method

May be NULL if the object is not to support method calls

This suggests that we're trying to call a method on an object that was instantiated in a way that it shouldn't support methods.

Looking at the offending line:

$result = $this->acquireSyncClient()->getMulti( $keys ) ?: [];

This would suggest that $this->acquireSyncClient() may return an object that doesn't have/support any methods.

But $this->syncClient is initialised in the MemcachedPeclBagOStuff constructor, so by that time that line is reached, it should be a PECL Memcached object.

The only difference related to construction I see before and after should not matter:

Before

$client = new Memcached( $connectionPoolId );

$client = new Memcached;

After

$syncClient = new Memcached( "$connectionPoolId-sync" );

$syncClient = new Memcached();

Maybe a bug in the PECL memcached library with that different format/length of connection pool id?

Back to you, @Krinkle...

Krinkle triaged this task as High priority.

Thanks.

The pool-ID is a red-herring, I think, because they are specific to the options['persistent'] conditionals, which we don't enable in production. (We use mcrouter instead for connection pooling, more or less.) I actually wonder if this should still be supported in core given it can have a lot of nasty side-effects that I don't think we've ever seriously tested for. But alas, that's a matter for another day.

Some online searching got me hooked on the idea that this might be due to use of magic call methods e.g. where a class can implement __call, which if it doesn't handle anything or if it is disabled by some means, could cause Fatal Error: Object does not support method calls.

But the best I can tell, the Memcached class as defined by php-memcached doesn't make use of any magic methods. And even if it did, it would only kick in after explicitly defined methods were considered. The method we use in our code, getMulti is indeed explicitly defined (and phan stub).

It isn't specific to Memcached->getMulti. Looking at Logstash, I also find instances of Memcached->set() failing the same way.

  • Request ID: XeqGeApAIDcAAGf1rhkAAADV
MemcachedPeclBagOStuff.php
	protected function doSet( $key, $value, $exptime = 0, $flags = 0 ) {
		// …

		$client = $this->acquireSyncClient();
		$result = $client->set( /* … */ );
exception
Fatal Error: Object does not support method calls

at /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MemcachedPeclBagOStuff.php:205
#0 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MediumSpecificBagOStuff.php(168): MemcachedPeclBagOStuff->doSet(string, string, integer, integer)
#1 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/wancache/WANObjectCache.php(2224): MediumSpecificBagOStuff->set(string, string, integer)
#2 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/wancache/WANObjectCache.php(751): WANObjectCache->relayPurge(string, integer, integer)
#3 /srv/mediawiki/php-1.35.0-wmf.5/includes/user/User.php(2604): WANObjectCache->delete(string)
#4 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/rdbms/database/Database.php(3688): User->{closure}(Wikimedia\Rdbms\DatabaseMysqli)
#5 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1609): Wikimedia\Rdbms\Database->runOnTransactionPreCommitCallbacks()
…
#11 /srv/mediawiki/php-1.35.0-wmf.5/includes/MediaWiki.php(664): Wikimedia\Rdbms\LBFactory->commitMasterChanges(string, array)
#12 /srv/mediawiki/php-1.35.0-wmf.5/includes/api/ApiMain.php(565): MediaWiki::preOutputCommit(DerivativeContext)
…
#14 /srv/mediawiki/php-1.35.0-wmf.5/api.php

I think I've got something. Each of the request ID's was also involved in a different fatal error.

Example 1

This is the original one from task description.

  • Request ID Xd-ZGgpAAK4AAECAuEsAAAAB
  • Request URL: GET eu.wikipedia.org /w/api.php?action=opensearch&…
  • Host: mw1312
Exception A
caught_by: mwe_handler

WMFTimeoutException: the execution time limit of 60 seconds was exceeded

at /srv/mediawiki/wmf-config/set-time-limit.php:39
#0 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MemcachedPeclBagOStuff.php(320): {closure}(integer)
#1 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MediumSpecificBagOStuff.php(557): MemcachedPeclBagOStuff->doGetMulti(array, integer)
…
#6 /srv/mediawiki/php-1.35.0-wmf.5/includes/cache/LinkCache.php(277): WANObjectCache->getWithSetCallback(string, integer, Closure)
…
#14 /srv/mediawiki/php-1.35.0-wmf.5/includes/parser/Parser.php(3625): Parser->fetchTemplateAndTitle(Title)
…
#55 /srv/mediawiki/php-1.35.0-wmf.5/includes/api/ApiQuery.php(255): TextExtracts\ApiQueryExtracts->execute()
…
#65 /srv/mediawiki/php-1.35.0-wmf.5/api.php
Exception B
caught_by: mwe_handler

Fatal Error: Object does not support method calls

at /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MemcachedPeclBagOStuff.php:320
#0 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MediumSpecificBagOStuff.php(557): MemcachedPeclBagOStuff->doGetMulti(array, integer)
#1 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/wancache/WANObjectCache.php(434): MediumSpecificBagOStuff->getMulti(array)
…
#5 /srv/mediawiki/php-1.35.0-wmf.5/extensions/CentralAuth/includes/CentralAuthUser.php(557): WANObjectCache->getWithSetCallback(string, integer, Closure, array)
…
#9 /srv/mediawiki/php-1.35.0-wmf.5/extensions/CentralAuth/includes/CentralAuthHooks.php(916): CentralAuthUser->exists()
#10 /srv/mediawiki/php-1.35.0-wmf.5/includes/Hooks.php(174): CentralAuthHooks::onGetUserBlock(User, string, NULL)
…
#15 /srv/mediawiki/php-1.35.0-wmf.5/includes/block/BlockManager.php(470): User->getBlock()
#16 /srv/mediawiki/php-1.35.0-wmf.5/includes/MediaWiki.php(751): MediaWiki\Block\BlockManager->trackBlockWithCookie(User, WebResponse)
#17 /srv/mediawiki/php-1.35.0-wmf.5/includes/api/ApiMain.php(565): MediaWiki::preOutputCommit(DerivativeContext)
…
#20 /srv/mediawiki/w/api.php

Based on the trace we can see two things:

  1. They are not the same call to Memcached. This rules out any kind of race condition relating to both our regular code and the timeout handler, or timeout handler and PHP engine, both somehow trying to throw an exception. Only one of them threw during each call.
  2. The execution did not stop after the timeout error. This is because we throw it from Excimer (wmf-config/set-time-limit) as a regular exception which is still caught at the entry point. This is good because that means we give it a chance to render a "normal" error page, e.g. one with a site logo, and personalised sidebar and other navigation links. (Only falling back to a generic error page if this step also fails).
  3. The timeout (A) was thrown first because it has a "normal" call stack, whereas B has a pre-send call stack.

That still leaves the question as to why the timeout has put MemcachedPeclBagOStuff and/or Memcached into a state where it now throws Fatal Error: Object does not support method calls when trying to use it.

Example 2

  • Request ID: XefHOwpAEMYAABi5K1QAAANM
  • Request URL: GET de.wikipedia.org /w/api.php?action=parse&format=json&oldid=
  • Host: mw1317
Exception A
caught_by: mwe_handler

WMFTimeoutException: the execution time limit of 60 seconds was exceeded

at /srv/mediawiki/wmf-config/set-time-limit.php line 39
#0 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MemcachedPeclBagOStuff.php(320): {closure}(integer)
#1 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MediumSpecificBagOStuff.php(557): MemcachedPeclBagOStuff->doGetMulti(array, integer)
…
#6 /srv/mediawiki/php-1.35.0-wmf.5/includes/filerepo/file/LocalFile.php(339): WANObjectCache->getWithSetCallback(string, integer, Closure, array)
…
#8 /srv/mediawiki/php-1.35.0-wmf.5/includes/filerepo/FileRepo.php(462): LocalFile->load(integer)
…
#14 /srv/mediawiki/php-1.35.0-wmf.5/includes/parser/Parser.php(585): Parser->internalParse(string)
…
#28 /srv/mediawiki/php-1.35.0-wmf.5/includes/api/ApiMain.php(1603): ApiParse->execute()
…

#31 /srv/mediawiki/php-1.35.0-wmf.5/api.php
Exception B
caught_by: mwe_handler

Fatal Error: Object does not support method calls

#0 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MediumSpecificBagOStuff.php(557): MemcachedPeclBagOStuff->doGetMulti(array, integer)
#1 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/wancache/WANObjectCache.php(434): MediumSpecificBagOStuff->getMulti(array)
…
#5 /srv/mediawiki/php-1.35.0-wmf.5/extensions/CentralAuth/includes/CentralAuthUser.php(557): WANObjectCache->getWithSetCallback(string, integer, Closure, array)
…
#9 /srv/mediawiki/php-1.35.0-wmf.5/extensions/CentralAuth/includes/CentralAuthHooks.php(916): CentralAuthUser->exists()
…
#15 /srv/mediawiki/php-1.35.0-wmf.5/includes/block/BlockManager.php(470): User->getBlock()
#16 /srv/mediawiki/php-1.35.0-wmf.5/includes/MediaWiki.php(751): MediaWiki\Block\BlockManager->trackBlockWithCookie(User, WebResponse)
#17 /srv/mediawiki/php-1.35.0-wmf.5/includes/api/ApiMain.php(565): MediaWiki::preOutputCommit(DerivativeContext)
#18 /srv/mediawiki/php-1.35.0-wmf.5/includes/api/ApiMain.php(510): ApiMain->executeActionWithErrorHandling()
#19 /srv/mediawiki/php-1.35.0-wmf.5/api.php

On the other hand, there are also several instances of this Fatal Error that do not have other exceptions in Logstash under their request ID. For example XeqGeApAIDcAAGf1rhkAAADV from my previous comment only failed with Fatal error: Object does not support method calls. No indication of a cascading error there.

@tstarling I could use your help to confirm or reject my hypothesis.

My running theory is that the interrupt opportunity used by our php-excimer extension is not cleanly occurring between two PHP statements, nor cleanly between two parts of a fluent call chain in PHP. Rather, it might be in the middle of the C code that backs Memcached::getMulti (source) of php-memcached. The idea being that it's doing something that leaves the PHP reflection of that object in a "very bad" state without opportunity to recover/restore it.

Where "very bad" state would need to involve it causing the following in Zend/zend_vm_def.h@7.2.24 to return true:

zend_vm_def.h
if (UNEXPECTED(obj->handlers->get_method == NULL)) {
	zend_throw_error(NULL, "Object does not support method calls");
	FREE_OP2();
	FREE_OP1();
	HANDLE_EXCEPTION();
}

I've not been able to confirm this theory though as I don't see a direct link between this code and what php-memcached does. Having said that, the internals of Zend and php-memcached are very alien to me, so I may be missing it.

(Tagging CPT for process reasons. I've reached a dead-end in my investigation with no new leads or unexplored branches in sight.)

@Krinkle do you mean that every time these fatals occured, the request had an excimer sample happen?

@Krinkle do you mean that every time these fatals occured, the request had an excimer sample happen?

Not exactly. I think most requests will have a sample happen during them, but that's not what I was referring to.

I was referring to the other use case of php-excimer, which is execution timeouts. When the process has been running for more than N seconds, php-excimer is configured to schedule a VM interrupt at the next statement, and run the wmf-config/set-time-limit.php callback.

I observed that a large number of the MemcachedPeclBagOStuff fatals were post-send fatals, where the same web request had its regular callstack aborted by an execution timeout (fatal WMFTimeoutException). The WMFTimeoutException is then caught high up in MediaWiki, logged, and then we try to render an error page and do some basic pre-send/post-send stuff (best effort).

My running theory is that the interrupt opportunity used by our php-excimer extension is not cleanly occurring between two PHP statements, nor cleanly between two parts of a fluent call chain in PHP.

An excimer timer event has two stages. In response to the underlying hardware event, excimer_timer_handle() is called in a separate thread. It uses mutexes to safely (we hope) find the relevant object and set a flag in it, and to set a flag in the PHP VM. The second stage is when the PHP VM calls excimer_timer_interrupt(), and this is the stage that executes userspace code and thus raises an exception.

I wonder what is this {closure}(integer) in the timeout call stack:

#0 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/objectcache/MemcachedPeclBagOStuff.php(320): {closure}(integer)

It seems like memcached is calling back into PHP, and that PHP code hits the timeout and throws an exception. PHP exceptions do not automatically unwind the C stack, so after running PHP code, a C function has to check the exception flag and return early if necessary. So maybe memcached screws up this early return and leaves the object in an inconsistent state. The main problem with this theory is that, although memcached has a few ways to call back into PHP, none of them seem to be used in MemcachedPeclBagOStuff.

Are those all happening on requests that are timing out (and killed as a result)?

If some of these fatals happen when timeouts aren't involved, then I think it's an entirely different diagnosis and a more general problem than just timeouts.

Based on my very limited understanding of this, it gives me the feeling that if it happens outside of timeout events, it could have to do with the sampling/interrupt itself. Excimer might tamper with the Memcached object if it encounters one and leave it damaged for the rest of the PHP execution. During mere sampling that might be pretty rare, as the sample has to happen at a point in time when a Memcached object is involved. In timeout scenarios, however, we are more likely to have a post-send callback and those tend to almost always write to our data stores, Memcached included.

Are those all happening on requests that are timing out (and killed as a result)?

If some of these fatals happen when timeouts aren't involved, then I think it's an entirely different diagnosis and a more general problem than just timeouts.

On mwlog1001 I found 41 instances of "does not support method calls". I made a list of all relevant request IDs and filtered the log for those IDs. Only 13 had an associated timeout logged, the others did not have any additional log entry in exception-json.log. The 54 log lines are in mwlog1001:~tstarling/T239724-related-lines .

Another data point: This same PHP fatal through Memcached can reached multiple times within a single request.

From Tim's data dump:

archive/exception-json.log-20191029.gz:2019-10-29 03:08:57 [XbetDQpAEMYAABy1FIQAAAAM] mw1317 arwiki 1.35.0-wmf.2 exception-json ERROR: {"id":"XbetDQpAEMYAABy1FIQAAAAM" …

And from Logstash:

2019-10-29T03:08:57	exception	mw1317	arwiki	[XbetDQpAEMYAABy1FIQAAAAM] Error: Object does not support method calls
#0 /srv/mediawiki/php-1.35.0-wmf.2/includes/libs/objectcache/MediumSpecificBagOStuff.php(557): MemcachedPeclBagOStuff->doGetMulti(array, integer)
#1 /srv/mediawiki/php-1.35.0-wmf.2/includes/libs/objectcache/wancache/WANObjectCache.php(434): MediumSpecificBagOStuff->getMulti(array)
…
#7 /srv/mediawiki/php-1.35.0-wmf.2/includes/parser/Parser.php(3748): Revision::newKnownCurrent(Wikimedia\Rdbms\MaintainableDBConnRef, Title)
#8 /srv/mediawiki/php-1.35.0-wmf.2/includes/parser/Parser.php(3720): Parser->statelessFetchRevision(Title, Parser) …
#34 /srv/mediawiki/php-1.35.0-wmf.2/includes/content/WikitextContent.php(368): Parser->parse …
#43 /srv/mediawiki/php-1.35.0-wmf.2/includes/api/ApiQuery.php(255): TextExtracts\ApiQueryExtracts->execute …
#47 /srv/mediawiki/php-1.35.0-wmf.2/api.php

2019-10-29T03:08:57	exception	mw1317	arwiki	[XbetDQpAEMYAABy1FIQAAAAM] Error: Object does not support method calls

#0 /srv/mediawiki/php-1.35.0-wmf.2/includes/libs/objectcache/MediumSpecificBagOStuff.php(557): MemcachedPeclBagOStuff->doGetMulti(array, integer)
#1 /srv/mediawiki/php-1.35.0-wmf.2/includes/libs/objectcache/wancache/WANObjectCache.php(434): MediumSpecificBagOStuff->getMulti(array)
#2 /srv/mediawiki/php-1.35.0-wmf.2/includes/libs/objectcache/wancache/WANObjectCache.php(359): WANObjectCache->getMulti(array, array, array, integer)
…
#9 /srv/mediawiki/php-1.35.0-wmf.2/extensions/CentralAuth/includes/CentralAuthHooks.php(913): CentralAuthUser->exists()
#10 /srv/mediawiki/php-1.35.0-wmf.2/includes/Hooks.php(174): CentralAuthHooks::onGetUserBlock(User, string, NULL)
…
#15 /srv/mediawiki/php-1.35.0-wmf.2/includes/block/BlockManager.php(473): User->getBlock()
#16 /srv/mediawiki/php-1.35.0-wmf.2/includes/MediaWiki.php(751): MediaWiki\Block\BlockManager->trackBlockWithCookie(User, WebResponse)
#17 /srv/mediawiki/php-1.35.0-wmf.2/includes/api/ApiMain.php(564): MediaWiki::preOutputCommit(DerivativeContext)
#20 /srv/mediawiki/w/api.php

We know (per previous comments) that it's expected for execution to pick up after a fatal to a limited extent when rendering the error page. This could be a coincidence if it's all based on random chance and race conditions. However, it's possible that once the race condition is reached, that the Memcached object is indefinitely in a "bad" state and that it's merely rare for us to consult it again after we've reached a fatal (after all, it is meant to be fatal...).

This might help narrow it down a bit, focussing more on how the object might get into this bad state within Zend, and focussing less on how this condition may be triggered by the call site (which I'm hypothesising for now isn't the case).

Change 558589 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] objectcache: Add corruption debugger to MemcachedPeclBagOStuff::doGetMulti

https://gerrit.wikimedia.org/r/558589

Change 558593 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.35.0-wmf.11] objectcache: Add corruption debugger to MemcachedPeclBagOStuff::doGetMulti

https://gerrit.wikimedia.org/r/558593

Change 558594 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.35.0-wmf.10] objectcache: Add corruption debugger to MemcachedPeclBagOStuff::doGetMulti

https://gerrit.wikimedia.org/r/558594

Change 558594 abandoned by Krinkle:
objectcache: Add corruption debugger to MemcachedPeclBagOStuff::doGetMulti

https://gerrit.wikimedia.org/r/558594

Change 558593 merged by jenkins-bot:
[mediawiki/core@wmf/1.35.0-wmf.11] objectcache: Add corruption debug to MemcachedPeclBagOStuff::doGetMulti

https://gerrit.wikimedia.org/r/558593

No hits since 15 December, and no hits with my logging in place. Closing for now. It's possible that my logging is somehow making Zend no longer fail. If that is the case, it will start failing again after this week.

Change 558589 abandoned by Krinkle:
objectcache: Add corruption debugger to MemcachedPeclBagOStuff::doGetMulti

https://gerrit.wikimedia.org/r/558589

Krinkle reopened this task as Open.EditedJul 1 2020, 12:24 AM

And it's back, and most ones I found did not involve a timeout, which suggests that the Memc problem was indeed what made these user requests fail.

2020-06-13
[be654d2c-e16c-4413-a45c-6c808ea15302] /w/api.php?action=query&prop=revisions&revids=426120488&rvprop=content&rvparse=1&format=xml   Error from line 330 of /srv/mediawiki/php-1.35.0-wmf.36/includes/libs/objectcache/MemcachedPeclBagOStuff.php: Object does not support method calls
2020-05-25
[cfc65b3b-b5b8-45ba-b759-496972b736d3] /w/api.php?action=query&prop=revisions&revids=421624265&rvprop=content&rvparse=1&format=xml   Error from line 330 of /srv/mediawiki/php-1.35.0-wmf.32/includes/libs/objectcache/MemcachedPeclBagOStuff.php: Object does not support method calls
`
2020-05-01
[6f1493fb-4b05-481d-bdd2-2cf0d2687a6f] /w/api.php?action=query&prop=revisions&revids=416217063&rvprop=content&rvparse=1&format=xml   Error from line 330 of /srv/mediawiki/php-1.35.0-wmf.28/includes/libs/objectcache/MemcachedPeclBagOStuff.php: Object does not support method calls
2020-05-01
[1d8dfab1-bb13-4515-8f22-9aed0f508a8b] /w/api.php?action=query&prop=revisions&revids=954233374&rvprop=content&rvparse=1&format=xml   Error from line 330 of /srv/mediawiki/php-1.35.0-wmf.28/includes/libs/objectcache/MemcachedPeclBagOStuff.php: Object does not support method calls

These are all from different app servers, which means its unlikely to be due to T245183.

Krinkle lowered the priority of this task from High to Low.Jul 27 2020, 8:07 PM

Deprioritising in favour of T234455 and T245183.

There was another burst of these. This time on numerous differrent machines at the same time, namely the Codfw wpt parsoid servers:

Screenshot 2020-09-22 at 16.24.05.png (1×1 px, 138 KB)

The fact that this exact failure is reocurring several times over the past months, and that it happens simultaneously on multiple servers, I think is sufficient signal to strongly suggest that this is not a "random" php-opcache corruption where a call object is confused with another during a method call, which, so far have not generally happened twice the same way from what I can remember.

A few more samples:

  • timestamp: 2020-09-08T14:02:26
  • host: wtp2008
  • request: GET commons.wikimedia.org /w/rest.php/commons.wikimedia.org/v3/page/pagebundle/File%3A … .jpg/358754203
 Uncaught Error: Object does not support method calls in /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/MemcachedPeclBagOStuff.php:342

#0 /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/MediumSpecificBagOStuff.php(563): MemcachedPeclBagOStuff->doGetMulti(Array, 0)
#1 /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/wancache/WANObjectCache.php(489): MediumSpecificBagOStuff->getMulti(Array)
#2 /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/wancache/WANObjectCache.php(407): WANObjectCache->getMulti(Array, Array, Array, -1)
#3 /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/wancache/WANObjectCache.php(1431): WANObjectCache->get('commonswiki:pag...', -1, Array, -1)
#4 /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/wancache/WANObjectCache.php(1376): WANObjectCache->fetchOrRegenerate('commonswiki:pag...', 7200, Object(Closure), Array, Array)
#5 /srv/mediawiki/php-1.36.0-wmf.6/extensions/Translate/tag/TranslatablePage.php(883): WANObjectCache->

  • timestamp: 2020-09-06T08:36:01
  • host: wtp2007
  • request: GET commons.wikimedia.org /w/rest.php/commons.wikimedia.org/v3/page/pagebundle/Commons%3ADeletion_requests%2FArchive%2F2007%2F11/23582549
 Uncaught Error: Object does not support method calls in /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/MemcachedPeclBagOStuff.php:342

#0 /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/MediumSpecificBagOStuff.php(563): MemcachedPeclBagOStuff->doGetMulti(Array, 0)
#1 /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/wancache/WANObjectCache.php(489): MediumSpecificBagOStuff->getMulti(Array)
#2 /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/wancache/WANObjectCache.php(407): WANObjectCache->getMulti(Array, Array, Array, -1)
#3 /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/wancache/WANObjectCache.php(1431): WANObjectCache->get('commonswiki:pag...', -1, Array, -1)
#4 /srv/mediawiki/php-1.36.0-wmf.6/includes/libs/objectcache/wancache/WANObjectCache.php(1376): WANObjectCache->fetchOrRegenerate('commonswiki:pag...', 7200, Object(Closure), Array, Array)
#5 /srv/mediawiki/php-1.36.0-wmf.6/extensions/Translate/tag/TranslatablePage.php(883): WANObjectCache->

Also seen a few times in December. Closing for now in favour of T245183/T253673. If we see it again after that we can re-open based on logspam.