Page MenuHomePhabricator

PHP/HHVM serialization incompatibility in some situations when using Serializable
Closed, ResolvedPublic

Description

There are three ways serialization works in PHP:

  1. Basic serialization just dumps/restores all the properties of an object.
  2. Using __sleep() and __wakeup(), you can exclude some properties or do other code modifications.
  3. Using the Serializable interface lets you serialize to/from any arbitrary string. Often this string is created by calling serialize() recursively.

So what happens if you have an array with two objects, each of which contains the same third object?

class Foo {
    public $obj;

    public function __construct( $obj ) {
        $this->obj = $obj;
    }
}

$obj = new stdclass;

$foo1 = new Foo( $obj );
$foo2 = new Foo( $obj );

$s = serialize( array( $foo1, $foo2 ) );

$data = unserialize( $s );

With either #1 or #2, it serializes that reference to $obj as a reference (code 'r') so when things are unserialized $data[0]->obj === $data[1]->obj.

without Serializable
a:2:{i:0;O:3:"Foo":1:{s:3:"obj";O:8:"stdClass":0:{}}i:1;O:3:"Foo":1:{s:3:"obj";r:3;}}

Originally, with method #3, it broke the reference so $data[0]->obj and $data[1]->obj would be clones of each other but would not be ===.

Serializable without recursive state
a:2:{i:0;C:3:"Foo":35:{a:1:{s:3:"obj";O:8:"stdClass":0:{}}}i:1;C:3:"Foo":35:{a:1:{s:3:"obj";O:8:"stdClass":0:{}}}}

It seems that was reported as PHP bug 36424. The issue was eventually fixed in PHP 5.4.0 by making recursive calls to serialize() and unserialize() share state.

Serializable with recursive state
a:2:{i:0;C:3:"Foo":35:{a:1:{s:3:"obj";O:8:"stdClass":0:{}}}i:1;C:3:"Foo":20:{a:1:{s:3:"obj";r:4;}}}

Unfortunately it seems HHVM never picked up that fix (HHVM bug 7253).

In T209582: Unable to serialize r:46: Id 46 out of range. and T210499: ErrorException from line 1159 of /srv/mediawiki/php-1.33.0-wmf.4/includes/Message.php: PHP Warning: Invalid argument supplied for foreach(), this seems to have caused issues with the cache of the sidebar from Skin::buildSidebar(). When someone testing with PHP 7.2 populates the cache, the serialized string uses a reference for Title objects inside of Message objects. When that serialized string is later fetched with HHVM, the lack of recursive state makes the unserialization fail.

Event Timeline

One general fix would be to abandon Serializable entirely. A trait like this should give basically the same functionality without the bug:

trait WorkingSerializable {
    protected $serializeData = null;

    /**
     * Return serialization data
     * @warning Don't call PHP's `serialize()` from this method!
     * @return array
     */
    abstract protected function serialize();

    /**
     * Restore state from serialization data
     * @param array $data
     */
    abstract protected function unserialize( array $data );

    public function __sleep() {
        $this->serializeData = $this->serialize();
        return [ 'serializeData' ];
    }

    public function __wakeup() {
        $this->unserialize( $this->serializeData );
        $this->serializeData = null;
    }
}

Another possible fix, that might also bring some benefit towards T161647: RFC: Deprecate using php serialization inside MediaWiki, would be to write our own serializer that will only serialize/unserialize bare stdClass and objects implementing an interface with methods like the abstract methods on that WorkingSerializable trait.

OTOH, retrofitting that onto everything that currently uses serialize() would be quite a bit of work.

One general fix would be to abandon Serializable entirely.

There are few cases of using Serializable in libraries (though it looked as if most/all were under our control): https://codesearch.wmflabs.org/deployed/?q=Serializable&i=nope&files=&repos=

I'm not sure if we should invest that much effort for HHVM compat fixes (mostly updating everything to use WorkingSerialize until we drop HHVM support, and then reverting it all), we could also:

  • Add some extra suffix/prefix for PHP 7 cache keys so they don't use the same cache, which avoids any other case of caching/serialization formats between HHVM and PHP being incompatible.
  • Apply the PHP patch to our HHVM build (I have no idea how trivial or complicated that is)

Change 476280 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Avoid putting Message objects in sidebar cache

https://gerrit.wikimedia.org/r/476280

Change 476287 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Message: Don't include Title objects in the serialization (part 1)

https://gerrit.wikimedia.org/r/476287

Change 476288 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Message: Don't include Title objects in the serialization (part 2)

https://gerrit.wikimedia.org/r/476288

Change 476289 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Message: Throw if given invalid serialized data

https://gerrit.wikimedia.org/r/476289

Change 476287 merged by jenkins-bot:
[mediawiki/core@master] Message: Don't include Title objects in the serialization (part 1)

https://gerrit.wikimedia.org/r/476287

Change 476289 merged by jenkins-bot:
[mediawiki/core@master] Message: Throw if given invalid serialized data

https://gerrit.wikimedia.org/r/476289

Change 476280 merged by jenkins-bot:
[operations/mediawiki-config@master] Avoid putting Message objects in sidebar cache

https://gerrit.wikimedia.org/r/476280

Mentioned in SAL (#wikimedia-operations) [2018-12-03T18:42:21Z] <anomie@deploy1001> Synchronized wmf-config/CommonSettings.php: Updating SkinBuildSidebar hook function for T210528 (duration: 00m 47s)

Is this still unresolved? If so, it should be marked as a blocker to the php7 transition. @Anomie do you think there is anything else left to do?

We've worked around it for the sidebar and for Message objects' ->title attribute generally, and added a check so Message will throw an exception rather than continue execution with broken state if it occurs in some other part of the serialized state. But it's possible a similar situation could occur with other classes implementing Serializable if their serialized data contains references.

The decision above seems to be that we'll know that log messages like "Unable to serialize r:46: Id 46 out of range." signal that this issue is occurring and deal with it if it actually happens, rather than trying to preemptively find and replace all uses of recursive serialization in the Serializable interface.

I suppose if someone wanted to be a little preemptive they could go through those other classes to add checks like rMWe2f2da74f2fc: Message: Throw if given invalid serialized data or otherwise verify that things won't break if the recursive unserialize() call returns false.

Change 476288 merged by jenkins-bot:
[mediawiki/core@master] Message: Don't include Title objects in the serialization (part 2)

https://gerrit.wikimedia.org/r/476288

Current status: The known problems with the Message class have been worked around in various ways. It's possible that other classes using Serializable might still run into the same problem. Watch for log messages like "Unable to serialize r:46: Id 46 out of range.".

I'm marking this with Platform Team Workboards (Done with CPT) for now, although it could move back if we find other things hitting this issue. I don't know whether we should resolve this now (and reopen it if other things hit it, or track "mitigate T210528 in X" as new tasks), or if we should just leave this open until HHVM is gone.

CCicalese_WMF claimed this task.