Problem statement
PHP's unserialize() can execute code when given malicious input, because it instantiates objects and triggers magic methods such as __wakeup() and __destruct(). In most cases this serialization format is unnecessary. As a hardening measure against mistakes that could result in remote code execution, we should avoid this format, even when the serialized data is stored in a trusted data store (such as the database).
Threats this RFC is intended to counter:
- A bug in MediaWiki allows a user to inject untrusted data into an unserialize call. Removing unserialize reduces the potential for mistakes.
- An attacker somehow obtains write access to either the database or memcached, and wants to escalate that access to arbitrary code execution.
Proposed guideline
This RFC proposes the following:
- New code SHOULD use JSON instead of PHP serialization whenever possible for serializing data.
- Serialization of primitive values and key-value structures MUST NOT use PHP serialization.
- Any edge case that requires serializing or unserializing complex classes MUST protect the serialized blob with an HMAC (e.g. keyed to $wgSecretKey) to guard against malicious modification of the blob. This logic should be implemented in a single class (e.g. MWSerializeWrapper) to avoid copy-pasted code all over the place.
- Using unserialize() is fine if the data never leaves the current process (see "Things still allowed under this RFC").
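The HMAC requirement in the guideline above could look roughly like the following. This is a minimal sketch, not an existing MediaWiki class: the name MWSerializeWrapper is only the example name suggested in this RFC, and the method names, the HMAC-SHA256 choice, and the "mac:payload" blob layout are assumptions made for illustration. The key passed in stands in for $wgSecretKey.

```php
<?php
// Hypothetical sketch; method names and blob layout are assumptions.
final class MWSerializeWrapper {
	/** @var string Secret key, e.g. derived from $wgSecretKey */
	private $key;

	public function __construct( string $key ) {
		$this->key = $key;
	}

	/** Serialize $value and prefix the payload with its hex HMAC-SHA256. */
	public function wrap( $value ): string {
		$payload = serialize( $value );
		return hash_hmac( 'sha256', $payload, $this->key ) . ':' . $payload;
	}

	/** Verify the HMAC first; only unserialize a payload that passes. */
	public function unwrap( string $blob ) {
		// The hex MAC contains no ':', so the first ':' separates it from the payload.
		$parts = explode( ':', $blob, 2 );
		if ( count( $parts ) !== 2 ) {
			throw new InvalidArgumentException( 'Malformed blob' );
		}
		list( $mac, $payload ) = $parts;
		$expected = hash_hmac( 'sha256', $payload, $this->key );
		// hash_equals() compares in constant time to avoid timing leaks.
		if ( !hash_equals( $expected, $mac ) ) {
			throw new RuntimeException( 'HMAC mismatch, refusing to unserialize' );
		}
		return unserialize( $payload );
	}
}

// Usage: the literal key is a placeholder for the real secret.
$wrapper = new MWSerializeWrapper( 'example-secret-key' );
$blob = $wrapper->wrap( new ArrayObject( [ 'a' => 1 ] ) );
$restored = $wrapper->unwrap( $blob );
```

The important property is ordering: the payload never reaches unserialize() unless the MAC check has passed, so an attacker with write access to the store but without the key cannot get a crafted blob deserialized.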
In addition to the guideline for new code, this RFC proposes that we (slowly) convert existing uses of PHP serialization, including old data in the database, most likely to JSON. The eventual goal is to remove all legacy uses of PHP unserialize().
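One possible shape for such a gradual conversion is a reader that accepts both formats while old rows still exist. The sketch below is an assumption about how this might be done, not part of the proposal: the function names are hypothetical, and the legacy path relies on the allowed_classes option of unserialize() (available since PHP 7.0) so that old blobs can never instantiate objects.

```php
<?php
// Hypothetical helpers; names are illustrative only.

/** Always write new data as JSON. */
function encodeStoredValue( $value ): string {
	$json = json_encode( $value );
	if ( $json === false ) {
		throw new InvalidArgumentException( 'Value cannot be encoded as JSON' );
	}
	return $json;
}

/** Read JSON if possible, otherwise fall back to the legacy PHP format. */
function decodeStoredValue( string $blob ) {
	$decoded = json_decode( $blob, true );
	if ( json_last_error() === JSON_ERROR_NONE ) {
		return $decoded;
	}
	// Legacy path: allowed_classes => false turns any serialized object into
	// __PHP_Incomplete_Class instead of instantiating the named class.
	// Returns false (with a notice or warning) if the blob is not valid.
	return unserialize( $blob, [ 'allowed_classes' => false ] );
}

// Usage: both an old and a new blob decode to the same array.
$old = serialize( [ 'lang' => 'en', 'count' => 3 ] );
$new = encodeStoredValue( [ 'lang' => 'en', 'count' => 3 ] );
$fromOld = decodeStoredValue( $old );
$fromNew = decodeStoredValue( $new );
```

A reader like this only covers primitive values and key-value structures. Data that genuinely contains objects would need a dedicated conversion step.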
Good first candidates for conversion:
- LocalisationCache
- MediaHandler metadata. This is particularly risky because the API will unserialize regardless of which MediaHandler class is in use.
Things still allowed under this RFC
- Using PHP serialization on data that we never ingest (unserialize) is fine. In particular, the PHP serialization output format of the API is outside the scope of this RFC.
- Using unserialize is fine if the data never leaves the current process. In particular using $clone = unserialize( serialize( $obj ) ) as a hack to create a deep clone is fine.
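To illustrate why the deep-clone idiom is sometimes used, the short sketch below contrasts it with PHP's clone operator, which copies only the top-level object. The variable names are illustrative.

```php
<?php
$original = new stdClass();
$original->child = new stdClass();
$original->child->value = 1;

// clone is shallow: the copy shares the same nested object.
$shallow = clone $original;

// The serialize round trip rebuilds the whole object graph in this process.
$deep = unserialize( serialize( $original ) );

$original->child->value = 2;

// $shallow->child->value is now 2, because it is the same nested object.
// $deep->child->value is still 1, because the nested object was copied.
```

The serialized string here is produced and consumed within the same request, so there is no point at which an attacker could substitute it.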
Unanswered questions
- How to deal with memcached. We could potentially use a custom memcached client; we already have a PHP implementation. It's unclear what the performance loss would be compared to using the PHP memcache extension. We could also potentially modify the PHP memcache extension to do what we want. The PHP Memcached extension also has a Memcached::SERIALIZER_JSON option, which is perhaps what we are looking for. More investigation is needed.
- Redis is similar to memcached. There is a SERIALIZER_NONE option we could perhaps use, handling the serialization ourselves.
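As a starting point for that investigation, the two options mentioned above might be set as follows. This is an untested sketch: it assumes the PECL memcached and phpredis extensions, the function names are hypothetical, and whether the JSON serializer is available depends on how the memcached extension was built (reported by Memcached::HAVE_JSON). The functions are only defined here, not called, so the file loads even without the extensions.

```php
<?php
// Hypothetical helpers; only the extension constants are real.

/** Ask the memcached extension to store values as JSON, if it supports it. */
function useJsonForMemcached( Memcached $client ): bool {
	if ( !Memcached::HAVE_JSON ) {
		return false;
	}
	return $client->setOption( Memcached::OPT_SERIALIZER, Memcached::SERIALIZER_JSON );
}

/** Disable phpredis serialization so that only plain strings are stored. */
function disableRedisSerializer( Redis $client ): bool {
	return $client->setOption( Redis::OPT_SERIALIZER, Redis::SERIALIZER_NONE );
}

/** With SERIALIZER_NONE the caller encodes values itself, here as JSON. */
function cacheEncode( $value ): string {
	$json = json_encode( $value );
	if ( $json === false ) {
		throw new InvalidArgumentException( 'Value cannot be encoded as JSON' );
	}
	return $json;
}

/** Decode a value previously written with cacheEncode(). */
function cacheDecode( string $blob ) {
	return json_decode( $blob, true );
}
```

With the Redis client configured this way, a write would look like $redis->set( $key, cacheEncode( $value ) ) and a read like cacheDecode( $redis->get( $key ) ), after checking that get() did not return false.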