Page MenuHomePhabricator

RFC: Deprecate using php serialization inside MediaWiki
Closed, ResolvedPublic

Description

The first version of this convention has since been published. See mw:Coding conventions/PHP § Don't use built in serialization.

Problem statement

PHP unserialize() and serialize() can execute code when given malicious input. In most cases this serialization format is unnecessary. As a hardening measure against making a mistake that could result in remote code execution, we should avoid this format, even in cases where the serialized data is stored in a trusted data-store (such as the db).

Threats this rfc is intended to counter:

  • A bug in MediaWiki allows a user to inject untrusted data into an unserialize call. Removing unserialize reduces the potential for mistakes.
  • An attacker somehow obtains write access to either the database or memcache, and wants to extend his/her access to arbitrary code execution.

Proposed guideline

This RFC proposes the following:

  • New code SHOULD use JSON instead of PHP serialization whenever possible for serializing data.
  • Serialization of primitive values and key-value structures MUST never use PHP serialization.
  • Any edge cases that require use of serialize or unserialize complicated classes, MUST protect the serialized blob with HMAC (e.g. keyed to $wgSecretKey) to protect against malicious modifications of the blob. This logic should be implemented in a class (e.g. MWSerializeWrapper) to avoid copy-pasted code all over the place
  • Using unserialize is fine if the data never leaves the current process. In particular $clone = unserialize( serialize( $obj ) )

In addition to the new guideline for new code, this RFC proposes that we start to (slowly) convert existing uses of PHP serialization including old data in the db. Most likely by using JSON. The eventual goal being to remove all legacy uses of php unserialize()

Good first candidates for conversion:

  • LocalisationCache
  • MediaHandler metadata. This is particularly risky because the API will unserialize regardless of which MediaHandler class is in use.

Things still allowed under this RFC

  • Using php serialization on data that we never ingest (unserialize) is fine. In particular the php serialization output format of the API is outside of the scope of this RFC.
  • Using unserialize is fine if the data never leaves the current process. In particular using $clone = unserialize( serialize( $obj ) ) as a hack to create a deep clone is fine.

Unanswered questions

  • How to deal with memcached. We could potentially use a custom memcache client - we already have a php implementation. Its unclear what sort of performance loss there would be compared to using the memcache php extension. We could also potentially modify the php memcache extension to do what we want. php Memcache also has a Memcached::SERIALIZER_JSON which is perhaps what we are looking for. More investigation is needed
  • Redis is similar to memcached. There is a SERIALIZER_NONE option we could perhaps use, and handle the serialization ourselves.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I have filed T169328: Protect against PHP code execution via memcached/unserialize for the memcached issue. It should not be part of this RFC. Making it policy to avoid unserialize() in PHP code is sensible regardless of the shortcomings of PHP's memcached library.

As per the ArchCom meeting on July 5th, this RFC is entering the Last Call period. It will be approved for implementation if now pertinent issues remain unaddressed by July 19.

As per the ArchCom meeting on July 19th, this RFC has been approved for implementation. No concerns where raised during the last call period.

@Bawolff now that this has been approved, can you turn this into a guideline on mediawiki.org? It should fit somehow with https://www.mediawiki.org/wiki/Security_for_developers I suppose, and it should also be mentioned on https://www.mediawiki.org/wiki/Security_checklist_for_developers

PHP 7 introduces a class whitelist to unserialize. That protects against userland attacks, although not necessarily against PHP bugs. So I guess not a reason to reconsider but a good way to secure unserialize calls kept for B/C, once we bump the required PHP version.

Yes, this may reduce attack surface and eliminate the obvious and known issues. Though I can not promise there are no attacks that don't use classes that we need (or that classes that we may use are 100% secure against serialization attacks). Serialize is just too powerful and complex to be secure with arbitrary data...
OTOH, I think now that we are PHP 7 it may be worth checking into this just to have one more security layer there.

PHP 7 introduces a class whitelist to unserialize. That protects against userland attacks, although not necessarily against PHP bugs.

Now that we have switched to PHP7, it would be a quick win to add an empty whitelist everywhere where we don't expect classes (MediaHandler, HistoryBlob, Message, SiteConfiguration (hopefully), LogEntry/RecentChanges, probably more).

Krinkle subscribed.

Tagging for TechCom internally to talk about this week. Specifically, what are the next steps? To document at https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP?

Krinkle assigned this task to daniel.
Krinkle updated the task description. (Show Details)

Change 662714 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] DNM: WANObjectCache: warn on non-JSONic values.

https://gerrit.wikimedia.org/r/662714

Change #1165581 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Use JSON serialization for PageEditStash

https://gerrit.wikimedia.org/r/1165581

Change #1165572 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Warn if PHP serialization of ParserOutput occurs

https://gerrit.wikimedia.org/r/1165572

Change #1165592 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Make Content JsonCodecable

https://gerrit.wikimedia.org/r/1165592

Change #1166243 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Turn on JSON serialization for PageEditStash

https://gerrit.wikimedia.org/r/1166243

Change #1166244 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Turn on JSON serialization of Content in PageEditStashContent

https://gerrit.wikimedia.org/r/1166244

Change #1165581 merged by jenkins-bot:

[mediawiki/core@master] Prepare to use JSON serialization for PageEditStash

https://gerrit.wikimedia.org/r/1165581

Change #1165592 merged by jenkins-bot:

[mediawiki/core@master] Make Content JsonCodecable

https://gerrit.wikimedia.org/r/1165592

Change #1166243 merged by jenkins-bot:

[mediawiki/core@master] Turn on JSON serialization for PageEditStash

https://gerrit.wikimedia.org/r/1166243

Change #1166244 merged by jenkins-bot:

[mediawiki/core@master] Turn on JSON serialization of Content in PageEditStashContent

https://gerrit.wikimedia.org/r/1166244

Change #1165572 merged by jenkins-bot:

[mediawiki/core@master] Warn if PHP serialization of ParserOutput occurs

https://gerrit.wikimedia.org/r/1165572

Change #1185190 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_43] Make Content JsonCodecable

https://gerrit.wikimedia.org/r/1185190

Change #1185191 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_44] Make Content JsonCodecable

https://gerrit.wikimedia.org/r/1185191

Change #1185191 merged by jenkins-bot:

[mediawiki/core@REL1_44] Make Content JsonCodecable

https://gerrit.wikimedia.org/r/1185191

Change #1185190 merged by jenkins-bot:

[mediawiki/core@REL1_43] Make Content JsonCodecable

https://gerrit.wikimedia.org/r/1185190