RFC: Deprecate using php serialization inside MediaWiki
Closed, Resolved (Public)

Description

The first version of this convention has since been published. See mw:Coding conventions/PHP § Don't use built in serialization.

Problem statement

PHP unserialize() and serialize() can execute code when given malicious input. In most cases this serialization format is unnecessary. As a hardening measure against making a mistake that could result in remote code execution, we should avoid this format, even in cases where the serialized data is stored in a trusted data-store (such as the db).

Threats this RFC is intended to counter:

A bug in MediaWiki allows a user to inject untrusted data into an unserialize call. Removing unserialize reduces the potential for mistakes.
An attacker somehow obtains write access to either the database or memcache, and wants to extend their access to arbitrary code execution.

Proposed guideline

This RFC proposes the following:

New code SHOULD use JSON instead of PHP serialization whenever possible for serializing data.
Serialization of primitive values and key-value structures MUST never use PHP serialization.
Any edge case that requires serializing or unserializing complicated classes MUST protect the serialized blob with an HMAC (e.g. keyed to $wgSecretKey) against malicious modification of the blob. This logic should be implemented in a class (e.g. MWSerializeWrapper) to avoid copy-pasted code all over the place.
Using unserialize is fine if the data never leaves the current process. In particular, $clone = unserialize( serialize( $obj ) ) is fine.
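
As a rough illustration of the HMAC-protected wrapper described in the guideline, here is a minimal sketch. The class name SignedSerializer, its encode/decode methods, and the tag-prefix layout are assumptions made for this example; they are not the MWSerializeWrapper API, which this RFC does not specify. The secret would presumably be derived from $wgSecretKey.

```php
<?php
/**
 * Hypothetical wrapper (not an existing MediaWiki class): prefixes the
 * serialized blob with an HMAC tag and refuses to unserialize any blob
 * whose tag does not match.
 */
final class SignedSerializer {
	private const ALGO = 'sha256';
	// Length of a hex-encoded SHA-256 HMAC.
	private const TAG_LENGTH = 64;

	/** @var string */
	private $secret;

	public function __construct( string $secret ) {
		$this->secret = $secret;
	}

	/** Serialize a value and prepend the HMAC of the resulting blob. */
	public function encode( $value ): string {
		$blob = serialize( $value );
		return hash_hmac( self::ALGO, $blob, $this->secret ) . $blob;
	}

	/** Verify the HMAC first; only call unserialize() if it matches. */
	public function decode( string $signed ) {
		if ( strlen( $signed ) <= self::TAG_LENGTH ) {
			throw new InvalidArgumentException( 'Signed blob is too short' );
		}
		$tag = substr( $signed, 0, self::TAG_LENGTH );
		$blob = substr( $signed, self::TAG_LENGTH );
		$expected = hash_hmac( self::ALGO, $blob, $this->secret );
		// hash_equals() compares in constant time.
		if ( !hash_equals( $expected, $tag ) ) {
			throw new UnexpectedValueException( 'HMAC mismatch; refusing to unserialize' );
		}
		return unserialize( $blob );
	}
}
```

The point of the design is that unserialize() is never reached for a blob that was modified in storage, so an attacker with write access to the data store but without the secret cannot feed it crafted input.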

In addition to the new guideline for new code, this RFC proposes that we start to (slowly) convert existing uses of PHP serialization, including old data in the db, most likely to JSON. The eventual goal is to remove all legacy uses of PHP unserialize().

Good first candidates for conversion:

LocalisationCache
MediaHandler metadata. This is particularly risky because the API will unserialize regardless of which MediaHandler class is in use.

Things still allowed under this RFC

Using PHP serialization on data that we never ingest (unserialize) is fine. In particular the PHP serialization output format of the API is outside of the scope of this RFC.
Using unserialize is fine if the data never leaves the current process. In particular using $clone = unserialize( serialize( $obj ) ) as a hack to create a deep clone is fine.
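
To make the deep-clone hack concrete, the sketch below contrasts PHP's built-in shallow clone with the serialize round-trip. The classes Label and Item are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical classes, for illustration only.
class Label {
	public $text;

	public function __construct( string $text ) {
		$this->text = $text;
	}
}

class Item {
	public $label;

	public function __construct( Label $label ) {
		$this->label = $label;
	}
}

$original = new Item( new Label( 'foo' ) );

// Shallow copy: the new Item still points at the same Label instance.
$shallow = clone $original;

// Round-trip copy: the whole object graph is rebuilt, so the Label is new too.
// The serialized string never leaves this process.
$deep = unserialize( serialize( $original ) );

// Mutating the original's Label affects the shallow copy but not the deep one.
$original->label->text = 'bar';
```

After the last line, $shallow->label->text has changed along with the original, while $deep->label->text still holds the old value.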

Unanswered questions

How to deal with memcached. We could potentially use a custom memcache client - we already have a PHP implementation. It's unclear what sort of performance loss there would be compared to using the memcache PHP extension. We could also potentially modify the PHP memcache extension to do what we want. The PHP Memcached extension also has a Memcached::SERIALIZER_JSON option, which is perhaps what we are looking for. More investigation is needed.
Redis is similar to memcached. There is a SERIALIZER_NONE option we could perhaps use, and handle the serialization ourselves.

Related Objects

Status	Assigned	Task
Resolved	daniel	T161647 RFC: Deprecate using php serialization inside MediaWiki
Open	None	T181555 Remove use of PHP serialization in revision storage
Resolved	Anomie	T183419 Determine how to update old compressed ExternalStore entries for T181555

Event Timeline


In T161647#3221963, @Smalyshev wrote:

How often is it needed and not possible to cover with __clone? I'd suggest deprecating this and, if the objects are under our control, using proper APIs - either __clone or, if for some reason that's not enough, a custom interface. If we need to clone objects not under our control (libraries?) I'd advocate marking these cases as technical debt and complaining upstream until they are fixed.

Although making __clone do a deep clone assumes that you always want a deep clone for that object, no exceptions.

assumes that you always want a deep clone for that object, no exceptions.

True. We should then decide what we mean by clone, and if we mean something else in this particular case, use a custom interface. We're getting a bit offtopic though I think :)

daniel moved this task from P5: Last Call to TechCom-RFC-Closed on the TechCom-RFC board.May 3 2017, 7:54 PM

daniel edited projects, added TechCom-RFC (TechCom-RFC-Closed); removed TechCom-RFC.

@Smalyshev implementing deep cloning by hand is quite annoying for complex objects, especially if they are extensible. We currently use serialize/unserialize to clone Wikibase Entities. Works for all subclasses, no brittle traversal code needed.

One possible stop-gap for back-compatibility of old data for usages that don't require complex classes or looping object graphs would be to use a custom unserialize that can create stdClass instances only and never runs code.

This wouldn't cover all cases though, and we should list the ones that require non-trivial classes.

Note for migration planning of usages -- IIRC serialized database stuff is mostly arrays or stdClass objects, while memcache stuff has more serialized complex classes... (Memcache is a potential attack vector in many ways, and this makes it scarier!)

For memcached we could double-encode the values. Use $memcached->setOption( Memcached::OPT_SERIALIZER, Memcached::SERIALIZER_JSON ), and add HMAC authentication to MemcachedBagOStuff. The data would be double-serialized, {"hmac": "...", "value": "O:..."}

daniel moved this task from TechCom-RFC-Closed to Under discussion on the TechCom-RFC board.May 12 2017, 12:08 PM

daniel edited projects, added TechCom-RFC; removed TechCom-RFC (TechCom-RFC-Closed).

daniel updated the task description.

This RFC was due for a decision during the ArchCom meeting on May 10. It seems like not all concerns that were brought up during the Last Call period were addressed. The following additional points were brought up during the ArchCom meeting:

it's unclear whether we want to convert existing data (in the database) to JSON
we could use a restrictive custom unserialize implementation.
do we have a clear migration plan for all uses of serialize/unserialize?
where should the HMAC magic go? It would be bad to spread it all over the codebase.
the memcached native library will (per default) unserialize php objects by itself. Even if we don't put objects into memcached, an attacker still could, and trigger unserialize this way. Other services/libraries, like Redis, may have the same problem.
Should ParserOutput::setExtensionData support only scalars?

It seems like there is a general consensus that it would indeed be a good idea to not use php's serialize method. The proposal should be amended to reflect the issues mentioned above, and in other comments.

Scott_WUaS subscribed.May 12 2017, 4:28 PM

One handy (ab)use of php serialization is deep cloning

I'd never heard of this hack before, but imo it's ok as long as the serialized object is unserialized immediately. As long as the serialized data is never stored, it can't be manipulated by an adversary.

it's unclear whether we want to convert existing data (in the database) to JSON

I would say yes (eventually; we don't have to do it immediately). Stopping use of serialize is pointless if we have code that falls back to using unserialize() for back-compat.

where should the HMAC magic go? It would be bad to spread it all over the codebase.

Definitely. There should probably be a wrapper that handles this sort of thing. Instead of calling serialize, users could do something along the lines of $s = new MWSerializer( $config ); $s->serialize( $foo ); $s->unserialize( $bar ); etc.

the memcached native library will (per default) unserialize php objects by itself. Even if we don't put objects into memcached, an attacker still could, and trigger unserialize this way. Other services/libraries, like Redis, may have the same problem.

This is hard to fix. I guess we could change the php library version and use that instead if there's no performance difference (I imagine there's a reason why the native library exists, so that's probably a no-go). I suppose the only other option would be to patch the native library and use a custom version for us.

Should ParserOutput::setExtensionData support only scalars?

I don't think that's necessary.

@Bawolff can you please update the task description to reflect the current state of the discussion?

In T161647#3290181, @daniel wrote:

@Bawolff can you please update the task description to reflect the current state of the discussion?

Done.

As another semi-related note, it may make sense to add a __wakeup() method that just throws an exception to high-risk classes like ScopedCallback.
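
A minimal sketch of that idea follows. GuardedCallback is a hypothetical stand-in written for this example, not the actual ScopedCallback code, and tryUnserialize() is an illustrative helper. The class runs a callback on destruction, which is exactly the kind of behavior an attacker would want to trigger through a crafted serialized object.

```php
<?php
// Hypothetical class standing in for a high-risk class such as ScopedCallback.
class GuardedCallback {
	/** @var callable|null */
	private $callback;

	public function __construct( callable $callback ) {
		$this->callback = $callback;
	}

	public function __destruct() {
		if ( $this->callback !== null ) {
			( $this->callback )();
		}
	}

	public function __wakeup() {
		// Drop whatever callback the serialized data supplied, so the
		// destructor has nothing to run, then reject the operation.
		$this->callback = null;
		throw new LogicException( 'GuardedCallback must not be unserialized' );
	}
}

/** Illustrative helper: report whether a blob was accepted or rejected. */
function tryUnserialize( string $blob ): string {
	try {
		unserialize( $blob );
		return 'unserialized';
	} catch ( LogicException $e ) {
		return 'rejected';
	}
}
```

Clearing the callback before throwing is a defensive choice: the destructor then has nothing to execute even if it is invoked on the partially restored object.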

daniel moved this task from Under discussion to P1: Define on the TechCom-RFC board.Jun 27 2017, 10:35 AM

I have filed T169328: Protect against PHP code execution via memcached/unserialize for the memcached issue. It should not be part of this RFC. Making it policy to avoid unserialize() in PHP code is sensible regardless of the shortcomings of PHP's memcached library.

Ricordisamoa subscribed.Jul 4 2017, 9:11 AM

As per the ArchCom meeting on July 5th, this RFC is entering the Last Call period. It will be approved for implementation if no pertinent issues remain unaddressed by July 19.

As per the ArchCom meeting on July 19th, this RFC has been approved for implementation. No concerns were raised during the last call period.

@Bawolff now that this has been approved, can you turn this into a guideline on mediawiki.org? It should fit somehow with https://www.mediawiki.org/wiki/Security_for_developers I suppose, and it should also be mentioned on https://www.mediawiki.org/wiki/Security_checklist_for_developers

CCicalese_WMF added a project: MediaWiki-Platform-Team-Archived (MWPT-Q2-Oct-Dec-2017).Oct 13 2017, 9:23 PM

Krinkle moved this task from Untriaged to Approved on the TechCom-RFC (TechCom-RFC-Closed) board.Nov 4 2017, 12:41 AM

Anomie created subtask T181555: Remove use of PHP serialization in revision storage.Nov 28 2017, 7:13 PM

PHP 7 introduces a class whitelist to unserialize. That protects against userland attacks, although not necessarily against PHP bugs. So I guess not a reason to reconsider but a good way to secure unserialize calls kept for B/C, once we bump the required PHP version.
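
For reference, the whitelist is exposed through the allowed_classes option in the second argument of unserialize(). The sketch below uses it with an empty whitelist; the Widget class and the unserializeWithoutClasses() helper are hypothetical names chosen for this example.

```php
<?php
// Hypothetical class, used only to show what happens to serialized objects.
class Widget {
	public $size = 3;
}

/**
 * Illustrative helper: unserialize with an empty class whitelist. Arrays and
 * scalars come back as usual; any serialized object is turned into
 * __PHP_Incomplete_Class instead of being instantiated.
 */
function unserializeWithoutClasses( string $blob ) {
	return unserialize( $blob, [ 'allowed_classes' => false ] );
}

$plain = unserializeWithoutClasses(
	serialize( [ 'width' => 100, 'tags' => [ 'a', 'b' ] ] )
);

$withObject = unserializeWithoutClasses(
	serialize( [ 'widget' => new Widget() ] )
);
```

Here $plain is restored unchanged, while $withObject['widget'] is a __PHP_Incomplete_Class placeholder rather than a Widget, so none of Widget's magic methods can run.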

Agabi10 subscribed.Dec 20 2017, 11:47 PM

CCicalese_WMF moved this task from MWPT-Q2-Oct-Dec-2017 to Backlog on the MediaWiki-Platform-Team-Archived board.Dec 22 2017, 1:32 AM

CCicalese_WMF edited projects, added MediaWiki-Platform-Team-Archived; removed MediaWiki-Platform-Team-Archived (MWPT-Q2-Oct-Dec-2017).

CCicalese_WMF moved this task from Backlog to MWPT-Q3-Jan-Mar-2018 on the MediaWiki-Platform-Team-Archived board.

CCicalese_WMF edited projects, added MediaWiki-Platform-Team-Archived (MWPT-Q3-Jan-Mar-2018); removed MediaWiki-Platform-Team-Archived.

Krinkle moved this task from Approved to In progress on the TechCom-RFC (TechCom-RFC-Closed) board.Jan 4 2018, 8:21 PM

Mainframe98 mentioned this in T185652: AutoProxyBlock uses unserialization on externally obtained php code.Jan 24 2018, 6:40 PM

cicalese moved this task from Ready to In Progress on the MediaWiki-Platform-Team-Archived (MWPT-Q3-Jan-Mar-2018) board.Feb 11 2018, 10:33 PM

CCicalese_WMF moved this task from In Progress to Watching on the MediaWiki-Platform-Team-Archived (MWPT-Q3-Jan-Mar-2018) board.Feb 27 2018, 2:49 AM

Krinkle mentioned this in T190379: RFC: Re-establish the development policies.Mar 22 2018, 1:58 AM

• Pchelolo mentioned this in T191024: Exception thrown while running DataSender::sendData in cluster codfw: Data should be a Document, a Script or an array containing Documents and/or Scripts.Mar 29 2018, 4:08 PM

CCicalese_WMF edited projects, added MediaWiki-Platform-Team-Archived (MWPT-Q4-Apr-Jun-2018); removed MediaWiki-Platform-Team-Archived (MWPT-Q3-Jan-Mar-2018).Apr 4 2018, 6:53 PM

CCicalese_WMF moved this task from Ready to Watching on the MediaWiki-Platform-Team-Archived (MWPT-Q4-Apr-Jun-2018) board.Apr 4 2018, 6:55 PM

• Pchelolo mentioned this in T192111: Make TranslationsUpdateJob JSON-serializable.Apr 12 2018, 8:47 PM

• Pchelolo mentioned this in T192945: Make EchoNotification job JSON-serializable .Apr 24 2018, 5:48 PM

• Pchelolo mentioned this in T192946: Make gwtoolsetUploadMediafileJob JSON-serializable.

daniel mentioned this in T187153: Special:Abuselog throws when viewing details or examining (BadMethodCallException: Call get getId() on null).May 7 2018, 11:05 AM

daniel mentioned this in T197252: Can't correctly unserialize cached EntityRevision on WikibaseClient (Wikipedia) .Jun 15 2018, 1:46 PM

This may be relevant: https://wiki.php.net/rfc/secure_unserialize

Yes, this may reduce attack surface and eliminate the obvious and known issues. Though I can not promise there are no attacks that don't use classes that we need (or that classes that we may use are 100% secure against serialization attacks). Serialize is just too powerful and complex to be secure with arbitrary data...
OTOH, I think now that we are on PHP 7 it may be worth looking into this just to have one more security layer there.

CCicalese_WMF mentioned this in T199371: Improve security, stability, performance and scalability of MediaWiki (TEC1).Jul 11 2018, 9:33 PM

CCicalese_WMF removed a project: MediaWiki-Platform-Team-Archived (MWPT-Q4-Apr-Jun-2018).Jul 11 2018, 10:40 PM

WMDE-leszek subscribed.Jul 18 2018, 11:04 AM

xSavitar subscribed.Aug 12 2018, 10:21 PM

Tgr mentioned this in T203781: Allow Parser::VERSION to be bumped without immediately resetting the ParserCache.Sep 7 2018, 5:09 PM

Anomie mentioned this in T210528: PHP/HHVM serialization incompatibility in some situations when using Serializable.Nov 27 2018, 6:41 PM

• mobrovac added a project: Platform Team Legacy (Watching / External).Dec 20 2018, 12:04 PM

Daimona mentioned this in T213006: Create a script to update afl_var_dump, drop back-compat code.Jan 6 2019, 1:38 PM

In T161647#3852558, @Tgr wrote:

PHP 7 introduces a class whitelist to unserialize. That protects against userland attacks, although not necessarily against PHP bugs.

Now that we have switched to PHP7, it would be a quick win to add an empty whitelist everywhere where we don't expect classes (MediaHandler, HistoryBlob, Message, SiteConfiguration (hopefully), LogEntry/RecentChanges, probably more).

sbassett subscribed.Feb 11 2019, 2:55 PM

Nikerabbit mentioned this in T213802: Investigate ways to reduce the size of translate-groups cache key.Mar 7 2019, 1:54 PM

Tagging for TechCom internally to talk about this week. Specifically, what are the next steps? To document at https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP?

I proposed an addition to the PHP coding conventions here: https://www.mediawiki.org/wiki/Manual_talk:Coding_conventions/PHP#Add_a_rule_about_not_using_PHP_serialization

Please chime in.

Pastakhov subscribed.Apr 17 2019, 7:22 PM

daniel moved this task from Inbox to In progress on the TechCom board.Apr 17 2019, 8:32 PM

Krinkle closed this task as Resolved.May 9 2019, 11:45 PM

Krinkle assigned this task to daniel.

Krinkle updated the task description.

daniel mentioned this in T222099: Staging release of RESTBagOStuff using Kask.Sep 19 2019, 9:09 PM

Reedy mentioned this in T233146: Cannot enable 2FA on testwiki.Sep 19 2019, 9:29 PM

BPirkle mentioned this in T233537: Document and communicate potentially breaking session storage serialization change.Sep 22 2019, 7:56 PM

BPirkle mentioned this in T233963: Add serialization options to RESTBagOStuff.Sep 26 2019, 4:03 PM

Krinkle moved this task from In progress to Implemented on the TechCom-RFC (TechCom-RFC-Closed) board.Oct 16 2019, 10:19 PM

• chasemp added a project: Security.Feb 10 2020, 10:58 PM

• chasemp removed a project: acl*security.Feb 20 2020, 8:17 PM

daniel mentioned this in T263579: Change ParserCache serialization format to JSON.Oct 7 2020, 9:21 AM

Aklapper removed a subscriber: Anomie.Oct 16 2020, 5:42 PM

Change 662714 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] DNM: WANObjectCache: warn on non-JSONic values.

https://gerrit.wikimedia.org/r/662714

gerritbot added a project: Patch-For-Review.Feb 8 2021, 2:38 PM

daniel mentioned this in T274189: Remove usage of PHP serialization from WANObjectCache.Feb 8 2021, 8:56 PM

daniel mentioned this in T274190: Make value objects in AbuseFilter JSON-Serializable.Feb 8 2021, 9:00 PM

Daimona mentioned this in T259111: PHP Notice: unserialize(): Error at offset 65519 of 65535 bytes.Mar 8 2021, 3:35 PM

hashar mentioned this in T291124: PHP Notice: Undefined index: format.Sep 16 2021, 7:17 AM

Tgr mentioned this in T296610: MediumSpecificBagOStuff->guessSerialValueSize infinite loop when storing Title object (Special:Homepage throws "Maximum function nesting reached").Nov 29 2021, 6:42 AM

Krinkle mentioned this in T303194: Gadgets extensions should not cache serialized PHP objects.Mar 7 2022, 4:19 PM

Krinkle mentioned this in T269034: 1.37 Remove support for PHP serialization from ParserCache.Jun 8 2022, 5:34 PM

Krinkle mentioned this in T234455: Implement our own Memcached client to support pipelined operations (remove dependency on PECL).Oct 4 2022, 9:14 PM

Reedy mentioned this in T323236: PHP Warning: Class RawMessage has no unserializer.Nov 21 2022, 2:03 AM

Tgr mentioned this in T325703: Switch Echo serialization format from PHP to JSON.Dec 21 2022, 1:16 AM