
Don't use PHP serialization to determine revision size for Entities
Open, Stalled, Low, Public

Description

EntityContent::getSize is currently implemented as:

return strlen( serialize( $this->getNativeData() ) );

This uses PHP's native serialization format, which we don't use for anything else. The number returned has little to do with the actual storage size.

We could use the size of the JSON serialization instead, which could perhaps also be cached or re-used.
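
As a rough illustration of how the two measures differ, here is a minimal sketch. The `$nativeData` array is a made-up stand-in for what getNativeData() might return, not real entity data, and plain `json_encode` is only an assumption about how the JSON size could be taken; the actual stored JSON may be produced differently.

```php
<?php
// Hypothetical stand-in for the array returned by getNativeData();
// the structure is illustrative only.
$nativeData = [
	'id' => 'Q42',
	'labels' => [
		'en' => [ 'language' => 'en', 'value' => 'Douglas Adams' ],
	],
];

// Current approach: byte length of PHP's native serialization.
$phpSize = strlen( serialize( $nativeData ) );

// Suggested approach: byte length of a JSON serialization.
$jsonSize = strlen( json_encode( $nativeData ) );

printf( "PHP serialize: %d bytes, JSON: %d bytes\n", $phpSize, $jsonSize );
```

The PHP format carries type tags and length prefixes for every key and value, so the two numbers diverge even for small structures.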

NOTE: if we change the behavior of getSize(), sizes recorded before and after the change are no longer comparable. Page history will then show a bogus size difference for the first edit to each page that occurs after the change.
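
To make the bogus difference concrete, here is a small sketch. It assumes the history page simply subtracts the stored size of the previous revision from that of the new one; `$data` is hypothetical and is identical in both revisions.

```php
<?php
// Hypothetical entity data, unchanged between the two revisions.
$data = [ 'id' => 'Q1', 'labels' => [ 'en' => 'universe' ] ];

// Size recorded for the older revision, computed the old way.
$oldRevisionSize = strlen( serialize( $data ) );

// Size recorded for the newer revision, computed the new way.
$newRevisionSize = strlen( json_encode( $data ) );

// Assumed history-page calculation: new size minus previous size.
$shownDiff = $newRevisionSize - $oldRevisionSize;

// Non-zero even though the content did not change.
printf( "Shown size difference: %+d bytes\n", $shownDiff );
```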

Event Timeline

daniel created this task. Feb 9 2017, 12:15 PM
Restricted Application added a subscriber: Aklapper. Feb 9 2017, 12:15 PM
thiemowmde added subscribers: thiemowmde, hoo, aude, Addshore.

I just noticed the same issue while working on https://gerrit.wikimedia.org/r/395515 (T182082), which touches the getNativeData method that getSize calls.

thiemowmde triaged this task as Low priority. Dec 5 2017, 4:00 PM
thiemowmde moved this task from incoming to needs discussion or investigation on the Wikidata board.
Addshore changed the task status from Open to Stalled. Jul 12 2018, 3:32 PM

Going to mark this as stalled, as a decision would need to be made regarding how we should measure entities, and this is not currently being worked on in any way.