
Compute meaningful size in "bogo-bytes" for entity revisions.
Closed, Resolved (Public)

Description

RecentChanges shows a misleading byte difference

For this edit

http://wikidata-test-repo.wikimedia.de/w/index.php?title=Data:Q3&diff=109096&oldid=109095

RecentChanges shows:

Data:Q3‎; 17:44 . . (-3.480 Bytes)‎ . . ‎Raymond


Version: master
Severity: enhancement

Details

Reference
bz37753

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:24 AM
bzimport set Reference to bz37753.
bzimport added a subscriber: Unknown Object (MLST).

Misleading, but true, due to a change in the internal structure. I am not sure if we should display the bytediff at all, as it does not make much sense...

Not sure how difficult it would be, but an item diff would be cute and helpful. Something like: +3 -1 ~5 (3 items added, 1 item removed, 5 items changed).
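For illustration only, a summary like "+3 -1 ~5" could be computed roughly as follows. This is a minimal sketch: summarizeItemDiff() is a hypothetical helper, and it assumes the old and new revisions are available as flat key => value arrays, which is not how entities are actually stored.

```php
<?php
// Hypothetical helper: summarize a change between two flat key => value
// arrays as "+added -removed ~changed". The flat-array input is an assumption.
function summarizeItemDiff( array $old, array $new ) {
	// Keys present only in the new revision were added.
	$added = count( array_diff_key( $new, $old ) );
	// Keys present only in the old revision were removed.
	$removed = count( array_diff_key( $old, $new ) );

	// Keys present in both count as changed when the value differs.
	$changed = 0;
	foreach ( array_intersect_key( $new, $old ) as $key => $value ) {
		if ( $old[$key] !== $value ) {
			$changed++;
		}
	}

	return "+$added -$removed ~$changed";
}

// Made-up sample data.
$old = array( 'label-en' => 'Berlin', 'label-de' => 'Berlin', 'desc-en' => 'city' );
$new = array( 'label-en' => 'Berlin', 'desc-en' => 'capital of Germany', 'label-fr' => 'Berlin' );
echo summarizeItemDiff( $old, $new ), "\n"; // prints "+1 -1 ~1"
```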

I don't think detailed information like what Nikola suggested is possible without serious changes to the recentchanges and revision tables - something I would like to avoid.

However, the Item class could implement its getSize() method to do something smarter than just return the size of the JSON. The notion of "size" is purely abstract here; it could be anything. For instance, it could return the number of atomic values in the structure.
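As a rough sketch of the "number of atomic values" idea: countAtomicValues() below is a hypothetical stand-alone function, not the actual Item::getSize(), and the sample structure is invented for illustration.

```php
<?php
// Hypothetical stand-in for a smarter getSize(): count the atomic (non-array)
// values in a nested array.
function countAtomicValues( array $data ) {
	$count = 0;
	// array_walk_recursive() invokes the callback only for non-array leaves,
	// so nested arrays themselves (including empty ones) are not counted.
	array_walk_recursive( $data, function ( $value ) use ( &$count ) {
		$count++;
	} );
	return $count;
}

// Made-up sample structure.
$item = array(
	'label' => array( 'en' => 'Berlin', 'de' => 'Berlin' ),
	'description' => array( 'en' => 'capital of Germany' ),
	'links' => array(),
);
echo countAtomicValues( $item ), "\n"; // prints 3
```

With a measure like this, adding one label changes the reported size by exactly one, regardless of how the serialization is laid out.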

Yes, Daniel's suggestion sounds very reasonable.

I've just had the idea that the number of added/removed/modified items could be encoded in the bits of the rev_len/rc_old_len/rc_new_len fields (rev_len is 64-bit, IIRC) and then interpreted by the page if it is in the Data namespace. Seems like an unnecessary complication, but maybe one day :)
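To make the bit-packing idea concrete, here is a minimal sketch. The 10-bit counter width, the constants, and the packCounts()/unpackCounts() helpers are all assumptions made for illustration; the actual width of the length fields is not confirmed in this thread.

```php
<?php
// Assumption: three counters of 10 bits each (0..1023), so the packed value
// fits in 30 bits and stays a plain integer even on 32-bit PHP builds.
const COUNTER_BITS = 10;
const COUNTER_MASK = 1023; // ( 1 << 10 ) - 1

// Hypothetical helper: pack three counters into one integer.
function packCounts( $added, $removed, $modified ) {
	foreach ( array( $added, $removed, $modified ) as $n ) {
		if ( $n < 0 || $n > COUNTER_MASK ) {
			throw new InvalidArgumentException( 'counter out of range' );
		}
	}
	return ( $added << ( 2 * COUNTER_BITS ) ) | ( $removed << COUNTER_BITS ) | $modified;
}

// Hypothetical helper: recover the three counters from a packed integer.
function unpackCounts( $packed ) {
	return array(
		'added' => ( $packed >> ( 2 * COUNTER_BITS ) ) & COUNTER_MASK,
		'removed' => ( $packed >> COUNTER_BITS ) & COUNTER_MASK,
		'modified' => $packed & COUNTER_MASK,
	);
}

$packed = packCounts( 3, 1, 5 );
print_r( unpackCounts( $packed ) );
```

The obvious drawback is that any code treating the field as a plain byte length would show nonsense, so only pages aware of the encoding could interpret it.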

The current implementation of EntityContent::getSize() is:

return strlen( serialize( $this->getNativeData() ) );

I suppose that can be improved. It would probably already help to ignore all the keys in the arrays; we could use array_walk_recursive to calculate the size efficiently.
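A sketch of what ignoring the keys might look like, assuming the native data is a nested array of scalar values. valueOnlySize() is a hypothetical function, not the code that was eventually committed.

```php
<?php
// Hypothetical alternative to strlen( serialize( ... ) ): sum the byte length
// of the leaf values only, so array keys and nesting do not affect the size.
function valueOnlySize( array $data ) {
	$size = 0;
	array_walk_recursive( $data, function ( $value ) use ( &$size ) {
		// Cast to string so numeric leaves contribute their printed length.
		// Assumes leaves are scalars or null; objects are not handled here.
		$size += strlen( (string)$value );
	} );
	return $size;
}

// Made-up sample data: same values, different internal structure.
$before = array( 'label' => array( 'en' => 'Berlin' ), 'description' => array( 'en' => 'city' ) );
$after = array( 'labels' => array( 'en' => array( 'value' => 'Berlin' ) ), 'desc' => array( 'en' => 'city' ) );

echo valueOnlySize( $before ), "\n"; // prints 10
echo valueOnlySize( $after ), "\n";  // prints 10
```

Because only values are measured, a change in the internal structure alone would no longer show up as a large byte difference in RecentChanges.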

*** This bug has been marked as a duplicate of bug 39189 ***