[Task] Implement deferred deserialization of Entity
Open, High, Public

Description

We suspect a performance issue with the current implementation of Entity, because it requires a full deserialization of the Entity to access any part of it. To even be able to investigate this suspicion, we need an alternative implementation that avoids full deserialization.

The deferred deserialization implementation should only unserialize individual parts when they are accessed. The parts to unserialize separately are at least:

  • terms
  • statements
  • sitelinks

More fine-grained separation may or may not be worthwhile; it is not part of this task and would need investigation first.
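
To illustrate the idea, here is a minimal sketch in Python rather than in the project's own code. It assumes the three parts are already available as separately serialized JSON strings, which the task does not state. The class name `DeferredEntity`, the `raw_parts` mapping, and the `deserialized` list are hypothetical and exist only for illustration.

```python
import json
from functools import cached_property


class DeferredEntity:
    """Hypothetical entity wrapper that deserializes each part on first access."""

    def __init__(self, raw_parts):
        # Assumption: raw_parts maps a part name to its still-serialized JSON string.
        self._raw_parts = raw_parts
        # Records which parts have been decoded so far (illustration only).
        self.deserialized = []

    def _load(self, name):
        self.deserialized.append(name)
        return json.loads(self._raw_parts[name])

    # cached_property runs the getter once and stores the result on the instance,
    # so each part is decoded at most once.
    @cached_property
    def terms(self):
        return self._load("terms")

    @cached_property
    def statements(self):
        return self._load("statements")

    @cached_property
    def sitelinks(self):
        return self._load("sitelinks")


entity = DeferredEntity({
    "terms": '{"labels": {"en": "Douglas Adams"}}',
    "statements": '{"P31": ["Q5"]}',
    "sitelinks": '{"enwiki": "Douglas Adams"}',
})

# Reading a label decodes only the terms; statements and sitelinks stay serialized.
label = entity.terms["labels"]["en"]
```

After the last line, `entity.deserialized` contains only `"terms"`, and further reads of `entity.terms` reuse the cached value.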

Event Timeline

daniel created this task. Feb 25 2015, 12:55 PM
daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added subscribers: Aklapper, daniel.

@daniel: Associating a project with this task would be highly welcome, so that someone might find it when searching open tasks by project. :)

Lydia_Pintscher triaged this task as High priority. Mar 30 2015, 9:49 AM
Lydia_Pintscher set Security to None.
Lydia_Pintscher renamed this task from Implement deferred deserialization of Entity to [Task] Implement deferred deserialization of Entity. Sep 8 2015, 2:27 PM
Lydia_Pintscher updated the task description.

What do we still need to investigate here?

The benchmarking needs to be done, and then we need to decide how to move forward.

@Lydia_Pintscher we can only benchmark it after we have implemented it. We could profile the current code to identify hotspots, but we don't need profiling to know that we shouldn't deserialize sitelinks when asked for a label. Profiling might tell us how to refine lazy deserialization further, which would be useful for fine-tuning.

We can benchmark a use case before implementing a change. In my experience with front-end performance optimization, this makes a lot of sense (and makes for nice graphs).
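
As a rough sketch of what benchmarking a use case up front could look like, the Python snippet below times "read one label" against a fully deserialized blob, and against decoding only the terms as a lower bound. The data is synthetic, the part sizes and the function names are assumptions made for illustration, and nothing here measures the real Entity code.

```python
import json
import timeit

# Synthetic stand-in for a serialized entity; the sizes are arbitrary assumptions.
parts = {
    "terms": {"labels": {"en": "example"}},
    "statements": {f"P{i}": [f"Q{i}"] for i in range(2000)},
    "sitelinks": {f"site{i}": f"Title {i}" for i in range(500)},
}
full_blob = json.dumps(parts)
terms_blob = json.dumps(parts["terms"])


def label_via_full():
    # Current behaviour in spirit: decode everything to read one label.
    return json.loads(full_blob)["terms"]["labels"]["en"]


def label_via_terms_only():
    # Lower bound for a deferred implementation: decode only the terms.
    return json.loads(terms_blob)["labels"]["en"]


full_time = timeit.timeit(label_via_full, number=200)
partial_time = timeit.timeit(label_via_terms_only, number=200)
print(f"full: {full_time:.4f}s, terms only: {partial_time:.4f}s")
```

The absolute numbers depend on the machine and on the synthetic sizes, so only the ratio between the two timings is meaningful, and real entity data would be needed to draw conclusions.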

Do we have actual data yet to guide how we optimize things?