Create low level batch access interface for page content
Closed, Resolved (Public)

Description

RevisionStore should offer a way to retrieve the raw (serialized) content of a batch of revisions. This is similar to BlobStore::getBlobBatch(), but would take revision IDs and slot names as input instead of blob addresses. The result must include the content model associated with each blob, so that the caller can properly unserialize and interpret the content.
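
To illustrate the shape of the interface requested above, here is a minimal conceptual sketch in Python. It is not MediaWiki code: the names SlotRow, ContentBlob, get_blob_batch, get_content_blobs_for_batch, and the in-memory SLOT_ROWS and BLOB_TABLE are all hypothetical stand-ins, and the real method's signature and return type may differ. The sketch only shows the idea of mapping (revision ID, slot name) pairs to blob addresses, fetching all blobs in one batch, and returning raw data together with its content model, without building any higher-level revision objects.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class SlotRow:
    """Hypothetical slot metadata row: which blob backs a revision's slot."""
    rev_id: int
    role: str      # slot name, e.g. "main"
    model: str     # content model, e.g. "wikitext"
    address: str   # blob address understood by the blob store


@dataclass(frozen=True)
class ContentBlob:
    """Raw serialized content plus the model needed to interpret it."""
    model: str
    data: str


def get_blob_batch(blob_table: Dict[str, str], addresses: Set[str]) -> Dict[str, str]:
    """Stand-in for a blob-address-based batch lookup; missing addresses are skipped."""
    return {a: blob_table[a] for a in addresses if a in blob_table}


def get_content_blobs_for_batch(
    slot_rows: List[SlotRow],
    blob_table: Dict[str, str],
    rev_ids: Iterable[int],
    roles: Iterable[str] = ("main",),
) -> Dict[int, Dict[str, ContentBlob]]:
    """Return {rev_id: {role: ContentBlob}} for the requested revisions and slots."""
    wanted_revs = set(rev_ids)
    wanted_roles = set(roles)
    # Step 1: select slot rows by revision ID and slot name (not by blob address).
    selected = [r for r in slot_rows
                if r.rev_id in wanted_revs and r.role in wanted_roles]
    # Step 2: fetch every needed blob in a single batch call.
    blobs = get_blob_batch(blob_table, {r.address for r in selected})
    # Step 3: pair each raw blob with its content model; no record objects are built.
    result: Dict[int, Dict[str, ContentBlob]] = {}
    for row in selected:
        if row.address in blobs:
            result.setdefault(row.rev_id, {})[row.role] = ContentBlob(
                row.model, blobs[row.address])
    return result


# Hypothetical sample data for illustration only.
SLOT_ROWS = [
    SlotRow(101, "main", "wikitext", "tt:1"),
    SlotRow(102, "main", "wikitext", "tt:2"),
    SlotRow(102, "aux", "json", "tt:3"),
]
BLOB_TABLE = {"tt:1": "Hello", "tt:2": "World", "tt:3": "{}"}

batch = get_content_blobs_for_batch(SLOT_ROWS, BLOB_TABLE, [101, 102], ["main", "aux"])
for rev_id, slots in sorted(batch.items()):
    for role, blob in sorted(slots.items()):
        print(rev_id, role, blob.model, blob.data)
```

A caller that only needs the raw text, as in the Translate use case, would read `blob.data` directly and consult `blob.model` only when it has to decide how to unserialize the content.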

Rationale:
Work on T228675 (Remove direct access to the text table from the Translate extension) showed that the high-level interface for loading a batch of RevisionRecords, as defined by T228988, does not meet performance requirements: it degraded performance by a factor of 2.5. We identified the cause as overhead from instantiating a large number of RevisionRecord, RevisionSlots, SlotRecord, and Content objects, all of which were discarded immediately after the raw text of the page had been extracted. A batch interface that bypasses the construction of all these objects seems like the obvious solution.

Event Timeline

daniel created this task. Sep 27 2019, 11:45 AM

Change 539324 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] [EXPERIMENTAL] RevisionStore: Introduce getSlotRowsForBatch

https://gerrit.wikimedia.org/r/539324

Change 539324 merged by jenkins-bot:
[mediawiki/core@master] RevisionStore: Introduce getContentBlobsForBatch

https://gerrit.wikimedia.org/r/539324