
RFC: Next steps for long-term revision storage -- space needs, storage hierarchies
Closed, Invalid (Public)

Description

RESTBase is currently storing HTML and data-parsoid for one or more renders of each revision. Compression ratios for this so far are:

  • 14.4% for data-parsoid
  • 16.9% for html

At the current template update rate (~50/s), compressed storage is growing on the order of 60G/day. At that rate, the provisioned 2.5T per node in a six-node cluster with three-way replication will last only ~60 more days if we don't change anything.
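
As a rough illustration of the capacity arithmetic behind an estimate like this, here is a minimal sketch. It rests on assumptions the task does not state: that the quoted growth is pre-replication (so each stored gigabyte costs three on disk), that 1T = 1000G, that data is spread evenly across nodes, and that no extra free space is reserved for compaction. The function name and the `used_raw_gb` parameter are hypothetical; the space already in use at the time is not given here, so the example only computes the time to fill an empty cluster.

```python
def days_until_full(node_capacity_gb, nodes, replication,
                    growth_gb_per_day, used_raw_gb=0.0):
    """Estimate days until the cluster's raw disk space is exhausted.

    growth_gb_per_day is assumed to be pre-replication growth, so the
    raw on-disk growth is growth_gb_per_day * replication.
    used_raw_gb is the raw space already occupied across all nodes
    (hypothetical parameter; the task does not state this figure).
    """
    raw_capacity_gb = node_capacity_gb * nodes
    raw_growth_gb_per_day = growth_gb_per_day * replication
    free_gb = raw_capacity_gb - used_raw_gb
    if free_gb <= 0:
        return 0.0
    return free_gb / raw_growth_gb_per_day


# Figures from the description: 2.5T per node, six nodes,
# three-way replication, ~60G/day growth, starting from empty.
print(round(days_until_full(2500, 6, 3, 60), 1))  # 83.3
```

Under these assumptions an empty cluster would fill in roughly 83 days, so a remaining lifetime of ~60 days would correspond to a cluster that is already partly full. If the 60G/day figure is instead measured per node or after replication, the result changes accordingly.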

There are, however, several options to reduce our storage requirements:


Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description.
GWicke added a subscriber: GWicke.
GWicke triaged this task as Medium priority. (Mar 24 2015, 3:15 PM)
GWicke raised the priority of this task from Medium to High.
GWicke added a project: RESTBase.
GWicke set Security to None.
GWicke edited subscribers, added: mobrovac, Eevans; removed: Aklapper.

T93777 and T93779 have slowed the growth to about 40G/day. Once the first patch for T93715 is deployed (scheduled for Monday), we should see fewer re-renders that differ from the stored render and are therefore saved. Compression ratios will benefit further from future work in Parsoid on making "about" attributes deterministic.

Our next focus areas are T94196 and research for T93496.

Status update from https://phabricator.wikimedia.org/T94196#1184252:

"The most heavily loaded node now stores 800G. The growth rate is currently only about 1G per day, but the actual rate might be higher as there are still a lot of tombstones being processed by compaction. We'll have to wait for a bit longer to establish new growth rates."

Pchelolo added a subscriber: Pchelolo.

We have moved away from the idea of long-term HTML revision storage. Invalid.