
RFC: Next steps for long-term revision storage -- space needs, storage hierarchies
Closed, Invalid · Public

Description

RESTBase is currently storing HTML and data-parsoid for one or more renders of each revision. Compression ratios for this so far are:

  • 14.4% for data-parsoid
  • 16.9% for html

At the current template update rate (~50/s), compressed storage is growing on the order of 60G/day. At that rate, the 2.5T provisioned per node in a six-node cluster with three-way replication will last only ~60 more days if we don't change anything.
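The runway estimate can be sketched as a small calculation. This is only an illustration: the task does not say whether the 60G/day figure is measured before or after replication, nor how full the nodes were at the time. The sketch assumes the growth is pre-replication and spread evenly across nodes, and `used_per_node_gb` is a hypothetical placeholder, not a measured value.

```python
def days_until_full(capacity_per_node_gb, used_per_node_gb, nodes,
                    replication, logical_growth_gb_per_day):
    """Estimate days until an average node runs out of disk.

    Assumes logical (pre-replication) growth is multiplied by the
    replication factor and spread evenly over all nodes.
    """
    per_node_growth = logical_growth_gb_per_day * replication / nodes
    free_gb = capacity_per_node_gb - used_per_node_gb
    return free_gb / per_node_growth


# 2.5T per node, six nodes, three-way replication, 60G/day are from the task.
# used_per_node_gb=700 is a made-up value for illustration only.
remaining = days_until_full(
    capacity_per_node_gb=2500,
    used_per_node_gb=700,
    nodes=6,
    replication=3,
    logical_growth_gb_per_day=60,
)
print(f"~{remaining:.0f} days of headroom")
```

Under these assumptions each node grows by 30G/day, so the result depends mainly on how much space is already used. Compaction also needs free disk space, so the practical limit would likely be reached sooner than this estimate suggests.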

There are, however, several options to reduce our storage requirements:

Related Objects

Status    Assigned
Invalid   None
Declined  Eevans
Declined  Eevans
Declined  Eevans
Resolved  Eevans
Resolved  None
Open      None
Resolved  Eevans
Resolved  Eevans
Resolved  GWicke
Resolved  GWicke
Open      None
Resolved  marcoil
Open      None
Resolved  RobH
Resolved  fgiunchedi
Resolved  fgiunchedi
Resolved  fgiunchedi
Resolved  Eevans
Resolved  Eevans
Resolved  Eevans
Resolved  GWicke
Resolved  GWicke
Resolved  Cmjohnson
Resolved  fgiunchedi
Resolved  Cmjohnson
Resolved  fgiunchedi
Resolved  GWicke
Resolved  Pchelolo

Event Timeline

GWicke raised the priority of this task to Needs Triage.
GWicke updated the task description.
GWicke added a subscriber: GWicke.
Restricted Application added a subscriber: Aklapper. · Mar 24 2015, 3:15 PM
GWicke triaged this task as Medium priority. · Mar 24 2015, 3:15 PM
GWicke raised the priority of this task from Medium to High.
GWicke added a project: RESTBase.
GWicke set Security to None.
GWicke updated the task description.
GWicke edited subscribers, added: mobrovac, Eevans; removed: Aklapper.
GWicke updated the task description. · Mar 24 2015, 3:21 PM
GWicke updated the task description.
GWicke moved this task from Backlog to Ready / next on the RESTBase board. · Mar 24 2015, 3:28 PM
GWicke updated the task description. · Mar 24 2015, 3:34 PM
GWicke updated the task description. · Mar 24 2015, 5:58 PM
GWicke updated the task description. · Mar 24 2015, 6:02 PM
GWicke updated the task description. · Mar 24 2015, 6:15 PM
GWicke updated the task description. · Mar 24 2015, 8:41 PM
GWicke updated the task description. · Mar 25 2015, 12:53 AM
GWicke updated the task description. · Mar 27 2015, 5:12 PM
GWicke updated the task description.

T93777 and T93779 have slowed the growth to about 40G/day. Once the first patch for T93715 is deployed (scheduled for Monday), fewer re-renders will differ from the stored version, so fewer will be saved. Compression ratios will benefit further from future work on deterministic about attributes in Parsoid.

Our next focus areas are T94196 and research for T93496.

GWicke updated the task description. · Mar 31 2015, 1:00 AM
GWicke moved this task from Ready / next to In progress on the RESTBase board. · Apr 6 2015, 11:34 PM
GWicke added a comment. · Edited · Apr 6 2015, 11:46 PM

Status update from https://phabricator.wikimedia.org/T94196#1184252:

"The most heavily loaded node now stores 800G. The growth rate is currently only about 1G per day, but the actual rate might be higher as there are still a lot of tombstones being processed by compaction. We'll have to wait for a bit longer to establish new growth rates."

GWicke updated the task description. · May 9 2015, 4:33 PM
RobH changed the status of subtask T93790: Expand RESTBase cluster capacity from Open to Stalled. · Aug 26 2015, 11:54 PM
Pchelolo closed this task as Invalid. · Jul 11 2019, 1:05 AM
Pchelolo added a subscriber: Pchelolo.

We have moved away from the idea of long-term HTML revision storage. Invalid.

Restricted Application removed a subscriber: Liuxinyu970226. · Jul 11 2019, 1:05 AM