
Thin out old revision renders
Closed, Resolved · Public

Description

Before the recent re-render optimizations were implemented, we stored many more re-renders than necessary. We should thin those out so that only a single entry (the latest) remains per revision.
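As a rough illustration of the thinning step, here is a minimal Python sketch that keeps only the newest render per (title, revision) pair and reports the rest as deletion candidates. The `Render` record, its field names, and the `thin_to_latest` helper are hypothetical and exist only for this example; the actual scripts ran against the storage backend and are not shown in this task.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Render(NamedTuple):
    # Hypothetical record layout, for illustration only.
    title: str
    revision: int
    timestamp: float  # render time, seconds since the epoch


def thin_to_latest(renders: Iterable[Render]) -> Tuple[List[Render], List[Render]]:
    """Split renders into (keep, delete), keeping the newest per revision."""
    all_renders = list(renders)
    latest: Dict[Tuple[str, int], Render] = {}
    for r in all_renders:
        key = (r.title, r.revision)
        current = latest.get(key)
        # On equal timestamps the first render seen is kept.
        if current is None or r.timestamp > current.timestamp:
            latest[key] = r
    keep = sorted(latest.values())
    # Everything that is not the chosen render for its revision gets deleted.
    delete = [r for r in all_renders if latest[(r.title, r.revision)] is not r]
    return keep, delete
```

In a real store the `delete` list would translate into row deletions, which is why tombstones and compaction matter for the storage numbers discussed later in this task.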

We might also want to implement time-based storage policies, for example keeping only one render per 24-hour interval. This should probably be implemented in the Cassandra backend, so that the retention policy can be configured in the schema. This is discussed in T94524: Configurable garbage collection / revision retention policy in table schemas: add 'interval' policy support.
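To make the interval idea concrete, here is a minimal Python sketch of one possible reading of such a policy: render timestamps are grouped into fixed 24-hour buckets and only the newest one in each bucket is kept. The bucketing scheme, the choice of the newest entry, and the `thin_by_interval` name are assumptions for illustration; the policy discussed in T94524 may define intervals differently.

```python
from typing import Dict, Iterable, List

DAY_SECONDS = 24 * 60 * 60


def thin_by_interval(timestamps: Iterable[float],
                     interval: float = DAY_SECONDS) -> List[float]:
    """Return the timestamps to keep: the newest one per fixed-size bucket."""
    newest_per_bucket: Dict[int, float] = {}
    for ts in timestamps:
        # Assumption: buckets are aligned to multiples of `interval`.
        bucket = int(ts // interval)
        if bucket not in newest_per_bucket or ts > newest_per_bucket[bucket]:
            newest_per_bucket[bucket] = ts
    return sorted(newest_per_bucket.values())
```

For example, renders at 0 s, 3600 s and 86399 s fall into the same day bucket, so only the one at 86399 s would be kept. Expressing the policy in the table schema would let the backend apply this kind of rule itself rather than relying on a separate script.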

Event Timeline

GWicke raised the priority of this task to High.
GWicke updated the task description.
GWicke added a project: RESTBase.
GWicke added subscribers: Eevans, mobrovac, GWicke.
Restricted Application added a subscriber: Aklapper. · Mar 27 2015, 5:10 PM
GWicke updated the task description. · Mar 28 2015, 3:39 PM
GWicke set Security to None.
GWicke added a comment. · Edited · Mar 30 2015, 6:06 PM

It took a couple of attempts and modifications to the script to actually make it reliable, but four copies of it have now been running since yesterday afternoon with the following result:

Some observations:

  • it is really common for articles to have more than 10 renders per revision
  • an interesting class of often-rerendered articles is user-created admin dashboards (often User:Something/dashboard)
GWicke added a comment. · Edited · Mar 31 2015, 12:05 AM

Compaction is picking up more of the slack now:

I think we can move ahead with group 1 projects now.

Another screenshot:

GWicke closed this task as Resolved. · Edited · Apr 6 2015, 11:32 PM
GWicke claimed this task.

Closing for the manual part.

These scripts have been run about once per day since ticket creation, which, in combination with the additional optimizations implemented earlier, was very effective at limiting the growth rate. The most heavily loaded node now stores 800G. The growth rate is currently only about 1G per day, but the true rate might be higher, as there are still a lot of tombstones being processed by compaction. We'll have to wait a bit longer to establish new growth rates.