
Thin out old revision renders
Closed, Resolved (Public)

Assigned To
Authored By
Mar 27 2015, 5:10 PM
Referenced Files
F107893: pasted_file
Apr 1 2015, 7:57 PM
F107212: pasted_file
Mar 31 2015, 12:05 AM
F107104: pasted_file
Mar 30 2015, 6:06 PM


Before the recent re-render optimizations were implemented, we stored many more re-renders than necessary. We should thin those out so that only a single entry (the latest) remains per revision.
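As a rough illustration of the thinning step, the sketch below picks out every render except the newest one for each revision. It is not the script that was actually run: the `Render` record, its field names, and the use of a plain float timestamp (rather than whatever identifier the storage backend uses) are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Render(NamedTuple):
    # Hypothetical record layout, not the real storage schema.
    title: str
    revision: int
    render_time: float  # assumed: seconds since epoch


def renders_to_delete(renders: Iterable[Render]) -> list[Render]:
    """Return every render except the latest one per (title, revision)."""
    latest: dict[tuple[str, int], Render] = {}
    stale: list[Render] = []
    for r in renders:
        key = (r.title, r.revision)
        current = latest.get(key)
        if current is None:
            # First render seen for this revision.
            latest[key] = r
        elif r.render_time > current.render_time:
            # Newer render found: the previous candidate becomes stale.
            stale.append(current)
            latest[key] = r
        else:
            stale.append(r)
    return stale


renders = [
    Render("Foo", 1, 10.0),
    Render("Foo", 1, 30.0),
    Render("Foo", 1, 20.0),
    Render("Foo", 2, 5.0),
]
stale = renders_to_delete(renders)
```

In this example the two older renders of revision 1 are marked for deletion, while the newest render of revision 1 and the only render of revision 2 are kept.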

We might also want to implement time-based storage policies, for example keeping only one render per 24-hour interval. This should probably be implemented in the Cassandra backend, so that the retention policy can be configured in the schema. This is discussed in T94524: Configurable garbage collection / revision retention policy in table schemas: add 'interval' policy support.
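One way such an interval policy could behave is sketched below. This is only an assumption about the semantics, not the design from T94524: it uses fixed windows aligned to the epoch and keeps the newest render in each window. The function name and parameters are hypothetical.

```python
from __future__ import annotations

from typing import Iterable

DAY_SECONDS = 24 * 60 * 60


def thin_by_interval(
    render_times: Iterable[float], interval: float = DAY_SECONDS
) -> list[float]:
    """Keep only the newest render timestamp in each fixed-size interval."""
    newest_per_bucket: dict[int, float] = {}
    for t in render_times:
        # Assumed bucketing: fixed windows counted from the epoch.
        bucket = int(t // interval)
        if bucket not in newest_per_bucket or t > newest_per_bucket[bucket]:
            newest_per_bucket[bucket] = t
    return sorted(newest_per_bucket.values())


# Three renders on day 0, two on day 1, one on day 2.
times = [0.0, 3600.0, 86399.0, 86400.0, 90000.0, 200000.0]
kept = thin_by_interval(times)
```

With fixed windows the most recent render overall is always retained, since it is necessarily the newest in its own window. A sliding-window policy would behave differently and is not shown here.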

Event Timeline

GWicke raised the priority of this task to High.
GWicke updated the task description.
GWicke added a project: RESTBase.

It took a few attempts and several changes to the script to make it reliable, but four copies of it have been running since yesterday afternoon, with the following result:

pasted_file (440×1 px, 200 KB)

Some observations:

  • it is really common for articles to have > 10 renders per revision
  • an interesting class of often-rerendered articles is user-created admin dashboards (often User:Something/dashboard)

Compaction is picking up more of the slack now:

pasted_file (974×1 px, 309 KB)

I think we can move ahead with group 1 projects now.

GWicke closed this task as Resolved. Edited Apr 6 2015, 11:32 PM
GWicke claimed this task.

Closing for the manual part.

These scripts have been run about once per day since ticket creation, which, in combination with the additional optimizations implemented earlier, was very effective at limiting the growth rate. The most heavily loaded node now stores 800G. The growth rate is currently only about 1G per day, but the true rate might be higher, as there are still a lot of tombstones being processed by compaction. We'll have to wait a bit longer to establish new growth rates.