Page MenuHomePhabricator

Improvements to the new storage deletion mechanism
Closed, InvalidPublic

Description

The current mechanism of cleaning up older renders and revisions in the multi-content-bucket does have a flaw when the TTL for indexes is not enough. Here's the idea for a new method. The idea only talks about the revision timeline, but the render timeline is exactly the same.

Currently, when writing a new revision we make a request to a revision-timeline like this:

SELECT * FROM revision-timeline WHERE _domain = <domain> AND key = <page_title> and ts <= (Date.now() - grace_ttl) LIMIT 1;

If this query gives 404 that either means the relevant index entries have expired or that no revisions have passed the window of recency. This creates a problem when large numbers of revisions actually slip through if they are edited in bulks within the grace_ttl period.

Proposal: change query to

SELECT * FROM revision-timeline WHERE _domain = <domain> AND key = <page_title> and ts >= (Date.now() - grace_ttl) LIMIT 1;

(Note the change in non-equal sign - it was less then, now it’s more then)
(NOTE that this will require a change in the revision-timeline indexing order!)

So this will give us the oldest revision within the current window of recency. If the query gives the result, that means any revision with a number STRICTLY LESS than the oldest one within the window of recency can be deleted, so we issue a range delete with a strict <.

If the query returns 404, that means the index has expired and the revision we are writing in right now is the first one after a long period of inactivity. That means that any revision prior to the previously latest one was superseded more than index_ttl time ago, which means it can be deleted. Currently, on writing a new revision we actually read the previously latest revision, so we know the prev_latest_rev, so we can issue a range delete to drop anything prior to that.

The render timeline will work exactly the same.

Event Timeline

Vvjjkkii renamed this task from Improvements to the new storage deletion mechanism to qodaaaaaaa.Jul 1 2018, 1:12 AM
Vvjjkkii removed Pchelolo as the assignee of this task.
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
mobrovac renamed this task from qodaaaaaaa to Improvements to the new storage deletion mechanism.Jul 1 2018, 10:30 AM
mobrovac assigned this task to Pchelolo.
mobrovac raised the priority of this task from High to Needs Triage.
mobrovac updated the task description. (Show Details)
mobrovac removed a subscriber: Pchelolo.

We have eliminated the timeline-based storage semantics alltogether, so this is no longer a valid issue.