This follows up on T280605#7070345.
The purgeParserCache.php script is scheduled to run every night to prune ParserCache blobs that are past their expiry date. Our blobs generally have an expiry date of 30 days, which means we expect each nightly run to remove the blobs we stored roughly 30 days earlier.
At the time of writing, the purge script takes over a week to complete a single run. This has several consequences:
- Because a run takes 10 days, we effectively have to accommodate blobs for up to 37 days rather than 30-31 days. This means more space is occupied by default.
- Each run takes longer than the last. This means the backlog is growing, and so is the space consumption. For example, I expect we'll soon be accommodating blobs for 40 days, and so on. There is no obvious end other than a full disk.
- As the backlog grows, each run takes even longer, because it has to iterate over more blobs to purge them. See the previous point.
What we know
The script ran daily up until 19 April 2020 (last year):
- 19 April 2020: Run took 1 day (the last time this happened).
- 24 April 2020: Run took 3 days.
- 26 Jun 2020: Run took 4 days.
- 28 Nov 2020: Run took 6 days.
- 13 Apr 2021: Run took 7 days.
- 7 May 2021: Run was aborted after 5 days during which it completed 81% (2 May 01:51 - 7 May 05:23)
- 13 May 2021: The current run is at 26%, which has taken 116 hours so far (May 7 05:23 - May 13 01:42). Extrapolating, I would expect about 446 hours in total, or roughly 18 days.
(Caveat: The script's percentage meter assumes all shards are equal which they probably aren't.)
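The extrapolation for the current run can be reproduced with a simple linear estimate. This is a minimal sketch that assumes progress is linear in time, which (per the caveat above) is probably not quite true; the function name is hypothetical and only the inputs (116 hours, 26%) come from this task.

```python
def extrapolate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: total time = elapsed time / fraction completed."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Figures reported for the run that started on 7 May 2021.
total_hours = extrapolate_total_hours(116, 0.26)
total_days = total_hours / 24

print(f"{total_hours:.0f} hours, about {total_days:.1f} days")
```

If the shards are uneven, the real total could land well above or below this estimate.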
The script iterates over each parser cache database host, then over each parser cache table on that host, and then selects and deletes rows with a past expiry date in batches of 100 (code 1, code 2). It sleeps for 500 ms between batches.
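The loop described above can be sketched roughly as follows. This is not the actual purgeParserCache.php code: it is an illustrative Python stand-in that uses an in-memory SQLite database, and the table name `pc000`, the columns `keyname` and `exptime`, and the function `purge_expired` are assumptions made for the example. Only the batch size of 100 and the 500 ms sleep come from this task.

```python
import sqlite3
import time

BATCH_SIZE = 100      # rows selected and deleted per batch (from the task)
SLEEP_SECONDS = 0.5   # pause between batches (from the task)


def purge_expired(connections, tables, now,
                  batch_size=BATCH_SIZE, sleep_seconds=SLEEP_SECONDS,
                  sleep=time.sleep):
    """Delete expired rows host by host, table by table, in small batches."""
    deleted = 0
    for conn in connections:          # each parser cache host
        for table in tables:          # each parser cache table on that host
            while True:
                rows = conn.execute(
                    f"SELECT keyname FROM {table} WHERE exptime < ? LIMIT ?",
                    (now, batch_size),
                ).fetchall()
                if not rows:
                    break
                conn.executemany(
                    f"DELETE FROM {table} WHERE keyname = ?", rows
                )
                conn.commit()
                deleted += len(rows)
                sleep(sleep_seconds)  # throttle between batches
    return deleted


# Demo: 250 expired rows and 50 live rows in one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pc000 (keyname TEXT PRIMARY KEY, exptime INTEGER)")
conn.executemany(
    "INSERT INTO pc000 VALUES (?, ?)",
    [(f"k{i}", 10 if i < 250 else 99) for i in range(300)],
)

sleeps = []  # record sleeps instead of actually waiting
deleted = purge_expired([conn], ["pc000"], now=50, sleep=sleeps.append)
remaining = conn.execute("SELECT COUNT(*) FROM pc000").fetchone()[0]
print(deleted, len(sleeps), remaining)
```

In this sketch every batch, including the last partial one, is followed by a sleep, so the time spent sleeping grows in direct proportion to the number of expired rows.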
This sleep was introduced in 2016 to mitigate T150124: Parsercache purging can create lag.
In 2016, the first mitigation used 100 ms, which was then increased to 500 ms.
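To get a feel for what the sleep alone costs, here is a rough upper bound on how many rows a single run could purge per day. It assumes batches are processed strictly one after another and that query time is zero, so real throughput would be lower; the helper name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_rows_per_day(batch_size: int, sleep_ms: int) -> int:
    """Upper bound on purged rows per day if only the sleep costs time."""
    batches_per_day = SECONDS_PER_DAY * 1000 // sleep_ms
    return batch_size * batches_per_day


ceilings = {sleep_ms: max_rows_per_day(100, sleep_ms) for sleep_ms in (100, 500)}
for sleep_ms, rows in ceilings.items():
    print(f"{sleep_ms} ms sleep: at most {rows:,} rows/day")
```

Under these assumptions the ceiling is 86,400,000 rows per day with a 100 ms sleep and 17,280,000 rows per day with a 500 ms sleep. If more rows than that expire per day, a serial run cannot keep up regardless of database speed.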
Note that this task is not about product features adding more blobs to the ParserCache in general. I believe that, as it stands, the problem this task describes will continue to worsen even if our demand remains constant going forward. That said, the increased demand over the last 12 months (see T280605) has pushed us past an invisible tipping point, which has cascaded into this self-reinforcing situation.