
Post-deployment: (partly) ramp parser cache retention back up
Open, Medium, Public

Description

This task tracks the Data-Persistence team's evaluation of the mitigation's impact on parser cache utilisation during and after the 21-day period following the mitigation, and the subsequent ramp-up of non-talkpage retention, depending on whether the reduced talkpage retention freed up the needed space.

Requirements

  • To have done: Ramp parser cache retention back up from 20 to 30 days.
  • To monitor and confirm the same site performance metrics as per T280606#7323626:
    1. "Parser cache disk space available" should remain above 20%. Measured via Grafana: Parser Cache.
    2. "Parser cache hit ratio" has been stable at ~80% for article page views. Measured via Grafana: Parser Cache (contenttype; wikitext).
    3. "Backend pageview response time (p75)" has been stable at ~250ms for the past two years. Measured via Grafana: Backend pageview time.
    4. Monitor overall appserver load and internal latencies via the "Application Servers RED Dashboard".
    5. Daily purge of parser cache MUST take less than an actual day to run.
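As an illustration only, the numeric requirements above could be checked against a manually collected snapshot of the dashboard values. This is a minimal sketch, not an existing tool: the `Snapshot` type, the `failed_checks` function, and the tolerance defaults for hit ratio (75%) and p75 latency (300ms) are assumptions, since the task only states the historical baselines (~80% and ~250ms). Requirement 4 has no numeric threshold in the task and is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Hypothetical container for values read off the Grafana dashboards."""
    disk_free_pct: float         # "Parser cache disk space available"
    hit_ratio_pct: float         # "Parser cache hit ratio" (wikitext)
    backend_p75_ms: float        # "Backend pageview response time (p75)"
    purge_duration_hours: float  # duration of the daily parser cache purge


def failed_checks(
    s: Snapshot,
    hit_ratio_floor_pct: float = 75.0,  # assumed tolerance below the ~80% baseline
    p75_ceiling_ms: float = 300.0,      # assumed tolerance above the ~250ms baseline
) -> List[str]:
    """Return the names of the requirements that the snapshot violates."""
    failures = []
    if s.disk_free_pct <= 20.0:          # must remain above 20%
        failures.append("disk_space")
    if s.hit_ratio_pct < hit_ratio_floor_pct:
        failures.append("hit_ratio")
    if s.backend_p75_ms > p75_ceiling_ms:
        failures.append("backend_p75")
    if s.purge_duration_hours >= 24.0:   # must take less than a full day
        failures.append("purge_duration")
    return failures


if __name__ == "__main__":
    # Example values are invented for illustration, not measurements.
    print(failed_checks(Snapshot(35.0, 80.0, 250.0, 18.0)))
    print(failed_checks(Snapshot(18.0, 80.0, 250.0, 25.0)))
```

An empty list would indicate that all four numeric checks pass for that snapshot; the tolerances should be adjusted to whatever the team considers an acceptable deviation.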

Done

  • The Requirements above are met.

Event Timeline

Krinkle renamed this task from Post-deployment: evaluate impact on parser cache utilization to Post-deployment: (partly) ramp parser cache retention back up. May 4 2021, 5:33 PM
Krinkle assigned this task to Marostegui.
Krinkle added a project: Data-Persistence.
Krinkle updated the task description.
Marostegui edited projects, added DBA; removed Data-Persistence.
Marostegui moved this task from Triage to Blocked on the DBA board.
Marostegui added a subscriber: Marostegui.

Not assigning it to me specifically, as anyone could pick this up after the mitigation.

  • Disk space is still increasing quite rapidly despite shortened retention and daily purging, which suggests we will not stay stable for long, since more data means longer purge times.
  • As part of restoring retention, purge time is expected to go up even further.
LSobanski triaged this task as Medium priority. Aug 30 2021, 3:10 PM
Krinkle updated the task description.

Unblocked from perf side per T280606#7323626. Signing over to @Kormat to lead the next steps.

We have some additional margin today even on the old hardware, so we could start ramping up one day at a time now, or we could wait until your team is comfortable taking the old hardware out of rotation. I'll leave that to you.

Let's wait until Editing rolls out the changes to all wikis before doing this.