
Post-deployment: evaluate impact on site performance
Open, Needs Triage, Public

Description

This task represents the Performance-Team's work evaluating the impact that changing the parser cache data retention time has had on the site's performance.

Requirements

  • In the Site performance section below, the Performance-Team documents the effect that the change in the parser cache's data retention time has had on the set of site performance metrics defined in T280602.
  • In the Path forward section below, the Performance-Team documents whether the change(s) in site performance metrics are acceptable for:

Site performance

Path forward

Done

  • The Requirements above are met

Event Timeline

Krinkle added a subscriber: Krinkle.

The Performance team and DBAs will monitor and evaluate the outcome, based primarily on the following:

  • "Parser cache hit ratio", which has been stable at around 80% for article page views, measured via Grafana: Parser Cache (contenttype; wikitext).
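As a rough illustration of the metric above, the hit ratio is the share of parser cache lookups served from cache, hits / (hits + misses). The sketch below assumes simple hit and miss counters are available for a time window; the function names and the 2-percentage-point regression threshold are hypothetical placeholders, not values agreed in this task, and the counts in the usage lines are made up.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Share of parser cache lookups served from cache: hits / (hits + misses)."""
    total = hits + misses
    if total == 0:
        # No lookups recorded in this window; avoid dividing by zero.
        return 0.0
    return hits / total


def is_regression(baseline: float, current: float, max_drop: float = 0.02) -> bool:
    """Flag a drop in hit ratio larger than `max_drop` (a hypothetical threshold)."""
    return baseline - current > max_drop


if __name__ == "__main__":
    # Illustrative counts only, not measured data.
    before = hit_ratio(hits=800, misses=200)
    after = hit_ratio(hits=760, misses=240)
    print(before, after, is_regression(before, after))
```

Comparing against a pre-change baseline, rather than a fixed target, keeps the check meaningful if the normal hit ratio drifts away from ~80%.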

As we are proposing applying a shorter expiry time to talk pages, our analysis should probably evaluate talk pages separately (if that is not already planned) to ensure there is no significant regression.

META
@Krinkle: based on the steps outlined in T280606#7058580, I'm assigning this task over to you.

Of course, if you think there is someone better to assign this task to, please adjust the task accordingly.

As we are proposing applying a shorter expiry time to talk pages, our analysis should probably evaluate talk pages separately (if that is not already planned) to ensure there is no significant regression.

Agreed. We need new instrumentation for that, however, and I suppose it is up to you (plural) to decide what counts as success. But here are some ideas:

  • Add "discussion page" as a pseudo content-type for parsercache metrics (i.e. distinct from wikitext). This doesn't need to be perfect given it's an aggregate (e.g. we can use wikitext-in-talk-namespace, or perhaps include other discussion pages based on the "newsection" heuristic). That will give you a dedicated cache-hit ratio to monitor.
  • Page view load time for real users in the talk namespace (as a proxy for how much a reduced hit rate actually matters). Our RUM navtiming data includes a namespace factor, but we don't currently load it into Graphite that way; we can show you how to do this as one-off queries in Hadoop, and/or help you with them.
  • @dpifke and @Peter will also be looking into adding talk pages to our continuous synthetic/lab speed tests. These are unlikely to help with this particular mitigation, since they won't exercise parser caching much, but they will help with other work in your team in the future, e.g. with regard to the cost and impact of added HTML, CSS, and JavaScript payloads.
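A minimal sketch of the first idea, assuming each parser cache lookup is recorded with its content type, namespace ID, a hypothetical `has_newsection` flag standing in for the "newsection" heuristic, and whether it was a hit. It relies on MediaWiki's convention that talk namespaces have odd, non-negative IDs (Talk is 1, User talk is 3). The names `pseudo_content_type`, `hit_ratios`, and the record shape are illustrative, not part of the existing parser cache instrumentation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (content_type, namespace_id, has_newsection, was_hit)
Lookup = Tuple[str, int, bool, bool]


def pseudo_content_type(content_type: str, namespace: int, has_newsection: bool = False) -> str:
    """Map a lookup to a metric bucket, splitting discussion pages out of plain wikitext."""
    # Talk namespaces have odd, non-negative IDs (Talk=1, User talk=3, ...);
    # negative IDs (Special=-1, Media=-2) are never talk namespaces.
    is_talk_namespace = namespace >= 0 and namespace % 2 == 1
    if content_type == "wikitext" and (is_talk_namespace or has_newsection):
        return "discussion"
    return content_type


def hit_ratios(lookups: Iterable[Lookup]) -> Dict[str, float]:
    """Compute the cache hit ratio separately for each (pseudo) content type."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for content_type, namespace, has_newsection, was_hit in lookups:
        bucket = pseudo_content_type(content_type, namespace, has_newsection)
        totals[bucket] += 1
        if was_hit:
            hits[bucket] += 1
    # Every bucket in `totals` has at least one lookup, so there is no division by zero.
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


if __name__ == "__main__":
    sample = [
        ("wikitext", 0, False, True),   # article, hit
        ("wikitext", 0, False, False),  # article, miss
        ("wikitext", 0, False, True),   # article, hit
        ("wikitext", 1, False, True),   # Talk page, hit
        ("wikitext", 4, True, False),   # Project page flagged by the newsection heuristic, miss
    ]
    print(hit_ratios(sample))
```

Bucketing at the aggregate level like this tolerates some misclassification, which fits the point that the split doesn't need to be perfect to yield a usable discussion-page hit ratio.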

Sounds good. Do you need any more input from the Editing team at this point?