Only retry failed requests for external traffic on cache frontends
Open, Medium, Public

Description

The VCL code on frontend instances retries 503 errors to paper over transient issues, and does so at most once to avoid multiplying failing requests. However, we now also have a number of known cases where internal services make sub-requests back into the frontend caches, so the retry-503-once code can still multiply failing requests.

We should mitigate that impact by putting a conditional around the retry-503-once code, so that the retry doesn't happen when X-Client-IP is in private/WMF address space.

Event Timeline

ema created this task. Apr 3 2020, 10:15 AM
Restricted Application added a project: Operations. Apr 3 2020, 10:15 AM
Restricted Application added a subscriber: Aklapper.
ema triaged this task as Medium priority. Apr 3 2020, 10:15 AM
ema moved this task from Triage to Caching on the Traffic board. Apr 3 2020, 10:37 AM
ema added a project: good first task.

@ema Hello! As this task is tagged as a good first task, I'm wondering if it can be made clear where exactly the code needs to be changed. Should it be here https://github.com/wikimedia/puppet/blob/production/modules/varnish/templates/vcl/wikimedia-frontend.vcl.erb#L837?

CDanis added a subscriber: CDanis. Apr 6 2020, 8:52 PM
ema added a comment. Apr 7 2020, 5:35 AM

@ema Hello! As this task is tagged as a good first task, I'm wondering if it can be made clear where exactly the code needs to be changed. Should it be here https://github.com/wikimedia/puppet/blob/production/modules/varnish/templates/vcl/wikimedia-frontend.vcl.erb#L837?

That's correct @srishakatux. Plus, a similar change is also needed in vcl_backend_error. Something along these lines can be used to find out if the client IP is WMF:

std.ip(req.http.X-Client-IP, "192.0.2.1") ~ wikimedia_nets

Good luck!
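
For illustration, the suggested check might be wired into the retry logic roughly as follows. This is only a sketch under several assumptions, not the production code: the placement in vcl_backend_response, the use of bereq.http.X-Client-IP (on the assumption that the header is copied to the backend request), the placeholder backend, and the ACL ranges are all hypothetical stand-ins for what wikimedia-frontend.vcl.erb actually contains.

```vcl
vcl 4.0;

import std;

# Placeholder backend so the fragment is complete; not taken from the task.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Stand-in for the real wikimedia_nets ACL; these ranges are illustrative only.
acl wikimedia_nets {
    "10.0.0.0"/8;
    "172.16.0.0"/12;
    "192.168.0.0"/16;
}

sub vcl_backend_response {
    # Retry a 503 at most once, and only when the client is not internal.
    if (beresp.status == 503 && bereq.retries == 0
        && std.ip(bereq.http.X-Client-IP, "192.0.2.1") !~ wikimedia_nets) {
        return (retry);
    }
}

sub vcl_backend_error {
    # Same guard for errors generated while fetching from the backend.
    if (beresp.status == 503 && bereq.retries == 0
        && std.ip(bereq.http.X-Client-IP, "192.0.2.1") !~ wikimedia_nets) {
        return (retry);
    }
}
```

The fallback address 192.0.2.1 belongs to the TEST-NET-1 documentation range, so as long as the ACL does not include that range, a missing or unparseable X-Client-IP header is treated as external and the request is still retried once.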