Page MenuHomePhabricator

Make RB ?redirect=false cache-efficient
Closed, ResolvedPublic0 Estimated Story Points

Description

From https://phabricator.wikimedia.org/T118548#2201143 and on:

  1. On the vcl_recv side, for /api/rest_v1 requests, strip and note (via req.http.Foo as an internal request flag) the ?redirect=no argument. We'll need to support that in all its forms without screwing up other arguments - we've done that before elsewhere in VCL. I think the best example is in templates/varnish/analytics.inc.vcl.erb, in function analytics_provenance_recv_ where we strip the wprov argument from any position in the argument list by using a pair of regexen.
  2. On the vcl_deliver side, if the flag from above is set && status == 302, we'll overwrite the status as "200 OK" and delete the Location header.

I guess the question remains, though: if the Varnish redirect-stripper sees ?redirect with a non-falsey value, should it still strip the parameter?

Yes. We care only about the false-y case.

Event Timeline

BBlack created this task.May 5 2016, 4:11 AM
Restricted Application added subscribers: Zppix, Aklapper. · View Herald TranscriptMay 5 2016, 4:11 AM

Change 287104 had a related patch set uploaded (by BBlack):
Text VCL: RB ?redirect=false optimization

https://gerrit.wikimedia.org/r/287104

BBlack added a comment.May 6 2016, 2:10 PM

So, from https://gerrit.wikimedia.org/r/287104 code review comments: apparently there's a second case here not discussed in T118548 ? "Title redirects", where RB ignores the ?redirect argument and always does a 301? It seems strange we'd be having Varnish implement something that RB doesn't implement here, as this should be just a cache optimization and the ultimate behavior should be the same as direct internal access to restbase.svc. I would've expected if this is a case we need to cover, step 1 is implementing ?redirect=false for this case in RB, and then step 2 is upgrading the varnish logic to match either type of redirect (301+302) rather than just 302?

mobrovac added a comment.EditedMay 6 2016, 2:20 PM

So, from https://gerrit.wikimedia.org/r/287104 code review comments: apparently there's a second case here not discussed in T118548 ? "Title redirects", where RB ignores the ?redirect argument and always does a 301?

Title normalisation in MW has proven to be rather tricky. So much so that we have a separate library for it, used by both RESTBase and Parsoid. It's not just about urlencode and friends, we also have to take into account namespace name localisations, noun genders and all other sorts of wonders present in natural languages, so I wouldn't consider this to be in the optimisation category.

It seems strange we'd be having Varnish implement something that RB doesn't implement here, as this should be just a cache optimization and the ultimate behavior should be the same as direct internal access to restbase.svc. I would've expected if this is a case we need to cover, step 1 is implementing ?redirect=false for this case in RB, and then step 2 is upgrading the varnish logic to match either type of redirect (301+302) rather than just 302?

I may be misunderstand what you are saying, but whichever way you look at it, we need two distinct cases - one for 301s and the other for 302s.

GWicke added a comment.EditedMay 6 2016, 6:32 PM

301s (returned for title normalization) are always safe to cache (and we do send the corresponding headers). Those redirects are applied unconditionally to non-canonical requests, and we don't support suppressing those. Clients like VE generally send requests using the canonical title format & with underscores, so don't encounter these redirects.

The ?redirect=false parameter & this task on the other hand is about following wikitext redirects, which are signaled as 302. Our goal is to use the cached 302 response to also serve ?redirect=false requests, and I think the current patch will do exactly that.

BBlack added a comment.May 9 2016, 3:36 PM

Patch updated to include matching redirect=0 (and some comments made clearer, and the deliver-time code was re-arranged to be slightly more efficient).

Change 287104 merged by BBlack:
Text VCL: RB ?redirect=false optimization

https://gerrit.wikimedia.org/r/287104

301s (returned for title normalization) are always safe to cache (and we do send the corresponding headers). Those redirects are applied unconditionally to non-canonical requests, and we don't support suppressing those. Clients like VE generally send requests using the canonical title format & with underscores, so don't encounter these redirects.

This is not about suppressing 301s. It's about where they point to (the location header). If you curl for https://en.wikipedia.org/api/rest_v1/page/html/List%20of%20synonyms?redirect=false you get:

location: List_of_synonyms?redirect=false

which is correct. However, Gerrit 287104 strips off ?redirect=false and doesn't put it back on 301s, which means that if you have a page P_A that redirects to P_B and the client issues a request for P%20A?redirect=false that flag will be ignored while it is following the first redirect:

  • P%20A?redirect=false -> 301 && location: P_A
  • P_A -> 302 && location: P_B

With ?redirect=false the second redirect should never happen, i.e. Varnish shouldn't disregard the user's wish not to follow MW redirects (but title-normalisation ones do need to be followed, no question about it).

@mobrovac, looking back at your earlier comment I now realize that I misunderstood your concern. The example fix you gave there actually pointed in this direction.

I agree that we should preserve the parameter for 301s (or 3xx except 302 in general). Let's create a follow-up to do so.

BBlack added a comment.May 9 2016, 4:17 PM

Well, we chose to strip it (or not) before sending the request to RB, before we've seen the response. So we can't logically make the call to strip the parameter or not based on the response code that happens afterwards.

Well, we chose to strip it (or not) before sending the request to RB, before we've seen the response. So we can't logically make the call to strip the parameter or not based on the response code that happens afterwards.

Sure, but we can check a posteriori for resp.status == 301 && req.http.X-RB-NOREDIR and append redirect=false to resp.http.Location

BBlack added a comment.May 9 2016, 4:21 PM

So, just to be sure I understand: the problem here is that when both title and wikitext redirects apply to a single request, with ?redirect=false RB's behavior would do the wikitext-redirect as part of the 301? (would respond to P%20A?redirect=false with a 301 to P_B)?

BBlack added a comment.May 9 2016, 4:26 PM

Sorry, that comment was written before I read @mobrovac's. So the issue is just ensuring ?redirect=false is preserved in the 301's Location?

mobrovac added a comment.EditedMay 9 2016, 4:28 PM

So, just to be sure I understand: the problem here is that when both title and wikitext redirects apply to a single request, with ?redirect=false RB's behavior would do the wikitext-redirect as part of the 301? (would respond to P%20A?redirect=false with a 301 to P_B)?

No, these are separate in RB's view. RB first looks for title normalisation. If the title is not normalised, it responds with a 301 no matter what. The thing that differs is the location it points to: if redirect=false is specified, the location header contains it as well, but it still points to the original URI, not the redirect. Then, in the next iteration, it responds with either a 302 or a 200 depending on the redirect flag. To exemplify it, suppose P_A is a wikitext redirect to P_B. Here are the relevant req/resp:pairs:

  • for P_A: 302 && Location: P_B
  • for P_A?redirect=false: 200
  • for P%20A: 301 && Location: P_A (not P_B!)
  • for P%20A?redirect=false: 301 && Location: P_A?redirect=false

Hope this helps clarifying the problem.

Sorry, that comment was written before I read @mobrovac's. So the issue is just ensuring ?redirect=false is preserved in the 301's Location?

Yup, exactly.

Change 287659 had a related patch set uploaded (by BBlack):
RB: fix redirect=false support for 301 title redirects

https://gerrit.wikimedia.org/r/287659

Change 287659 merged by BBlack:
RB: fix redirect=false support for 301 title redirects

https://gerrit.wikimedia.org/r/287659

GWicke closed this task as Resolved.May 9 2016, 7:13 PM
GWicke claimed this task.

Thank you, @BBlack!

Jdforrester-WMF set the point value for this task to 0.May 10 2016, 1:34 AM