
(Some) URLs containing URL-encoded characters(?) are uncacheable
Closed, Invalid (Public)

Description

$ curl -H 'Host: en.wikipedia.org' -v 'http://appservers.svc.eqiad.wmnet/wiki/Touch%C3%A9_%28disambiguation%29' 2>&1 >/dev/null |egrep '(HTTP|Cache-control)'
> GET /wiki/Touch%C3%A9_%28disambiguation%29 HTTP/1.1
< HTTP/1.1 200 OK
< Cache-control: private, must-revalidate, max-age=0

$ curl -H 'Host: en.wikipedia.org' -v 'http://appservers.svc.eqiad.wmnet/wiki/Touch_%28disambiguation%29' 2>&1 >/dev/null |egrep '(HTTP|Cache-control)'
> GET /wiki/Touch_%28disambiguation%29 HTTP/1.1
< HTTP/1.1 200 OK
< Cache-control: private, must-revalidate, max-age=0

but the same pages requested with literal parentheses are cacheable:

$ curl -H 'Host: en.wikipedia.org' -v 'http://appservers.svc.eqiad.wmnet/wiki/Touch%C3%A9_(disambiguation)' 2>&1 >/dev/null |egrep '(HTTP|Cache-control)'
> GET /wiki/Touch%C3%A9_(disambiguation) HTTP/1.1
< HTTP/1.1 200 OK
< Cache-control: s-maxage=2678400, must-revalidate, max-age=0

$ curl -H 'Host: en.wikipedia.org' -v 'http://appservers.svc.eqiad.wmnet/wiki/Touch_(disambiguation)' 2>&1 >/dev/null |egrep '(HTTP|Cache-control)'
> GET /wiki/Touch_(disambiguation) HTTP/1.1
< HTTP/1.1 200 OK
< Cache-control: s-maxage=2678400, must-revalidate, max-age=0

And in case you think this is related to parentheses specifically, here the percent-encoded form is the cacheable one and the raw form is not:

$ curl -H 'Host: el.wikipedia.org' -v 'http://appservers.svc.eqiad.wmnet/wiki/%CE%A0%CF%8D%CE%BB%CE%B7:%CE%9A%CF%8D%CF%81%CE%B9%CE%B1' 2>&1 >/dev/null |egrep '(HTTP|Cache-control)'
> GET /wiki/%CE%A0%CF%8D%CE%BB%CE%B7:%CE%9A%CF%8D%CF%81%CE%B9%CE%B1 HTTP/1.1
< HTTP/1.1 200 OK
< Cache-control: s-maxage=2678400, must-revalidate, max-age=0

$ curl -H 'Host: el.wikipedia.org' -v 'http://appservers.svc.eqiad.wmnet/wiki/Πύλη:Κύρια' 2>&1 >/dev/null |egrep '(HTTP|Cache-control)'
> GET /wiki/Πύλη:Κύρια HTTP/1.1
< HTTP/1.1 200 OK
< Cache-control: private, must-revalidate, max-age=0

Possibly related to T89673?

Event Timeline

faidon raised the priority of this task to High.
faidon updated the task description.
faidon added a project: MediaWiki-General.
faidon added subscribers: faidon, Catrope.

The fix for T29935 should have made it so that Varnish normalizes URLs and never sends requests for paths containing %28 to the appservers. Did that break somehow?
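For illustration only, the kind of normalization being discussed can be sketched in Python. This is not the actual Varnish code from T29935; the function name `normalize_path` and the set of characters kept literal (`ASSUMED_LITERAL`) are assumptions chosen to match the URLs in the description.

```python
from urllib.parse import quote, unquote

# Assumption: characters that stay literal in a canonical path. This set is
# illustrative and is not copied from the Varnish or MediaWiki configuration.
ASSUMED_LITERAL = "/:()"


def normalize_path(path: str) -> str:
    """Decode every percent-escape, then re-encode what is not kept literal.

    Caveat of this simple approach: an escaped separator such as %2F would
    be turned into a literal "/", which a real normalizer has to avoid.
    """
    return quote(unquote(path), safe=ASSUMED_LITERAL)


if __name__ == "__main__":
    for raw in (
        "/wiki/Touch%C3%A9_%28disambiguation%29",
        "/wiki/Touch_%28disambiguation%29",
        "/wiki/Πύλη:Κύρια",
    ):
        print(raw, "->", normalize_path(raw))
```

Under these assumptions, both `%28`/`%29` and raw non-ASCII input collapse to a single spelling per page, which is the form a cache would need to see so that a single purge covers it.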

faidon claimed this task.

Those requests were sent directly to the appservers, not through Varnish.

I'm guessing MW emits Cache-Control: private as a safeguard in case Varnish is not normalizing properly (which was actually happening for mobile), to avoid caching something that could not be purged. I think there's no bug to be fixed here :) Thanks @Catrope!
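A rough model of that guess, as a Python sketch rather than MediaWiki's real logic: a response is marked cacheable only when the requested path is already in its canonical spelling. The names `canonical_path` and `cache_control_for` are hypothetical, and the canonical form (literal `/:()`, everything else percent-encoded) is an assumption inferred from the curl output in the description. The two header values are copied from that output.

```python
from urllib.parse import quote, unquote

# Header values as observed in the curl output in the task description.
CACHEABLE = "s-maxage=2678400, must-revalidate, max-age=0"
UNCACHEABLE = "private, must-revalidate, max-age=0"


def canonical_path(path: str) -> str:
    # Assumption: "/", ":", "(" and ")" stay literal; all else is encoded.
    return quote(unquote(path), safe="/:()")


def cache_control_for(path: str) -> str:
    """Return the Cache-control value this model predicts for a request path.

    Only the canonical spelling is cacheable, since that is the only
    spelling a purge would target under this assumption.
    """
    return CACHEABLE if path == canonical_path(path) else UNCACHEABLE


if __name__ == "__main__":
    for p in ("/wiki/Touch_%28disambiguation%29", "/wiki/Touch_(disambiguation)"):
        print(p, "->", cache_control_for(p))
```

This simple rule reproduces all six observations in the description, which is consistent with the "safeguard" explanation but does not prove it.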