
ULS causes pages to be cached with random user language
Closed, Declined, Public

Description

When I visit wikidata.org (with ULS) for the first time (e.g. in a new browser, no jstorage / cookie stuff), it shows me the sidebar etc. in Dutch.

When I click "Help" or something, it then switches to English for me.

Lydia (from wikidata) also had this issue.

We are in Germany, so I wonder if it's picking up the IP address of the ESAMS caching servers and using that somehow to guess our language?


Version: master
Severity: normal

Details

Reference
bz41451

Event Timeline

bzimport raised the priority of this task to High.
bzimport set Reference to bz41451.
bzimport added a subscriber: Unknown Object (MLST).
aude created this task. Oct 27 2012, 8:53 AM
aude added a comment. Oct 27 2012, 8:55 AM

This happened for me with Firefox, and not with Chrome.

I'm normally using Chrome and probably already have cookies and local storage stuff set on my browser, but it was a fresh visit to wikidata.org with Firefox.

Caching has not been configured properly to vary by language. ULS is functioning correctly.

aude added a comment. Oct 27 2012, 9:12 AM

okay, that makes sense. I'm not sure what bugzilla component to pick but if you want to change it, that's okay with me.

aude added a comment. Oct 27 2012, 9:33 AM

Clicking around, I got one page in Norwegian and the community portal in Icelandic.

and it's showing "common languages"

English
norsk (bokmål)
íslenska

and some missing system messages like uls-search-placeholder

Looks like Wikidata needs some more setup and configuration.

aude added a comment. Oct 27 2012, 9:37 AM

And Spanish (and Icelandic) now in Chrome.

Learn Norwegian! Problem solved.

aude added a comment. Oct 27 2012, 9:15 PM

Reedy removed $wgULSGeoService = false to see if that mattered. It seems to improve things but does not completely fix them.

Special:Recentchanges seems to be especially unsticky (the setlang param) in both Firefox and Chrome.

In Chrome, I switched my lang to Arabic, clicked around, then back to English. The recent changes page is stuck in Arabic.

In Firefox, Recent changes is still stuck in Norwegian but the other pages seem to be behaving better.

I have tried proxying my browsers via the US, bypassing ESAMS, and disabling browser cache, deleting cookies, local storage, etc.

Does ULS store information in the browser's local storage or session storage?

The language selection is done on the PHP side, but Wikidata currently has normal caching, which does not vary by Accept-Language, and thus pages are served in the language they were cached in.

This issue should be taken up with ops. There is no issue with component ULS. See comment 9.

mark added a comment. Nov 1 2012, 2:56 PM

Squid simply follows the instructions it's getting from MediaWiki in that respect. Send Vary / X-Vary-Options headers as appropriate, and then it should work.

(In reply to comment #11)

Squid simply follows the instructions it's getting from MediaWiki in that
respect. Send Vary / X-Vary-Options headers as appropriate, and then it should
work.

How should this be implemented, then, so that anonymous users are served the UI in the language from their Accept-Language header, or from the cookie that is set when they change the language away from the Accept-Language one? Code samples will probably suffice.

Setting to high priority. I think we need to figure this one out if we're going to see ULS deployed much more widely than it is now. This is already causing problems on wikidata.org.

daniel added a comment. Nov 6 2012, 7:41 AM

(In reply to comment #11)

Squid simply follows the instructions it's getting from MediaWiki in that
respect. Send Vary / X-Vary-Options headers as appropriate, and then it should
work.

After a brief look, it seems to me that there is no mechanism in Squid that allows splitting the cache based on the value of a specific cookie. Am I missing something obvious? I just don't see how Vary and X-Vary-Options can be used to do this. Maybe instead of varying on cookies, we should vary on an ETag? Anyway, here's what I found so far:

If we use Vary: Cookie, we will vary on *all* cookies. That would fracture the cache beyond repair. So, maybe X-Vary-Options can help?

I dug up Tim's original mail containing the patch that introduces X-Vary-Options into Squid: http://www.squid-cache.org/mail-archive/squid-dev/200802/0085.html. There seems to be very little documentation beyond that. From what I see, the best we can do with that is something like:

X-Vary-Options: Cookie; string-contains=language;

But that would just split the cache into two: requests with the "language" cookie set, and those without it. It does not vary on the value of the cookie. Also, is it still true that X-Vary-Options is a custom patch that never got into Squid's main line? If not, please fix http://wiki.squid-cache.org/Features/XvaryOptions :)

Anyway - what we would actually need is something like

X-Vary-Cookie: language

...that would vary on the value of the language cookie. Or maybe X-Vary-Options could be extended to cover this:

X-Vary-Options: Cookie; string-extract=language=([^;]*);

But this is messy, and I don't think anyone is up to patching Squid further. So... other options? Could we vary on the ETag header, and generate it depending on the user language?
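The options above can be compared with a toy model, written in Python for illustration only. It computes the cache key that each rule would derive from a request's Cookie header. The model rests on three assumptions:

* The cookie name `language` follows the examples in this comment.
* `string-contains` is modelled as a plain substring test. That is my reading of Tim's patch, not Squid's actual code.
* `X-Vary-Cookie` is the hypothetical header proposed above. Squid does not support it.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name, taken from the examples in this comment.
LANG_COOKIE = "language"


def key_vary_cookie(cookie_header: str) -> str:
    """Models 'Vary: Cookie'; the whole raw header becomes part of the key."""
    return cookie_header


def key_xvo_string_contains(cookie_header: str) -> bool:
    """Models 'X-Vary-Options: Cookie; string-contains=language;'.
    Only the presence of the substring matters, not the cookie's value."""
    return LANG_COOKIE in cookie_header


def key_vary_on_cookie_value(cookie_header: str) -> str:
    """Models the hypothetical 'X-Vary-Cookie: language' (value only)."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(LANG_COOKIE)
    return morsel.value if morsel is not None else ""


sample_headers = [
    "language=nl; session=abc123",
    "language=nl; session=def456",
    "language=de; session=abc123",
    "session=zzz",
]

for rule in (key_vary_cookie, key_xvo_string_contains, key_vary_on_cookie_value):
    buckets = {rule(h) for h in sample_headers}
    print(rule.__name__, len(buckets))
```

With these four sample headers, the loop reports 4, 2 and 3 distinct keys:

* Varying on the whole header splits the cache by session.
* The presence test cannot tell `nl` from `de`.
* Only the value-based rule groups requests by the chosen language.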

PS: I'm also a bit surprised that this issue comes as a surprise. Allowing anons to set the interface language is requested about once a year, and this problem is the reason it got shot down time and time again. With regard to Wikidata, I discussed this problem with several staffers at the Berlin hackathon and at Wikimania. Essentially I was told "wait for ULS, they'll take care of that stuff". Hm.

daniel added a comment. Nov 6 2012, 7:46 AM

Hm, now that I submitted the above, it occurred to me that I perhaps *was* missing the obvious... how about this:

Vary: Content-Language

Simple enough, no? At least at wikidata.org, Content-Language *should* be the user language. Or we simply introduce X-MW-User-Language and use Vary: X-MW-User-Language...

(A quick check shows that Content-Language is returning "en" on wikidata.org. I'll fix that).

(In reply to comment #12)

How should this be implemented then so that anonymous users get served the UI
that their accept language or the cookie that is set when they change the
language away from the accept language? Code samples will probably suffice.

Hi Siebrand, are you asking for help with setting the Vary header generally, or are you asking about whether the Vary header can be set conditionally based on whether the cookie is set? If you're asking about the former, I think OutputPage->addAcceptLanguage() and the GetCacheVaryCookies hook is where it's at. If you're asking about the latter, that seems like another wrinkle in all of this; I'm not sure what the logic should look like; I think maybe the best thing from a caching perspective may be to set the cookie based on the Accept-Language header, and then only Vary on the cookie.

For limited deployments of ULS, varying the cache on ULS cookies and/or Accept-Language headers should be fine. The concern is if/when we deploy this widely across all wikis, varying all pages on all languages. That could split the cache pretty badly.

I suppose we can fix this now for wikidata.org, and then worry about the larger cache issue once ULS is closer to going into wide deployment.

daniel added a comment. Nov 6 2012, 7:58 AM

Bug filed for setting Content-Language to the user language at least for item pages on wikidata: bug 41806

daniel added a comment. Nov 6 2012, 8:03 AM

(In reply to comment #16)

For limited deployments of ULS, varying the cache on ULS cookies

That sounds easy, but as I said above, I have not found a way to vary on the value of a specific cookie. I would have expected this to be a common use case, but apparently, it isn't.

and/or Accept-Language headers should be fine.

That would be wrong. Which languages my browser is set to accept has nothing to do with which language I picked in ULS. Besides, Accept-Language headers exist in thousands of combinations of languages, priority values, whitespace, etc.
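A small illustration of the "thousands of combinations" point follows. The three raw headers below differ byte for byte, so a cache varying on Accept-Language would store three copies, yet all of them plausibly resolve to the same UI language.

This is a simplified sketch of HTTP q-value handling. It is not ULS's or MediaWiki's actual negotiation code, and the set of supported languages is made up. It also does not address the other point above, that the browser's preferences say nothing about what was picked in ULS.

```python
# Hypothetical set of UI languages offered by the wiki.
SUPPORTED = {"en", "de", "nl", "nb", "is"}


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Language header into (tag, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # Treat a malformed weight as "not acceptable".
        entries.append((tag, q))
    # sorted() is stable, so equal weights keep their header order.
    return sorted(entries, key=lambda entry: -entry[1])


def pick_language(header: str, default: str = "en") -> str:
    """Map a raw header to one supported UI language."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in SUPPORTED:
            return tag
        primary = tag.split("-")[0]  # "de-de" -> "de"
        if primary in SUPPORTED:
            return primary
    return default


raw_headers = [
    "de-DE,de;q=0.9,en;q=0.8",
    "de,en-US;q=0.7,en;q=0.3",
    "de-de, en;q=0.5",
]
print(len(set(raw_headers)), {pick_language(h) for h in raw_headers})
```

Normalising the header to a single language before it reaches the cache key would collapse these variants. That would only help if the cache itself could do the normalising, which is not the case with the Squid setup discussed here.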

But this makes me wonder about something else... is Vary applied to the request or the response headers, or both? If it's just the request headers, things just got a lot harder, and my idea to use the Content-Language header doesn't work.

daniel added a comment. Nov 6 2012, 8:30 AM

To answer my own question: of course Squid can only vary on request headers.

So, Content-Language and ETag are out of the question. Since Accept-Language also doesn't do what we need, we are back to square one: vary on the value of one specific cookie. Is squid really not able to do this? Looks like it would be hackish, but possible, with varnish: https://www.varnish-cache.org/trac/wiki/VCLExampleRemovingSomeCookies.

The only alternative I see is to actually use different URLs, injecting setlang parameters into all local links, like the StickToThatLanguage extension does. That's a nasty hack, but will work with squid caches - which is why we developed it that way.
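Roughly, the URL-based workaround adds the chosen language to every local link, so each language ends up with its own cache entry. Below is a minimal server-side sketch of that link rewriting using only Python's standard library. The real extension does this in JavaScript.

The sketch assumes the parameter is `uselang`, the name given later in this thread for StickToThatLanguage. This comment says `setlang`, so the name is a parameter here. The function name `add_lang_param` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_lang_param(url: str, lang: str, param: str = "uselang") -> str:
    """Return `url` with param=lang set, keeping other query parameters.

    Only local links should be passed in; deciding what is local is left
    to the caller. Any existing value of `param` is replaced.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, lang))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_lang_param("/w/index.php?title=Q42&uselang=de", "nl"))
```

Every distinct language value yields a distinct URL. That is what makes the cache split work with Squid, and also what makes purging harder.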

daniel added a comment. Nov 6 2012, 8:55 AM

Oh, to add one more problem: purging. When a page changes, all variants (languages) of that page need to be purged. I don't think we currently have a mechanism for this at all.

(In reply to comment #18)

> and/or Accept-Language headers should be fine.

That would be wrong. Which languages my browser is set to accept has nothing to
do with which language I picked in ULS. Besides, Accept-Language headers exist
in thousands of combinations of languages, priority values, whitespace, etc.

Before you pick anything in ULS, your initial language is based on the Accept-Language header.

Aren't we using Varnish already? Snippet from wikidata.org:
Server:Apache
Vary:Accept-Encoding
Via:1.1 varnish
Via:1.1 varnish
X-Varnish:1326947601
X-Varnish:3373410970
X-Vary-Options:Accept-Encoding;list-contains=gzip

Have to start from somewhere, so I added the Vary headers as Mark suggested in comment #11: https://gerrit.wikimedia.org/r/32030

I believe we can start with this and make it more granular as needed.

(In reply to comment #22)

Have to start from somewhere, so I added the Vary headers as Mark suggested in
comment #11: https://gerrit.wikimedia.org/r/32030

I believe we can start with this and make it more granular as needed.

As I said in my comment there: this would explode the cache, varying on every possible combination of things in the Cookie and Accept-Language headers. If we can't do better than that, just send "Cache-Control: no-cache, must-revalidate".

mark added a comment. Nov 6 2012, 2:11 PM

(In reply to comment #20)

Oh, to add one more problem: purging. When a page changes, all variants
(languages) of that page need to be purged. I don't think we currently have a
mechanism for this at all.

All variants of a URL are purged by Squid and Varnish, so that in itself is not a problem.
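The following toy model illustrates what Mark describes. If language versions are stored as Vary-style variants of a single URL, one purge of that URL drops every variant at once.

This illustrates the idea only. It is not how Squid or Varnish actually lay out their object stores, and the class name `VariantCache` is made up.

```python
from collections import defaultdict
from typing import Optional


class VariantCache:
    """Toy cache: objects grouped by URL, several variants per URL."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, str]] = defaultdict(dict)

    def put(self, url: str, variant: str, body: str) -> None:
        self._store[url][variant] = body

    def get(self, url: str, variant: str) -> Optional[str]:
        # dict.get() does not trigger the defaultdict factory.
        return self._store.get(url, {}).get(variant)

    def purge(self, url: str) -> int:
        """Drop every variant of `url`; return how many were removed."""
        return len(self._store.pop(url, {}))


cache = VariantCache()
for lang in ("en", "nl", "nb"):
    cache.put("/wiki/Q42", lang, f"<html lang='{lang}'>")
print(cache.purge("/wiki/Q42"), cache.get("/wiki/Q42", "nl"))  # 3 None
```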

mark added a comment. Nov 6 2012, 2:18 PM

(In reply to comment #14)

I dug up Tim's original mail containing the patch that introduces
X-Vary-Options into Squid:
http://www.squid-cache.org/mail-archive/squid-dev/200802/0085.html. There
seems to be very little documentation beyond that. From what I see, the best
we can do with that is something like:

X-Vary-Options: Cookie; string-contains=language;

But that would just split the cache into two: requests with the "language"
cookie set, and those without the language cookie set. It does not vary on the
value of the cookie. Also - is it still true that X-Vary-Options is a custom
patch, and never got into Squid's main line? If not, please fix
http://wiki.squid-cache.org/Features/XvaryOptions :)

I believe it did enter Squid main line. But that doesn't help much; other caches (like Varnish) don't have it.

Anyway - what we would actually need is something like

X-Vary-Cookie: language

...that would vary on the value of the language cookie. Or maybe X-Vary-Options
could be extended to cover this:

X-Vary-Options: Cookie; string-extract=language=([^;]*);

But this is messy, and I don't think anyone is up to patching Squid further.
So... other options? Could we vary on the ETag header, and generate it
depending on the user language?

No, since the client doesn't know the ETag header, of course, and isn't sending it.

PS: I'm also a bit surprised that this issue comes as a surprise. Allowing
anons to set the interface language is requested about once a year, and this
problem is the reason it got shot down time and time again. With regards to
Wikidata, I discussed this problem with several staffers at the Berlin
hackathon and at Wikimania. Essentially I was told "wait for ULS, they'll take
care of that stuff". Hm.

I am too. It's not exactly a new problem.

Of course exploding the cache is not a big problem for a small wiki like wikidata, but we can't even think about deploying this on any larger wikis until we've solved this properly...

mark added a comment. Nov 6 2012, 2:20 PM

So eventually we'll move the "text cluster" to Varnish as well, which would make this problem slightly easier as we can influence request headers in VCL. ULS would need to be adapted for efficient use of that. But the migration of the Text caching cluster to Varnish is at least another 6 months out still.

daniel added a comment. Nov 6 2012, 3:30 PM

@mark: so how about ULS just send out "Cache-Control: no-cache" for now? That would cause all pages to just bypass the proxies, right? Better than a vary on cookie and accept-language, and probably acceptable for Wikidata, where we expect few anon visitors.

mark added a comment. Nov 6 2012, 4:03 PM

I think neither disabling all caching, nor fragmenting the cache are acceptable solutions for any wiki getting more than a tiny amount of traffic. But if I absolutely had to pick one, it would be fragmenting the cache, along with a lowish (5 mins?) cache ttl. Because that would still protect the backend infrastructure a bit in case of spikes/slashdotting/etc. But again, we really need a better solution than either of those two. If not, we'd probably be forced to turn ULS off entirely when hitting problems/spikes.

daniel added a comment. Nov 6 2012, 4:09 PM

For the record: I'm told that with varnish, we could vary on the value of a given cookie. That would solve the problem. But migration to varnish is at least 6 months away.

I think ULS should have an option for working with that kind of varnish setup efficiently.

ori added a comment. Nov 7 2012, 6:19 AM

(In reply to comment #29)

For the record: I'm told that with varnish, we could vary on the value of a
given cookie. That would solve the problem. But migration to varnish is at
least 6 months away.

I think ULS should have an option for working with that kind of varnish setup
efficiently.

Copy-pasting what I said on wikitech-l:

I don't know about Squid, but there are all manner of ways you could attack
this problem with Varnish. Overriding vcl_hash lets you customize how a
cache key is constructed from a request. It's usually just hostname + URL,
but you can add any string to the hash:

sub vcl_hash {
    if (req.http.Cookie ~ "language") {
        hash_data(regsub(req.http.Cookie, "^.*(language=[^;]+).*$", "\1"));
    }
}

mark added a comment. Nov 7 2012, 11:44 AM

(In reply to comment #30)

Copy-pasting what I said on wikitech-l:

> I don't know about Squid, but there are all manner of ways you could attack
> this problem with Varnish. Overriding vcl_hash lets you customize how a
> cache key is constructed from a request. It's usually just hostname + URL,
> but you can add any string to the hash:

sub vcl_hash {
    if (req.http.Cookie ~ "language") {
        hash_data(regsub(req.http.Cookie, "^.*(language=[^;]+).*$", "\1"));
    }
}

You've just broken purging.
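A toy model of the `vcl_hash` above shows why purging breaks. Once the language cookie is folded into the object's hash, a purge request carries no user cookies and so hashes to the default object only. The language-specific copies survive.

This is a simplified model for illustration, not Varnish's real hashing or purge logic. The `purge` function and the in-memory `store` are hypothetical.

```python
import hashlib
import re

# Mirrors the regsub() in the VCL above: pull "language=xx" out of Cookie.
LANG_RE = re.compile(r"^.*(language=[^;]+).*$")


def cache_hash(host: str, url: str, cookie: str = "") -> str:
    """Host + URL (the usual key), plus the language cookie if present."""
    h = hashlib.sha256()
    h.update(host.encode())
    h.update(url.encode())
    if "language" in cookie:
        h.update(LANG_RE.sub(r"\1", cookie).encode())
    return h.hexdigest()


store = {}
for cookie in ("", "language=nl", "language=de"):
    store[cache_hash("www.wikidata.org", "/wiki/Q42", cookie)] = cookie or "default"


def purge(host: str, url: str) -> None:
    # A purge request carries no user cookie, so it only finds one object.
    store.pop(cache_hash(host, url), None)


purge("www.wikidata.org", "/wiki/Q42")
print(sorted(store.values()))  # ['language=de', 'language=nl'] survive
```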

(In reply to comment #31)

(In reply to comment #30)

> Copy-pasting what I said on wikitech-l:
>
> > I don't know about Squid, but there are all manner of ways you could attack
> > this problem with Varnish. Overriding vcl_hash lets you customize how a
> > cache key is constructed from a request. It's usually just hostname + URL,
> > but you can add any string to the hash:
>
>
> sub vcl_hash {
>     if (req.http.Cookie ~ "language") {
>         hash_data(regsub(req.http.Cookie, "^.*(language=[^;]+).*$", "\1"));
>     }
> }

You've just broken purging.

Mark, as the expert here (at least to my knowledge), can you please help work towards a solution? From what I can see, your expertise hasn't been used on this issue yet, except for pointing out what's wrong with the proposed solutions.

Going by what I heard from Mark, Tim Starling and others, the situation seems to be like this:

  • there is no way to do this with the current Squid caches. Varying on Accept-Language and Cookie got vetoed by Tim.
  • Bypassing Squids would make it work, but opens a DoS vector. Mark does *not* like it.
  • there are probably ways to do this with the new Varnish caches.
  • Migration to Varnish is at least 6 months away.
  • ULS can still be used as a convenient way to switch UI language for logged in users. At least for Wikidata, that would still be helpful.

I'd be happy if someone could tell me this assessment is wrong...

mark added a comment. Nov 9 2012, 11:20 AM

(In reply to comment #33)

Going by what I heard from Mark, Tim Starling and others, the situation seems
to be like this:

  • there is no way to do this with the current Squid caches. Varying on Accept-Language and Cookie got vetoed by Tim.
  • Bypassing Squids would make it work, but opens a DoS vector. Mark does *not* like it.
  • there are probably ways to do this with the new Varnish caches.
  • Migration to Varnish is at least 6 months away.
  • ULS can still be used as a convenient way to switch UI language for logged in users. At least for Wikidata, that would still be helpful.

Correct. There isn't really any solution right now.

I think ULS should only be enabled for logged in users until we have Varnish in place.

I think I'm now confused about X-Vary-Options: http://www.squid-cache.org/mail-archive/squid-dev/200802/0085.html

Does it mean that responses are cached with those test results appended to the cache key, or that they are not cached at all if any test ("whether the XXX header contains the string YYY") succeeds?

(In reply to comment #33)

Going by what I heard from Mark, Tim Starling and others, the situation seems
to be like this:

  • Bypassing Squids would make it work, but opens a DoS vector. Mark does *not* like it.
  • there are probably ways to do this with the new Varnish caches.
  • Migration to Varnish is at least 6 months away.

I'm confused. See below.

(In reply to comment #21)

Aren't we using Varnish already? Snippet from wikidata.org:
Server:Apache
Vary:Accept-Encoding
Via:1.1 varnish
Via:1.1 varnish
X-Varnish:1326947601
X-Varnish:3373410970
X-Vary-Options:Accept-Encoding;list-contains=gzip

This remained unanswered. So I'm asking again: Aren't we using Varnish already?

mark added a comment. Nov 9 2012, 1:04 PM

(In reply to comment #36)

I'm confused. See below.

(In reply to comment #21)
> Aren't we using Varnish already? Snippet from wikidata.org:
> Server:Apache
> Vary:Accept-Encoding
> Via:1.1 varnish
> Via:1.1 varnish
> X-Varnish:1326947601
> X-Varnish:3373410970
> X-Vary-Options:Accept-Encoding;list-contains=gzip

This remained unanswered. So I'm asking again: Aren't we using Varnish already?

No we're not. Elsewhere, but not on the text cluster, which is what's relevant here. I'm not sure where the snippet above comes from, but not from the text caching cluster.

aude added a comment. Nov 9 2012, 3:07 PM

I assume the Varnish headers are coming from the bits cluster, which includes images and also ResourceLoader stuff (e.g. JavaScript).

I think what matters for the use case here is the actual HTML page, which still comes from Squid.

faidon added a comment. Nov 9 2012, 4:14 PM

As I've noted elsewhere, do note that there are intermediate (forward) caches around in the world, and you have to take those into account too when setting HTTP cache headers.

Please help me better understand our options here.

My understanding is that in the near term the ULS folks are only deploying ULS to small wikis (Wikidata is probably one of the biggest ones). So would disabling or splitting the cache if the ULS cookie is present be a) acceptable, and b) feasible using one of the available methods, e.g. X-Vary-Options?

(Does ULS have to set a cookie if the user does not change the language? It seems to do so regardless of whether I change the language or not right now.)

If that's feasible, then I would suggest going for that path, limiting ULS deployments for logged out users to small wikis until the Varnish migration, and working with someone in ops until then to explore the best option to scale caching of UI language variants of the same page properly via Varnish. Does that make sense?

(In reply to comment #40)

(Does ULS have to set a cookie if the user does not change the language? It
seems to do so regardless of whether I change the language or not right now.)

We can change it to not add the cookie if the user interface language === wiki content language, and to remove the cookie if it already exists.
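The proposed rule can be sketched as a pure function that decides what to do with the cookie when the user picks a language. The function name and its return values are hypothetical. The real ULS code is JavaScript and may be structured differently.

```python
from typing import Optional


def cookie_action(chosen_lang: str, content_lang: str,
                  existing_cookie: Optional[str]) -> str:
    """Return 'set', 'delete' or 'none' for the ULS language cookie.

    Only store a cookie when the chosen interface language differs from
    the wiki content language, and remove a stale cookie when the user
    switches back to the content language.
    """
    if chosen_lang != content_lang:
        # Already stored with the right value: nothing to do.
        return "none" if existing_cookie == chosen_lang else "set"
    return "delete" if existing_cookie is not None else "none"
```

With this rule, most anonymous requests carry no ULS cookie at all. That matters for any caching scheme that treats "cookie present" and "cookie absent" differently.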

preilly wrote:

I'm going to talk to Siebrand Mazeland about this issue today and see if we can figure something out short-term.

— Patrick

My understanding is that in the near term the ULS folks are only deploying ULS
to small wikis (Wikidata is probably one of the biggest ones). So would
disabling or splitting the cache if the ULS cookie is present be a) acceptable,
and b) feasible using one of the available methods, e.g. X-Vary-Options?

I suppose Mark and Tim are the authorities on the subject, but let me reiterate my understanding:

ad a) Splitting the cache by language in the ULS cookie is actually not the issue - it would rather be a solution to the issue. As far as I understand, it would be fine at least for small wikis.

ad b) It's not possible with Squid (but probably is with Varnish).

We could vary on the entire cookie header - but that would be unique per client, not splitting but exploding the cache and making it useless. Or we could use XVO, but that only lets us vary on the *presence* of a cookie - all anons with the ULS cookie set (no matter to which value) would hit the same cached version, which would not improve the situation at all.

Or we could hack squid to make this possible. I don't know how complex this is, or who could do it, or how long it would take to roll this out.

There's an option c): use different URLs for different language versions of each page. There are two problems with this. 1) Whenever the page changes, *all* the (potential) URLs have to be purged explicitly, increasing the number of purges by two orders of magnitude (and we'd need to hack core to do it). 2) We need to rewrite *all* links in the interface to point to the language-specific version (needs lots of changes in core and messes with internal caching).
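The purge fan-out in problem 1) can be sketched roughly as follows. If every language has its own URL, one edit means purging one URL per UI language plus the plain one.

The function name, the `/wiki/` path layout and the `uselang` parameter are assumptions for illustration. The title is assumed to be already in URL form.

```python
def urls_to_purge(base: str, title: str, languages: list[str],
                  param: str = "uselang") -> list[str]:
    """All URLs that would have to be purged after `title` is edited.

    Assumes each UI language is addressed as ?<param>=<code> on the
    page URL (hypothetical layout).
    """
    page = f"{base}/wiki/{title}"
    # The plain URL plus one language-specific URL per UI language.
    return [page] + [f"{page}?{param}={code}" for code in languages]


langs = ["en", "nl", "nb", "is"]
print(len(urls_to_purge("https://www.wikidata.org", "Q42", langs)))  # 5
```

With a few hundred supported languages, the same edit would trigger a few hundred purges instead of one. That is where the "two orders of magnitude" comes from.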

The StickToThatLanguage extension uses URL parameters and rewrites the links using JavaScript. It does not work without JS. Because it uses the uselang=xx parameter, it bypasses all caches. Maybe squid can be made to vary on the uselang parameter, but then we again have the purging problem.

Even so, STTL might actually be the best option we have right now.

(In reply to comment #43)

Or we
could use XVO, but that only lets us vary on the *presence* of a cookie - all
anons with the ULS cookie set (no matter to which value) would hit the same
cached version, which would not improve the situation at all.

IIRC the language converter varies on the presence of every supported variant code in Accept-Language. Maybe we can vary on the presence of "mw_uls_<langcode>" in cookies here (however the list will be much longer than the language converter one)?
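Following this suggestion, the sketch below generates one `string-contains` test per language.

The header syntax is assumed from the examples in this thread. Tests for the same header are separated by `;`, and different headers by `,`. The `mw_uls_<langcode>` naming is the hypothetical scheme proposed in this comment, and the function name is made up.

```python
def xvo_header(languages: list[str], prefix: str = "mw_uls_") -> str:
    """X-Vary-Options value: the usual gzip test plus one cookie test per language."""
    value = "Accept-Encoding;list-contains=gzip"
    if languages:
        cookie_tests = ";".join(f"string-contains={prefix}{code}" for code in languages)
        value += f",Cookie;{cookie_tests}"
    return value


print(xvo_header(["de", "nl"]))
```

Two caveats apply to this approach:

* A substring test for `mw_uls_pt` would also match a cookie named `mw_uls_pt-br`, so the naming scheme would need care.
* With hundreds of languages, the header grows long.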

mark added a comment. Nov 12 2012, 11:21 AM

(In reply to comment #40)

Please help me better understand our options here.

My understanding is that in the near term the ULS folks are only deploying ULS
to small wikis (Wikidata is probably one of the biggest ones). So would
disabling or splitting the cache if the ULS cookie is present be a) acceptable,
and b) feasible using one of the available methods, e.g. X-Vary-Options?

(Does ULS have to set a cookie if the user does not change the language? It
seems to do so regardless of whether I change the language or not right now.)

If that's feasible, then I would suggest going for that path, limiting ULS
deployments for logged out users to small wikis until the Varnish migration,
and working with someone in ops until then to explore the best option to scale
caching of UI language variants of the same page properly via Varnish. Does
that make sense?

That seems reasonable, yes. It takes care of the "slashdot" problem, since first time visitors don't have a cookie set and get the cached default language page. When there's a cookie set, we better not cache it at all since the cache hit rate would be extremely low anyways, and it would just inflate the cache.

As Faidon indicates, we'll have to send Vary: cookie headers anyway, for other caching proxies out there. That will destroy their cache hit rate on this as well, but there's not much we can do about that.

mark added a comment. Nov 12 2012, 11:33 AM

(In reply to comment #45)

As Faidon indicates, we'll have to send Vary: cookie headers anyway, for other
caching proxies out there. That will destroy their cache hit rate on this as
well, but there's not much we can do about that.

Actually, they have to revalidate every time anyway, so that doesn't matter. :)

(In reply to comment #45)

That seems reasonable, yes. It takes care of the "slashdot" problem, since
first time visitors don't have a cookie set and get the cached default language
page. When there's a cookie set, we better not cache it at all since the cache
hit rate would be extremely low anyways, and it would just inflate the cache.

So how would this work in practice?

Scenario A:

  1. Client requests https://wikidata.org/ _without_ ULS cookies present and without being logged in.
  2. Say we get a cache MISS. Page is returned with XVO including the ULS cookie name, and with Vary: cookie for other caches not supporting XVO.
  3. Page is now cached server-side.
  4. ULS is loaded client-side. It does not set any cookies because the user does not change her default language.
  5. User continues to browse as normal.
  6. User gets cache HITs or MISSes as normal in the default language.

Scenario B:

  1. Client requests https://wikidata.org/ with ULS cookie present due to a previous language change via ULS.
  2. We get a cache MISS because page hasn't been previously cached in variant with ULS cookie present.
  3. MediaWiki checks for ULS cookie and sends "Cache-Control: no-cache, no-store, must-revalidate" header alongside the requested page.
  4. Squid therefore does not cache the page.
  5. The same is true for subsequent pageviews, including pageviews by other users with the ULS cookie present. The user now consistently gets cache MISSes, including from intermediate caches.

Would this approach more or less work for small wikis or am I fundamentally misunderstanding something?
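On the MediaWiki side, Scenarios A and B reduce to a small decision about response headers. The sketch below assumes a ULS cookie name of `mw_uls`, which is hypothetical, and reuses the `Cache-Control` value quoted in Scenario B. The cacheable branch's `s-maxage=300` is an arbitrary placeholder, not a deployed setting.

```python
# Hypothetical cookie name; the XVO test below is a substring match.
ULS_COOKIE = "mw_uls"


def response_headers(cookie_header: str) -> dict[str, str]:
    """Caching headers MediaWiki might send for an anonymous page view."""
    headers = {
        # Caches without XVO support only understand Vary.
        "Vary": "Accept-Encoding, Cookie",
        # Squid with Tim's patch splits only on whether the name occurs.
        "X-Vary-Options": "Accept-Encoding;list-contains=gzip,"
                          f"Cookie;string-contains={ULS_COOKIE}",
    }
    # Use the same substring test as the XVO rule, so MediaWiki and Squid
    # agree on which of the two buckets a request belongs to.
    if ULS_COOKIE in cookie_header:
        # Scenario B: explicit language choice, never store the response.
        headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
    else:
        # Scenario A: default language, one shared cached copy.
        headers["Cache-Control"] = "s-maxage=300, must-revalidate, max-age=0"
    return headers
```

The substring check is deliberately cruder than parsing the cookie. Matching Squid's own test avoids a cacheable response landing in the "cookie present" bucket.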

It seems to have been forgotten in this discussion that ULS will also set the default language for anonymous users based on the Accept-Language header, even before they select any language explicitly. That feature would not work with the solution described in comment #47.

Niklas, that feature could be disabled for now. If I'm understanding things correctly, implementing that feature reasonably well would require selection of cached language copies of a page based on the contents of the ULS cookie, and scalable implementation of purging across all cached copies.

Is the description in comment 47 correct/workable? Is it preferable to a URL-based approach?

If the answers are yes and yes, I suggest we iterate, and once we've got that solution implemented and deployed on small wikis that use ULS, focus on what the desired behavior will be in the glorious Varnish future.

(In reply to comment #49)

If I'm understanding things
correctly, implementing that feature reasonably well would require selection of
cached language copies of a page based on the contents of the ULS cookie

To clarify, the Accept-Language feature would require selecting the correct variant from the cache based on Accept-Language, _and_ overriding that choice with the variant specified by the ULS cookie if set. Again, please limit the feature set to something that's feasible now and iterate from there.
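The precedence described here can be sketched in three steps:

1. An explicit ULS cookie wins.
2. Otherwise, use the first supported language from Accept-Language.
3. Otherwise, use the wiki default.

The language set is hypothetical. The Accept-Language handling is deliberately simplified: it keeps only primary subtags and orders them by q-value. The function names are made up for illustration.

```python
from typing import Optional

SUPPORTED = {"en", "de", "nl", "nb", "is", "ar"}  # hypothetical


def accept_language_order(header: str) -> list[str]:
    """Primary language subtags ordered by q-value (simplified parser)."""
    weighted = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if tag and q > 0:
            weighted.append((q, tag.split("-")[0]))
    weighted.sort(key=lambda pair: -pair[0])  # stable for equal weights
    return [lang for _, lang in weighted]


def resolve_ui_language(uls_cookie: Optional[str], accept_language: str,
                        default: str = "en") -> str:
    # 1. An explicit choice made in ULS always wins.
    if uls_cookie in SUPPORTED:
        return uls_cookie
    # 2. Otherwise fall back to the browser's preferences.
    for lang in accept_language_order(accept_language):
        if lang in SUPPORTED:
            return lang
    # 3. Otherwise use the wiki's default language.
    return default
```

Each step depends on a different request input. Serving this correctly from a cache would therefore mean selecting variants on both the cookie and Accept-Language, which is the scope concern raised here.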

ori added a comment. Nov 13 2012, 11:55 PM

Created attachment 11354
Patch for Squid to add cookie headers to output sent to storeurl_rewrite_program

attachment store_rewrite_c.patch ignored as obsolete

ori added a comment. Nov 13 2012, 11:56 PM

Created attachment 11355
storeurl_rewrite_program in Python that adds value of ULS cookie as query param to URL.


ori added a comment. Nov 13 2012, 11:57 PM

I wrote a patch for Squid that I hope would fix this issue.

Squid 2.x has a 'storeurl_rewrite_program' directive, which specifies an external program that Squid calls to rewrite / canonicalize URLs prior to performing cache operations:

http://www.squid-cache.org/Doc/config/storeurl_rewrite_program/

The rewriter is a simple program that reads a request log on standard input and writes URLs to standard output. The format of the request log that Squid sends the rewriter does not contain the cookie headers, but adding them requires the addition of just one line to store_rewrite.c. I've attached a patch made against the current stable tag (SQUID_2_7).

I wrote a simple rewriter in Python that checks for the presence of a 'ULS' cookie and adds it to the URL as an additional query parameter (also attached).

To state the obvious: making even a small change to Squid is a big deal, so this would need to be reviewed very carefully by ops to make sure it is correct. The effort required may or may not be worth it, depending on the practicality of other available workarounds. But I will note that there may be an additional benefit to using a storeurl rewrite program: we could apply some ordering rule on query parameters, which could plausibly improve cache performance. (Perhaps we're doing this already -- I'm not too familiar with our setup.)
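The sketch below shows only the core URL transformation such a rewriter would perform, to make the idea concrete. It is not Ori's attached script, and it makes a few assumptions:

* The cookie name `language` is a placeholder.
* The store-URL parameter `uls_lang` is hypothetical.
* The exact line format the patched Squid would send on standard input is not reproduced, so the stdin/stdout loop is omitted.

It includes the optional query-parameter sorting mentioned above. That sorting affects only the cache key, not the URL fetched from the backend.

```python
from http.cookies import CookieError, SimpleCookie
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

LANG_COOKIE = "language"  # placeholder cookie name
STORE_PARAM = "uls_lang"  # hypothetical parameter used only in the cache key


def store_url(url: str, cookie_header: str) -> str:
    """Return the URL Squid would use as the cache key for this request."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    jar = SimpleCookie()
    try:
        jar.load(cookie_header)
    except CookieError:
        pass  # Malformed cookies: treat as "no language chosen".
    if LANG_COOKIE in jar:
        query.append((STORE_PARAM, jar[LANG_COOKIE].value))
    # Canonical parameter order, so ?a=1&b=2 and ?b=2&a=1 share one entry.
    query.sort()
    return urlunsplit(parts._replace(query=urlencode(query)))


print(store_url("http://www.wikidata.org/wiki/Q42", "language=nl; foo=bar"))
```

Each language then gets a separate store URL rather than a variant of one URL, which bears directly on how purges would have to be issued.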

ori added a comment. Nov 14 2012, 12:02 AM

Created attachment 11356
Patch for Squid to add cookie headers to output sent to storeurl_rewrite_program


(In reply to comment #53)

Squid 2.x has a 'storeurl_rewrite_program' directive, which specifies an
external program that Squid calls to rewrite / canonicalize URLs prior to
performing cache operations:

Squids are the front tier. They get hit about a hundred thousand times per second (perhaps a few hundred times each) on the wikimedia cluster. Are you sure this scales?

(In reply to comment #53)

I wrote a simple rewriter in Python that checks for the presence of a 'ULS'
cookie and adds it to the URL as an additional query parameter (also attached).

Does Squid consider that a variant, or a separate URL? If it's a separate URL, purging becomes a problem, because we then have to explicitly purge all 500 or so possible URL variations.

ori added a comment. Nov 14 2012, 10:35 AM

Ok, so this too would screw with purging. Sorry for being daft. Following a discussion about this with Daniel and Tim on IRC, it appears that the right way to fix this is:

  1. Disable the extension for now.
  2. Amend Tim's X-Vary-Options patch (http://paste.ubuntu.com/1357630/) to also operate on cookies.

AFAIK Patrick Reilly has some thoughts. I hope he can share them here.

I have filed two feature requests to ULS for two possible solutions:

  • disable ULS for anons: bug 42157
  • disable language detection (Eric's proposal): bug 42159

Perhaps this report should be moved to the Wikimedia/wikidata component, because it's about a solution for wikidata.org that involves ULS and Squid configuration.

(In reply to comment #59)

I have filed two feature requests to ULS for two possible solutions:

  • disable ULS for anons: bug 42157

Implemented and deployed.

ori added a comment. Nov 16 2012, 6:11 PM

The canonical home for Tim's X-Vary-Options patch is:
https://gerrit.wikimedia.org/r/gitweb?p=operations/debs/squid.git;a=blob;f=debian/patches/26-vary_options.dpatch;hb=HEAD

When Tim posted his patch to the squid-dev mailing list in 2008, there seems to have been interest in merging it to Squid-2.HEAD. Adrian Chadd, one of Squid's maintainers, wrote:

I'm happy to commit this to Squid-2.HEAD as-is. Can you throw it in a
Bugzilla report and spit me the number?

http://www.squid-cache.org/mail-archive/squid-dev/200802/0282.html

The idea of extending this patch to handle cookie names and values was floated later in the thread. One way to move this current ticket forward would be to do exactly as Adrian suggests and file a Bugzilla bug for this patch on Squid's bug tracker, provide a link to (and a summary of) this discussion, and then e-mail squid-dev about it. Squid 2.7.9 is still the stable head of the 2.7 version and is widely used, so it is not implausible that someone with the requisite skill will step up.

(In reply to comment #61)

The canonical home for Tim's X-Vary-Options patch is:
https://gerrit.wikimedia.org/r/gitweb?p=operations/debs/squid.git;a=blob;f=debian/patches/26-vary_options.dpatch;hb=HEAD

When Tim posted his patch to the squid-dev mailing list in 2008, there seems to have been interest in merging it to Squid-2.HEAD. Adrian Chadd, one of Squid's maintainers, wrote:

> I'm happy to commit this to Squid-2.HEAD as-is. Can you throw it in a
> Bugzilla report and spit me the number?
http://www.squid-cache.org/mail-archive/squid-dev/200802/0282.html

The idea of extending this patch to handle cookie names and values was floated later in the thread. One way to move this current ticket forward would be to do exactly as Adrian suggests and file a Bugzilla bug for this patch on Squid's bug tracker, provide a link to (and a summary of) this discussion, and then e-mail squid-dev about it.

Was this done?

Squid 2.7.9 is still the stable head of the 2.7 version and is widely used, so it is not implausible that someone with the requisite skill will step up.

He7d3r added a comment. Jun 6 2013, 9:12 PM

(In reply to comment #62)

(In reply to comment #61)
Was this done?

Ping.

I don't think so. Nevertheless, it will soon be irrelevant, as the remaining Squids are being migrated to Varnish (as far as I know).

Denny added a comment. Aug 22 2013, 2:54 PM

Closed older resolved bugs as verified.

Nemo_bis changed the task status from Resolved to Declined. Mar 27 2017, 6:58 AM