Copied over from enwp Village Pump (Technical)
In [[ https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(miscellaneous)#Very_big_pages | Wikipedia:Village pump (miscellaneous)#Very_big_pages ]] someone raised the problem of long delays (up to 5..7 seconds) when visiting big wiki pages, especially the first time someone retrieves one of them after a period (days or weeks?) during which nobody has visited it.
In practice the topic under discussion was **how to reduce the size of big / long wiki pages**, because the original poster thought that most of the wait time was due to the client browser //struggling// to retrieve and visually render such big pages (HTML between 1 MB and 1.6 MB). Since he saw the delays on a PC with reasonably fast CPUs and enough RAM, he was worried about what could happen on smartphones and other mobile devices with far fewer hardware resources.
Someone else replied that he did not notice such long delays (at most 1..2 seconds), so I contributed my own thoughts to the discussion and found that the **wiki page retrieval mechanism** between web servers and browsers could perhaps be improved.
Here I do not want to discuss the problem of long delays on first retrieval of big pages, because it is too technical a matter that concerns only Wikipedia's internal infrastructure; I have already written something about it in the wiki section mentioned above.
Regarding the impact of slow response in (web) user interfaces (see also the wiki article about **[[https://en.wikipedia.org/wiki/Responsiveness | responsiveness]]**), I have noticed that **web caching of wiki pages** could be **improved noticeably** in order to //cut a lot of unnecessary web traffic// that probably burdens the web applications and the DB (database) too.
**Caching assumptions**
If a user is **logged in**, wiki pages appear to be **never cached** by the browser (which is right, at least when editing a page). If the user is **logged out** (not logged in), pages are temporarily **cached**, and this makes a noticeable difference: if a page has been retrieved recently (within 1 minute) and has not been modified, it is rendered in less than 0.2..0.3 seconds on a medium speed (high clock, single CPU) PC.
Right now the cacheability of a wiki page depends on:
* type of content encoding format (compressed or not);
* value of cookies;
* authorization;
because of the "Vary" header, which is correct.
Cookies also contain a session identifier that is discarded when the client browser is closed, so caching of wiki pages will always be session bound (it lasts only while the browser is kept open, and only for users ''not logged in'').
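As a minimal sketch (a deliberately simplified, hypothetical model of a browser cache, not actual browser code), the "Vary" mechanism described above means the cache key is built from the URL plus the request headers named in "Vary" - so any change in the Cookie value makes the stored copy unusable:

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus every header listed in Vary."""
    varied = tuple(
        (name, request_headers.get(name, "")) for name in sorted(vary)
    )
    return (url, varied)

# A page cached with "Vary: Accept-Encoding, Cookie"
vary = {"Accept-Encoding", "Cookie"}
first = cache_key("/wiki/Foo", {"Accept-Encoding": "gzip", "Cookie": "Tick=1"}, vary)
same  = cache_key("/wiki/Foo", {"Accept-Encoding": "gzip", "Cookie": "Tick=1"}, vary)
later = cache_key("/wiki/Foo", {"Accept-Encoding": "gzip", "Cookie": "Tick=2"}, vary)

assert first == same   # identical request -> cached copy can be reused
assert first != later  # Cookie changed -> cached copy is unusable
```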
**Current behavior / problems detected**
Current **browser caching of wiki pages** is **far from perfect** because of the following technical issues:
* Wikipedia **web servers** appear to be of different kinds: some send HTTP responses with a "Last-Modified" header (but no "ETag"), others with an "ETag" (but no "Last-Modified") for the same wiki pages. Depending on web traffic / load, etc., a browser can get responses one time from one web server and the next time from another one with a **different type of object / cache validator**; in that case the entire wiki page has to be resent to the browser even if it has not changed, increasing the duration of the overall operation by 10-20 times;
* **caching** of wiki pages also depends on **cookies**, and the cookies sent by the browser include a counter value (tick) that changes at least every minute; so 1..60 seconds after receiving a wiki page its cookie changes and the browser has to invalidate its cached copy. This means that if the same wiki page is requested again after its cookie has changed, the entire page has to be retrieved instead of just asking the web server whether it has changed; this issue may greatly increase web traffic and the load on the web servers, web applications, DB, etc. behind them.
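The first issue can be sketched as follows (a simplified, hypothetical model of server-side revalidation, not MediaWiki code): a browser can only revalidate with the validator it was originally given, so if the next server checks a different validator type, the conditional request fails and the full page is retransmitted.

```python
def revalidate(server_validator, cached):
    """Return '304' if the server can confirm freshness, else '200' (full body)."""
    kind, value = server_validator  # ("etag", ...) or ("last-modified", ...)
    if kind == "etag" and cached.get("etag") == value:
        return "304"
    if kind == "last-modified" and cached.get("last_modified") == value:
        return "304"
    return "200"  # validators do not match: the entire page is resent

# Server A sent only Last-Modified, so that is all the browser has stored.
cached = {"last_modified": "Tue, 01 Jan 2019 00:00:00 GMT"}

# Next request hits server A again: cheap 304, page not retransmitted.
assert revalidate(("last-modified", "Tue, 01 Jan 2019 00:00:00 GMT"), cached) == "304"

# Next request hits server B, which only knows ETags: full resend of an
# unchanged page, even though nothing was modified.
assert revalidate(("etag", '"rev-12345"'), cached) == "200"
```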
In web server responses the **"Cache-Control"** and **"Expires" headers already seem to have proper settings**, so apart from the alternating / random usage of the "Last-Modified" and "ETag" headers, the HTTP cache settings are already in good shape.
**Goal to improve website response times of wiki pages**
The aim should be to force browsers to ask web servers whether a wiki page has changed every time it is about to be displayed (so statistics about the number of pages viewed by users would not be affected by this improvement); of course just asking is much faster than retrieving the whole page on every visit, especially if the wiki page is big (there are wiki pages that do not change for hours, days or even months).
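This goal maps onto standard HTTP machinery: "Cache-Control: no-cache" lets the browser store the page but forces a revalidation before every use, and a matching "If-None-Match" gets a tiny 304 instead of the full body. A toy, self-contained demonstration (the server, page, and ETag value are invented for illustration):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html>... pretend this is a 1.5 MB wiki page ...</html>"
ETAG = '"rev-987654321"'  # hypothetical per-revision identifier

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            self.send_response(304)          # page unchanged: no body at all
            self.send_header("ETag", ETAG)
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("ETag", ETAG)
            # store, but revalidate before every display:
            self.send_header("Cache-Control", "private, no-cache")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)

    def log_message(self, *args):            # keep output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/wiki/Foo")
first = conn.getresponse()
body = first.read()
etag = first.getheader("ETag")

# Second visit: the browser revalidates instead of re-downloading.
conn.request("GET", "/wiki/Foo", headers={"If-None-Match": etag})
second = conn.getresponse()
second.read()

assert first.status == 200 and len(body) > 0
assert second.status == 304   # only headers travel the second time
server.shutdown()
```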
**Proposals to reach goal of good cacheability of wiki pages**
These are a few proposals to improve the cacheability of wiki pages.
//Web / application server(s) side//
1. Each wiki page should have a **unique identifier** for each of its revisions made by editors:
* this identifier surely already exists, but all Wikipedia web servers should send it as the "ETag" value for each wiki page;
* I suspect that even a Last-Modified value alone would suffice, because it is hard to imagine a wiki page being changed more than once per second (although two users might update different sections of a page within the same second).
2. Web servers should send both "Last-Modified" and "ETag" headers for each wiki page, as recommended by the latest RFCs about HTTP. If both are sent, modern browsers use only the "ETag", but they can at least show the "Last-Modified" value somewhere in their page information window; besides, should an old browser that does not support the "ETag" header be used, it could still work by relying on the "Last-Modified" header (which would suffice in 99.99% of cases).
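Proposal 2 could be sketched like this (the function name and the idea of deriving the ETag from the revision id are my assumptions, not MediaWiki's actual scheme): both validators are computed from the same per-revision record and always sent together.

```python
import email.utils

def validators_for(revision_id, revision_timestamp):
    """Return both validator headers for a given page revision.

    revision_id: integer, unique per saved revision (hypothetical source
    for the ETag); revision_timestamp: Unix time of the revision.
    """
    return {
        "ETag": '"rev-%d"' % revision_id,  # unique per revision
        "Last-Modified": email.utils.formatdate(revision_timestamp, usegmt=True),
    }

headers = validators_for(987654321, 1546300800)  # 2019-01-01 00:00:00 UTC
assert headers["ETag"] == '"rev-987654321"'
assert headers["Last-Modified"] == "Tue, 01 Jan 2019 00:00:00 GMT"
```

Since every edit produces a new revision id, such an ETag changes exactly when the page changes, while Last-Modified remains available as the fallback for older browsers.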
//Client browser side//
3. The value of **Cookies should not change every minute**. The problem is how to achieve this without losing the session tick value, in case it is considered mandatory for handling the user session.
* a first idea would be to remove that value from the Cookies and store it in a new custom HTTP header, sent by the client browser and dedicated to cookie values that change frequently and do not affect the cacheability of a wiki page, e.g.:
`xapp-cookie: *TickCount=3`
: in this case the above HTTP header should be kept in the client browser request until its arrival at a web server; its value should then be re-added to the "Cookie" values by custom code in the web server(s) (a simple modification), and the Cookies could then be passed to a web cache or a web application as usual;
* a second idea would be **not to use that tick value in the Cookies for some kinds of users**: if in the near future users who are **not logged in** won't be able to edit wiki pages, then keeping a tick value for the user session might no longer be a requirement, and it could be removed from the Cookies, thus avoiding the current cache problem of Cookies changing every minute; for users who are **logged in** nothing would change, because for them caching of wiki pages is already disabled;
* other idea(s) to be specified.
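The first idea could be sketched as a small server-side shim (the "xapp-cookie" header name comes from the example above; the function and header-dictionary model are hypothetical): the browser sends the volatile tick in its own header, and the shim folds it back into the Cookie before the request reaches the web cache or application, so the Cookie the browser caches against stays stable.

```python
def restore_volatile_cookie(request_headers):
    """Merge the xapp-cookie value back into the Cookie header, in place."""
    volatile = request_headers.pop("xapp-cookie", None)  # e.g. "*TickCount=3"
    if volatile:
        tick = volatile.lstrip("*")
        cookie = request_headers.get("Cookie", "")
        request_headers["Cookie"] = (cookie + "; " + tick) if cookie else tick
    return request_headers

# The browser's Cookie never changes, so its cached page stays valid; the
# web application still sees the tick it expects.
headers = {"Cookie": "session=abc123", "xapp-cookie": "*TickCount=3"}
restore_volatile_cookie(headers)
assert headers == {"Cookie": "session=abc123; TickCount=3"}
```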
**Conclusions**
Applying the above 3 suggested modifications could make caching of wiki pages feasible for an entire user session (until the browser is closed), thus greatly decreasing the size of HTTP responses for wiki pages and the load on the server side.
Something about **improving HTTP caching** could also be done for **scripts and stylesheets**: they are not always cached by browsers as they should be, and besides, their retrieval is sometimes a bit slow. Especially when logged in, you can see that a wiki page is first visually rendered with standard browser fonts and then, after 0.3..0.6 seconds, with the proper text fonts, because the download of some stylesheet completes only after the wiki page has started to be shown in the browser.