Page MenuHomePhabricator

Create a hook like AfterFinalPageOutput with the cacheability of the page
Closed, ResolvedPublic

Description

Problem
There is no way in MediaWiki to apply actions dependent on whether a page is cacheable or not. The last hook that can be used before output is sent to the client is AfterFinalPageOutput, but as stated in the documentation, this hook is fired before cacheability is determined (i.e. before OutputPage::sendCacheControl()).

Example
In T152462 it was discovered that the cookie should be applied to all non-chacheable requests, however, the cachability of a request has not been determined, so there is no way to determine if a a request should get the cookie, or if adding the cookie is unsafe.

Solution
Add a hook to OutputPage:: sendCacheControl() and ApiMain:: sendCacheHeaders() (and anywhere else the Cache-Control header is set) that will allow the response to be modified with a result of the cacheability (i.e. will this request be cached in Varnish? on the Client? or Neither?)

Alternatively, T194393: Implement standard middleware interface in MediaWiki (PSR-15).

Event Timeline

Vvjjkkii renamed this task from Create a hook like AfterFinalPageOutput with the cacheability of the page to f7caaaaaaa.Jul 1 2018, 1:10 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
Ankry renamed this task from f7caaaaaaa to Create a hook like AfterFinalPageOutput with the cacheability of the page.Jul 2 2018, 11:00 AM
Ankry raised the priority of this task from High to Needs Triage.
Ankry updated the task description. (Show Details)
Ankry added a subscriber: Aklapper.

@dbarratt This was discussed in TechCom last Wednesday and we would prefer not to add a hook for this purpose. There were various thoughts about a solution that would fit more naturally in our architecture, but I'd like to better understand the context first before we make suggestions.

Speaking for myself, I think we should generally avoid "business logic" from depending on something like whether the current HTTP response is cacheable by Varnish. This is something that could and should be allowed to change based on needs and requirements from traffic and scale, not something that should impact the application logic.

Having said that, there definitely are cases where a feature can only work if we agree not to cache certain request's responses, e.g. for personalisation and other session-based interactions.

Getting back to the question at hand - there is some confusion about the use case. If you were able to work-around it, a link to that code would help.

it was discovered that the cookie should be applied to all non-cacheable requests.

This implies there exists logic that obtained information about the current user in a way that was insufficiently conditional (on there being a session), determined a cookie to want to set, and then wants to know whether doing that is safe.

We usually do things the other way around: The logic should not obtain such information or compute values to be set unless we determine that the current request is session-based (e.g. user agnostic, or user specific). If a session is available, information may be read and used, otherwise it must not.

This is applicable to more than just set-cookie operations, but also anything in the output HTML, etc.

Krinkle triaged this task as Medium priority.
Krinkle moved this task from Inbox to In progress on the TechCom board.

This implies there exists logic that obtained information about the current user in a way that was insufficiently conditional (on there being a session), determined a cookie to want to set, and then wants to know whether doing that is safe.

We usually do things the other way around: The logic should not obtain such information or compute values to be set unless we determine that the current request is session-based (e.g. user agnostic, or user specific). If a session is available, information may be read and used, otherwise it must not.

Do anonymous users have sessions? I was under the impression that only logged-in users have session and anonymous users do not. Is this incorrect?

Do anonymous users have sessions?

Yes, they can. Most write actions start a session, including when anonymous users make contributions. Among other things, this ensures they can receive talk-page notifications, and it ensures ChronologyProtector helps them immediately see their own contributions on e.g. History and Recent changes (just like logged-in users would).

We usually do things the other way around: The logic should not obtain such information [..] unless [..] the current request is session-based [..]

I wanted to add: The cache infrastructure uses standard indicators to decide whether a response can be cached. As such, you should communicate as part of a response that is must not be cached if that is the desired outcome - rather than trying to anticipate what others layers will do and only performing the desired action if it will not cache. The standard means we use for this is the HTTP Vary header, with a few optimisations for common cases in Varnish VCL.

@Krinkle oh. that's fascinating. Is there a standard way to determine if the user has a session?

Yes. I believe it is now under Session::isPersistent as part of SessionManager – used in OutputPage::sendCacheControl.