This is a rough sketch to seek input from other developers.
#### Objective
Enable MediaWiki to respond to page views in a fairly performant manner for all users, both those with sessions (e.g. registered and logged-in) and those without sessions (unregistered/anonymous).
This is motivated by the following high-level goals:
* **Equitable performance**. Decrease the gap between page load time for registered and unregistered users. Unregistered users generally receive pages from our CDN. Registered users must interact with the application servers to render their pages, however this need not be an expensive and slow operation. As of July 2022, MediaWiki as deployed at WMF takes ~200ms to render a desktop page view, and ~270ms to render a mobile page view (regardless of whether it is a CDN cache miss for an unregistered user, or a request from a registered user).
** [Grafana: Backend Pageview Timing (desktop min: 48ms, desktop p75: 203ms, mobile min: 65ms, mobile p75: 279ms)](https://grafana.wikimedia.org/d/QLtC93rMz/backend-pageview-timing)
* **Improve resiliency against ongoing attacks**. Rendering articles is the most common operation we perform for our CDN, and this fact is core to the Foundation mission. (The frequency of article requests is not a technical side-effect of secondary features or of how we implement things, such as may be the case with jobrunner or API calls; rather pageviews are themselves core to our mission). This operation should be among the cheapest and most optimised, and yet has become one of the most expensive. Speeding up backend page view rendering means we won't be as susceptible to certain kinds of DDoS attacks. The cost of such requests would be much lower, especially repeated CDN-cache misses for what are internally the same page+skin combination (regardless of exact URL).
** Compare to [Grafana: ResourceLoader (min: 20ms, median: 52ms, p75: 73ms)](https://grafana.wikimedia.org/d/000000066/resourceloader?orgId=1&viewPanel=45), [API requests: opensearch (median: 49ms)](https://grafana.wikimedia.org/d/000000559/api-requests-breakdown?orgId=1&var-metric=p50&var-module=opensearch), [Grafana: Application Servers RED](https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red).
* **Simplify developer experience**. Today we have a lot of flexibility in the skin, and looking at examples of existing code can give the wrong idea. The caches are hidden behind what appear to be (and often still are) expensive service calls. This means "follow the example" leads by default to more inline service calls that are late-discovered by the runtime, uncached, unbatched, and subject to global state. Doing the "right" thing requires a steeper learning curve, and also tends to bottleneck on input from the Performance Team. By moving these to an upfront declared portion of the Skin class, the default place is naturally cached, and naturally permitted to utilize global state, and thus naturally encourages the "right" thing. Bypassing that layer would stand out in code review. It also means we automatically cache the longer tail of smaller computations as well, for which today we would likely not bother creating an ad-hoc cache manually.
* **Future exploration**. There are numerous client-side and infrastructure performance improvements dependent on a predictable URL router, as well as a wide range of possible UX explorations. This includes:
** More effective CDN caching by normalizing URLs based on declared routes (T310087).
** Sending preload headers from the edge based on declared routes (e.g. flush appshell or send preload skin headers on page views, currently not possible since we have pageview-like route bypasses like `Special:Export/:title`).
** PWA or offline web app for mobile, //without// the need for a fully developed native app. To explore those in a sustainable manner, I consider it a prerequisite that the skin (or more precisely, the skins we deploy at WMF) not depend on directly contacting internal databases or backend services during the rendering step, but rather are modelled such that this data is injected as part of a single upfront payload to the skin service class. Whether, why, and how to potentially expose that payload to an external re-implementation of the skin is for later consideration. Doing so today would inevitably lead to broken contracts, out-of-sync business logic, and a stuck MVP that can never serve beyond a mostly passive/read-only experience.
#### Outcomes
Today we generally compute skin data "at the last minute" and thus rely solely on Varnish to essentially "memcache" this skin data even though it doesn't vary between users. This has two negative side-effects: 1) We're only achieving our performance goals today so long as we keep a very high Varnish CDN cache-hit ratio, and 2) All registered/logged-in traffic, as well as any cache miss from unregistered users, does not meet our goal.
By letting MW cache the skin data explicitly, we can let Varnish be focussed on and optimised for serving the scale that is Wikipedia production traffic, instead of also having to implicitly be responsible for meeting individual page view performance goals. (And systemically excluding registered users from that goal.)
Technical outcomes:
* Routing logic will be aware of which URLs are page views, and which are not. (Currently a small number of legacy special pages bypass the route by retroactively disabling OutputPage mid-way through the response.)
* The Skin is given (cacheable) data about the page and user. It will produce its HTML output without performing additional backend queries. Therefore, any DB calls skins currently make for page- or user-related data must be moved to pre-cache accessors before the "render" and "getTemplateData" step.
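To illustrate the second outcome, here is a minimal sketch in Python (not MediaWiki code). All names in it (`PageData`, `UserData`, `PageDataCache`, `load_page_from_backend`, `render_skin`) are hypothetical and exist only for this example. It assumes the page-level data is cached once, and that the render step is a pure function of that cached data plus the user info, so that a second view by a different user causes no additional backend load.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class PageData:
    """Cacheable, user-independent data about a page (hypothetical shape)."""
    display_title: str
    last_modified: str
    categories: Tuple[str, ...]
    content_html: str


@dataclass(frozen=True)
class UserData:
    """Per-user data; name is an empty string for a user without a session."""
    name: str
    can_edit: bool


class PageDataCache:
    """Stand-in for a page-level cache; counts how often the backend is hit."""

    def __init__(self, loader: Callable[[str], PageData]) -> None:
        self._loader = loader
        self._store: Dict[str, PageData] = {}
        self.backend_loads = 0

    def get(self, title: str) -> PageData:
        if title not in self._store:
            self.backend_loads += 1
            self._store[title] = self._loader(title)
        return self._store[title]


def load_page_from_backend(title: str) -> PageData:
    # Placeholder for the expensive part (parsing, database queries).
    return PageData(
        display_title=title.replace("_", " "),
        last_modified="2022-07-01",
        categories=("Examples",),
        content_html="<p>Hello</p>",
    )


def render_skin(page: PageData, user: UserData) -> str:
    # Pure function of its inputs: string concatenation and simple boolean
    # logic only, no backend queries.
    parts = [f"<h1>{escape(page.display_title)}</h1>", page.content_html]
    if user.can_edit:
        parts.append('<a class="edit">Edit</a>')
    who = escape(user.name) if user.name else "Not logged in"
    parts.append(f"<footer>{who} | Last modified {escape(page.last_modified)}</footer>")
    return "".join(parts)
```

Rendering the same title first for an anonymous `UserData` and then for a logged-in one goes through `PageDataCache.get()` twice, but `backend_loads` stays at 1. That is the property the outcome above is after.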
#### Small but impactful changes
1. Add a router that will reliably control which output handler (Action or SpecialPage class) is invoked, e.g. no surprise `OutputPage::disable()` half-way through the request, such as SpecialExport and RawAction currently do. This would be similar to what `MediaWiki.php` does already, but based on a declarative registry instead of being procedural, such that the result of the registrations can be expressed in data and e.g. in the future also be exported as a JSON payload containing URL patterns.
2. Skin templates for the default mobile/desktop skin at WMF must be reduced to only string concatenation and simple boolean/iteration logic based on received data. E.g. as simple as a Mustache template, and preferably as an actual Mustache template, such that this restriction is naturally promoted ("easy to do the right thing").
3. Data flows in one direction only, to the Skin. No more two-way communication from Skin back to OutputPage and other service classes. OutputPage injects all relevant data (maybe some of it lazy-computed) into the Skin class. Skin classes may combine it with the received User info and transform properties (rename, reduce, derive, etc.), and then invoke the Skin template with it.
* Skin hooks that WMF production extensions rely on are audited. Any hooks that are currently essentially "unrestricted callables with global context returning uncacheable raw html" should be replaced with new hooks that are "callables that return deterministic and cacheable data". They can still vary by user and page (e.g. "What links here", "My contributions"), but any user-variable decisions about page-specific data must be made in the skin template by combining the two datasets, rather than hooks doing this already in a way that makes the returned data uncacheable as either "by user" only or "by skin/page" only. If extension hooks require additional information, they can obtain it, but must use a hook behind the Skin or ParserOutput cache, not after it. E.g. they could compute the information via a hook during page parsing, or when generating the userinfo blob or skin metadata blob.
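The hook contract described above can be sketched as follows, again in Python rather than PHP, and purely as an illustration. The registries `PAGE_HOOKS` and `USER_HOOKS`, the functions `on_page_blob`, `on_user_blob`, `build_blob` and `render_toolbox`, and the "Protect" link and `can_protect` flag are all assumptions made up for this example; none of them are existing MediaWiki APIs. The point is only that hooks return plain data when a blob is generated, and that the user-variable decision about page-specific data is made at render time by combining the two blobs.

```python
import json
from html import escape

# Hypothetical hook registries. Each hook is a deterministic callable that
# returns plain data, keyed either by page title or by user name.
PAGE_HOOKS = []
USER_HOOKS = []


def on_page_blob(title):
    # Page-specific data, independent of who is viewing.
    return {
        "whatlinkshere_url": f"/wiki/Special:WhatLinksHere/{title}",
        "protect_url": f"/wiki/{title}?action=protect",
    }


def on_user_blob(name):
    # User-specific data, independent of which page is viewed.
    return {
        "contributions_url": f"/wiki/Special:Contributions/{name}" if name else None,
        "can_protect": name == "Admin",  # placeholder permission check
    }


PAGE_HOOKS.append(on_page_blob)
USER_HOOKS.append(on_user_blob)


def build_blob(hooks, key):
    """Run hooks when a cache blob is generated, not on every page view."""
    blob = {}
    for hook in hooks:
        blob.update(hook(key))
    # Round-trip through JSON so that only plain, serialisable data survives;
    # a hook returning an object or a callable raises TypeError here.
    return json.loads(json.dumps(blob))


def render_toolbox(page, user):
    """Template-like step: combines both blobs using simple boolean logic."""
    links = [("What links here", page["whatlinkshere_url"])]
    if user["contributions_url"]:
        links.append(("My contributions", user["contributions_url"]))
    if user["can_protect"]:
        # User-variable decision about page-specific data, made at render time.
        links.append(("Protect", page["protect_url"]))
    return "".join(
        f'<li><a href="{escape(url)}">{escape(label)}</a></li>' for label, url in links
    )
```

With this shape, `build_blob(PAGE_HOOKS, "Foo")` yields the same page blob for every viewer and can be cached by page, while `build_blob(USER_HOOKS, name)` can be cached by user. Neither hook needs to know about the other dataset.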
### Other thoughts
It is likely that the outcome of this will be that third-party site admins can effectively achieve performance close to what `$wgUseFileCache` offers today, but for both logged-out and logged-in users alike. The minimal "post-cache" run-time that we invoke after reading FileCache would now apply the Skin logic at runtime instead of before cache time. (Today, by running it before the cache, the result is unusable for logged-in users). Potentially it might also obsolete the need for FileCache, or possibly keep it around only to help increase server capacity for logged-out users (as opposed to for page loading speed).
The role of FileCache would then essentially be obsoleted by a slightly larger ParserOutput cache (ParserCache), additional memcached keys, or the MainStash, given that the run-time overhead of taking a ParserOutput object and applying the Skin template would be cheap enough to be negligible. Alternatively, we may change FileCache to be more like a companion to ParserCache that caches only the extra data needed by skins, or store the supplemental data in MainStash or memcached.
**Steps**:
* [ ] Deprecate `OutputPage::disable()`.
* [ ] Implement a router that supports both paths and query parameters. See also `MediaWiki::parseTitle()` and other methods in `MediaWiki.php`. The decision to not use OutputPage for a response must be made there, before any Action or SpecialPage is chosen. E.g. it should be known without executing any output handlers whether `/wiki/Foo?action=edit&search=bar` is a page view, an edit action, or a search query.
* [ ] Make data flow in one direction from OutputPage to Skin.
* [ ] Convert Skin hooks.
* [x] Convert one Skin (e.g. Vector) to become a template, as an example. (Note, there is no pressure to remove support for PHP-based skins, the only requirement is that it can handle responses in front of the page cache instead of behind it, e.g. using data properties instead of run-time database queries and WikiPage method calls).
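As a rough illustration of the router step, the sketch below (Python, not MediaWiki code) classifies a URL against a declarative registry without executing any handler. The registry format, the route names, the `skin` flag, the first-match-wins precedence, and the functions `classify` and `export_routes` are all assumptions for this example. In particular, which route should win for `/wiki/Foo?action=edit&search=bar` is a policy decision this sketch does not settle; it simply follows the order of the list.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical declarative registry: plain data only, so it can be exported.
# Rules are checked in order and the first match wins (an assumption).
ROUTES = [
    {"name": "edit", "path_prefix": "/wiki/", "query": {"action": "edit"}, "skin": True},
    {"name": "search", "path_prefix": "/wiki/", "query_has": ["search"], "skin": True},
    {"name": "raw", "path_prefix": "/wiki/", "query": {"action": "raw"}, "skin": False},
    {"name": "export", "path_prefix": "/wiki/Special:Export/", "skin": False},
    {"name": "pageview", "path_prefix": "/wiki/", "skin": True},
]


def classify(url):
    """Return the first matching route entry, or None if nothing matches."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    for route in ROUTES:
        if not parts.path.startswith(route["path_prefix"]):
            continue
        # Every required key=value pair must match the last value given.
        required = route.get("query", {})
        if any(query.get(key, [None])[-1] != value for key, value in required.items()):
            continue
        # Every listed parameter must be present, whatever its value.
        if any(key not in query for key in route.get("query_has", [])):
            continue
        return route
    return None


def export_routes():
    """Serialise the registry, e.g. for use by a different process."""
    return json.dumps(ROUTES)
```

Because the registry is data rather than procedure, whether a response uses a skin is known from the matched entry alone (e.g. the `export` entry carries `"skin": False`) before any Action or SpecialPage class is constructed.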
### What this task is not
This task does not change the architecture of MediaWiki or the skin system. It also does not change how MediaWiki is used or deployed at WMF.
The work covered by this task resolves long-standing technical debt in MediaWiki for the purposes of improved rendering performance and reduced overall complexity and maintenance cost for the skin system – whilst remaining fully backwards compatible.
The intended result is improved performance of rendering page views in MediaWiki core, both out-of-the-box and at WMF.
The work covered by this task should in my opinion be considered a prerequisite for the following proposals. Some of the below might be possible without this task, although I believe doing so would incur significant amounts of maintenance overhead and technical debt that can and should be avoided (through this task).
* Add a way for MediaWiki to render wiki pages, actions, and special pages without a skin. – T114596
* Re-implement MediaWiki skin rendering for WMF-deployed skins in a micro-service that runs near the CDN. – T111588
* Re-implement MediaWiki skin rendering for WMF-deployed skins in a way that can be run offline and client-side using service workers. – T106099.