Performance review of Wikifunctions multilingual setup and caching
Closed, Resolved (Public)

Description

Wikifunctions will have its content available in many different languages, similar to Wikidata. Unlike Wikidata, we will have unique URLs for each language. E.g. www.wikifunctions.org/wiki/fr/Z6 would give the page about the object Z6 in French, www.wikifunctions.org/wiki/ar/Z6 in Arabic, etc.

We need to ensure that this works reasonably well with caching. The pages are client-side-rendered, and are going to be (relatively) few in number.

More details are described here: T268678

Preview environment

We want to decide which implementation of our requirements to proceed with before building it. Therefore the URL scheme is not yet implemented or set up. Individual pages are, however, already available in several languages.

E.g. this is the implementation of add s to end in English and in German:

Which code to review

The code for the WikiLambda extension, which will run Wikifunctions, is available here:
https://gerrit.wikimedia.org/g/mediawiki/extensions/WikiLambda

Particularly interesting is the current WIP:
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/925927/ – the uselang flag is set on incoming requests using the PathRouter class built into MediaWiki, with some link re-writing as needed.
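
As a rough illustration of the routing idea described above (this is not the WikiLambda patch itself, nor MediaWiki's PathRouter API), the following Python sketch splits a path of the proposed form /wiki/<lang>/<ZID> into a page title and a uselang value. The regular expression, the function name and the simplified language-code shape are assumptions made for this example.

```python
import re
from typing import Dict, Optional

# Assumed shape of the proposed URLs: /wiki/<lang>/<ZID>, e.g. /wiki/fr/Z6.
# The language-code pattern is a simplification for illustration only.
LOCALISED_PATH = re.compile(
    r"^/wiki/(?P<lang>[a-z]{2,3}(?:-[a-z0-9]+)*)/(?P<zid>Z[1-9][0-9]*)$"
)


def route_localised_path(path: str) -> Optional[Dict[str, str]]:
    """Return the title and uselang implied by a localised path, or None."""
    match = LOCALISED_PATH.match(path)
    if match is None:
        # Not a localised object URL; normal title routing would apply.
        return None
    return {"title": match.group("zid"), "uselang": match.group("lang")}


print(route_localised_path("/wiki/fr/Z6"))
print(route_localised_path("/wiki/Z6"))
```

In this sketch /wiki/fr/Z6 resolves to the title Z6 with uselang set to fr, while /wiki/Z6 is left for the normal title routing.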

Performance assessment

Please initiate the performance assessment by answering the below:

  • What work has been done to ensure the best possible performance of the feature?

Used MediaWiki's i18n framework as far as possible.

  • What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?

How much can be cached, how much do we split the cache?

  • Are there potential optimisations that haven't been performed yet?

We could potentially save content fragments and update them asynchronously. This conversation is being led by James Forrester.

  • Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the Performance Team for advice: performance-team@wikimedia.org.

One assumption we have is that Wikifunctions will trail our popular projects in popularity for quite a while (i.e. there won't be that many people coming to Wikifunctions at first). This will allow us to observe how Wikifunctions acts under realistic loads, and we will be able to implement changes that make us more robust with regard to growth when necessary. It would be good to know if we need to have any specific metrics in place in order to collect the right data to support future scaling.

We are asking for advice on how to implement the i18n mechanism in a way that doesn't break caching.

Event Timeline

Hey Denny, thanks for following up on our previous conversation. Could you help us understand this further?

You mentioned pages are client-side-rendered, can you share with me the reasoning behind this approach? Is this a given and already decided?
Do you have a specific implementation in mind that you are worried about?
What specific scenarios are you thinking could break caching?
Is there any particular cache layer you need more guidance on?

Jdforrester-WMF subscribed.

Hey Denny, thanks for following up on our previous conversation. Could you help us understand this further?

You mentioned pages are client-side-rendered, can you share with me the reasoning behind this approach? Is this a given and already decided?

The entire interface has been built in Vue.js as the content is highly dynamic, and the complexity of building parallel rendering systems for also rendering the components in server-side PHP is far beyond the resources available to the team for now. This was decided in June 2020 and reviewed by the team in T261341.

Do you have a specific implementation in mind that you are worried about?

Yes, as linked we have a simple implementation for the MediaWiki (PHP) side of this, using the existing support inside MediaWiki.

The principal concern we have is the "unknown unknowns" of what impact our change might have on the Wikimedia production systems, and especially whether we'd place too much burden on things like ATS/Varnish.

What specific scenarios are you thinking could break caching?

  • Are there things that magically require the current article paths and will break with this change?
  • Can we specify this for all our URLs, or should we leave /wiki/en/Z123 at the old 'canonical' URL of /wiki/Z123 for things that will break if it's a redirect? (We'd really rather avoid language primacy, given the point of the project.)
  • Some URLs (e.g. links to action=history) aren't captured; is there a neat way in which they could be?
  • If we don't actively purge ATS/Varnish, presumably(?) most of the concerns go away; is this a reasonable strategy, given the only server-generated content right now is the <h1> which we update from JS anyway? The FOUC will be a pain, but ah well.

Is there any particular cache layer you need more guidance on?

Mostly we're concerned about the ATS/Varnish cache, but there are probably others we've not considered.

I'm not able to perform an in-depth review until later this year, but I promised to take a look at the general shape and direction, timeboxed to 1 hour, with the intent to point to areas of concern. I hope the below contains sufficient ideas and pointers to empower you to pave your own way for a little while.

If more information or elaboration is needed, we'll likely need to take it in through the proper process and deal with it in a later quarter.

Localised URLs

Wikifunctions will have its content available in many different languages, similar to Wikidata. Unlike Wikidata, we will have unique URLs for each language. E.g. www.wikifunctions.org/wiki/fr/Z6 would give the page about the object Z6 in French, www.wikifunctions.org/wiki/ar/Z6 in Arabic, etc.

[…] We want to decide which implementation of our requirements to proceed with before building it. […] E.g. […] in English and in German:

[…]

  • Some URLs (e.g. links to action=history) aren't captured; is there a neat way in which they could be?

The suggested URL scheme of /wiki/fr/Z6 isn't supported in MediaWiki today. Examples of URL patterns that MediaWiki can route today:

Placing the language code within the routing segment designated for formatted titles of editable/canonical wiki pages would, I believe, expose Wikifunctions to an endless long tail of bugs and incompatibilities throughout the platform. A few examples on the backend:

  • Can these be valid link targets from the Parser perspective (e.g. on a talk page or user page),
  • how would these end up in the pagelinks/linktarget tables,
  • what will the Action API do when accepting these as the title subject for an API query or other API action,
  • what is the key for PingLimiter, PoolCounter etc given that these different titles are not distinct pages.

These and more would be valid expectations from the perspective of core developers, extension developers, and bot developers operating on MediaWiki sites. It would indeed also bring innumerable caching problems, both for ParserCache and at the lower level, where individual hooks in the Skin or any extension may vary or associate something they do with a page title on the assumption that a given page (Z6) behaves a certain way when viewed.
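
To make the concern about keys concrete, here is a small Python sketch. It assumes a hypothetical subsystem that derives its key from the title string alone; the key format and function name are invented for illustration and do not reflect how PingLimiter, PoolCounter or ParserCache actually build their keys.

```python
from typing import Iterable, Set


def title_keyed(titles: Iterable[str], prefix: str = "render") -> Set[str]:
    """Build one key per distinct title string (hypothetical key format)."""
    return {f"{prefix}:{title}" for title in titles}


# Three localised titles that all refer to the same underlying object, Z6.
localised_titles = ["fr/Z6", "de/Z6", "ar/Z6"]

# Keyed by the title string, one object fans out into three keys.
keys_by_title = title_keyed(localised_titles)

# Keyed by the underlying object, the same requests share a single key.
keys_by_object = title_keyed(t.split("/", 1)[1] for t in localised_titles)

print(len(keys_by_title), len(keys_by_object))
```

Any limit, lock or cache entry keyed by the title string would be split per language in this model, even though only one page exists.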

Another question that would arise from my perspective is whether these languages are meant to be limited to the content area or not. The example mentions uselang=de as equivalent; however, this modifies both the interface language and the localised content area, whereas typically in MediaWiki when routes other than "uselang" are used, the language is (and is expected to be) under the control of the user preferences. And thus when I view Z6/fr, would I be able to access the French localised content such that the skin interface remains in my own chosen language?

Toward the frontend there is indeed action=history as a potential concern where "reality" would leak through the abstraction. Other examples might be: Talk page associations (is Talk:Z6/fr a separate wikitext page from Talk:Z6?), page deletion, page protection, and e.g. community gadgets and Parsoid/VisualEditor operating on URLs with certain expectations of what the wgArticlePath route means and does.

In a nutshell, I think MediaWiki works best when the article path ("pretty" URLs for page views) is correlated only with a valid title, because existing URL/Title parsers in the MediaWiki ecosystem will recognise these as such, and operate on them with certain expectations.

But, with a very minor change to your proposed URL pattern, this can all be categorically avoided. For example, the localised view mode could be served from a Special page like Special:ViewEntity/:id/:lang and canonically served (without redirect!) from a nice URL like wikifunctions.org/view/Z6/fr. Would that satisfy your needs? I would suggest taking this approach as a start, and then identifying specific product use cases where something isn't right. For example, you can use a core hook to change the canonical URL <link> for Z-pages to be /view/Z6/en so that search engines and other crawlers mainly promote those links, or even redirect /wiki/Z6 entirely to the /view/ URL form when the request is a page view (for everyone, or possibly logged-out requests only).
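
The mapping suggested here can be sketched in a few lines of Python. This is only a model of the URL shapes mentioned in this comment (Special:ViewEntity/:id/:lang and /view/Z6/fr); the function names are hypothetical, and a real deployment would do the rewrite in the web server or in MediaWiki rather than like this.

```python
from typing import Optional


def view_path_to_special_page(path: str) -> Optional[str]:
    """Map /view/<ZID>/<lang> to Special:ViewEntity/<ZID>/<lang>, else None."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "view":
        return None
    _, zid, lang = parts
    if not zid or not lang:
        return None
    return f"Special:ViewEntity/{zid}/{lang}"


def canonical_view_path(zid: str, lang: str) -> str:
    """Build the path that a canonical <link> for a Z-page could point at."""
    return f"/view/{zid}/{lang}"


print(view_path_to_special_page("/view/Z6/fr"))
print(canonical_view_path("Z6", "en"))
```

Because the localised view has its own prefix, paths under /wiki/ keep mapping to plain titles and are not touched by this function.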

This would also give you full control over the /view/* prefix in robots.txt, separate from the non-content part of the wiki without much risk or complexity.
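
As a hedged sketch of what that control could look like, the snippet below checks a made-up robots.txt with Python's standard urllib.robotparser. The rules are hypothetical examples written for this illustration, not the robots.txt that Wikimedia serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: allow the localised views, keep crawlers away from
# script-path URLs such as action=history.
ROBOTS_TXT = """\
User-agent: *
Allow: /view/
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

view_allowed = parser.can_fetch("*", "https://www.wikifunctions.org/view/Z6/fr")
history_allowed = parser.can_fetch(
    "*", "https://www.wikifunctions.org/w/index.php?title=Z6&action=history"
)
print(view_allowed, history_allowed)
```

With a dedicated prefix, a single Allow or Disallow line covers every localised view without affecting rules for the rest of the wiki.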

Caching

By limiting the localised nature explicitly to a special page and its associated URL pattern, we avoid doubt and uncertainty leaking into all other parts of the wikifunctions.org domain and the MediaWiki platform it runs on.

When issuing redirects, the main things to look out for are that all aspects that influence the redirect are specified in the Vary response header, and that at least some non-zero amount of public CDN caching is permitted, in order for our traffic layer to remain effective in absorbing non-malicious traffic. Whether the site is expecting large amounts of traffic at this point is not a concern for me, but unless it runs from a dedicated appserver cluster, I expect the risk to be too high for SRE as the domain would become an easy DDoS vector even through non-malicious crawling. As such, I would suggest allowing most of MediaWiki to run effectively and non-variantly, and exposing the custom localised functionality through a limited URL path. Even if that path becomes the default way for the site to be seen and interacted with, it'll logically separate it in the backend, and naturally avoid cache conflicts and misunderstandings. This is because it would not be subject to the hooks and cache constraints that are placed by MediaWiki core's production configuration for article views. Instead, it gets to define its own contracts and constraints from its SpecialPage-based pageview backend.
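
The redirect advice can be illustrated with a minimal Python sketch that builds response headers. It assumes, purely for the example, that the redirect target depends on the Accept-Language request header and that five minutes of shared caching is acceptable; the function name and the lifetime are hypothetical.

```python
from typing import Dict, Iterable


def redirect_headers(
    location: str, varied_on: Iterable[str], s_maxage: int = 300
) -> Dict[str, str]:
    """Headers for a CDN-cacheable redirect (values are illustrative)."""
    headers = {
        "Location": location,
        # A non-zero shared-cache lifetime lets the CDN absorb repeat requests.
        "Cache-Control": f"public, s-maxage={s_maxage}",
    }
    names = sorted(set(varied_on))
    if names:
        # Every request header that influenced the target must be listed here.
        headers["Vary"] = ", ".join(names)
    return headers


print(redirect_headers("/view/Z6/fr", ["Accept-Language"]))
```

If a request header influences the redirect but is missing from Vary, a shared cache may serve one visitor's redirect to another visitor for whom it is wrong.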

That SpecialPage should probably render fairly quickly and have a non-zero caching opportunity, but as you say, that's something that can be improved as you go, with a fairly low risk during the initial launch.

Summary

In summary, I think the biggest risk factor for Wikifunctions is to innovate in the critical path of areas that are not essential to its purpose, and to break existing contracts in ways that are not currently supported by the platform. This inherently creates more unknowns and risks than even I can reasonably identify and help support; especially in areas of caching, latency, and security.

Broadly speaking my recommendations are:

  • fully satisfy existing core contracts and expectations, especially around what can vary when.
  • isolate new innovation and features that have different expectations, from new endpoints, through a primitive in MediaWiki that supports the same or broader level of flexibility.
  • override existing handlers as narrowly and specifically as possible.

In the case of localised page views, SpecialPages are the lowest common denominator, effectively making zero guarantees or promises and allowing you to define the exact rules you need. And you can utilize narrow hooks for canonical URLs or page view redirects to expose this new functionality as needed.

I'm not able to perform an in-depth review until later this year, but I promised to take a look at the general shape and direction, timeboxed to 1 hour, with the intent to point to areas of concern. I hope the below contains sufficient ideas and pointers to empower you to pave your own way for a little while.

If more information or elaboration is needed, we'll likely need to take it in through the proper process and deal with it in a later quarter.

Thanks!

Localised URLs

Wikifunctions will have its content available in many different languages, similar to Wikidata. Unlike Wikidata, we will have unique URLs for each language. E.g. www.wikifunctions.org/wiki/fr/Z6 would give the page about the object Z6 in French, www.wikifunctions.org/wiki/ar/Z6 in Arabic, etc.

[…] We want to decide which implementation of our requirements to proceed with before building it. […] E.g. […] in English and in German:

[…]

  • Some URLs (e.g. links to action=history) aren't captured; is there a neat way in which they could be?

The suggested URL scheme of /wiki/fr/Z6 isn't supported in MediaWiki today. Examples of URL patterns that MediaWiki can route today:

Placing the language code within the routing segment designated for formatted titles of editable/canonical wiki pages would, I believe, expose Wikifunctions to an endless long tail of bugs and incompatibilities throughout the platform. A few examples on the backend:

  • Can these be valid link targets from the Parser perspective (e.g. on a talk page or user page),
  • how would these end up in the pagelinks/linktarget tables,
  • what will the Action API do when accepting these as the title subject for an API query or other API action,
  • what is the key for PingLimiter, PoolCounter etc given that these different titles are not distinct pages.

These and more would be valid expectations from the perspective of core developers, extension developers, and bot developers operating on MediaWiki sites. It would indeed also bring innumerable caching problems, both for ParserCache and at the lower level, where individual hooks in the Skin or any extension may vary or associate something they do with a page title on the assumption that a given page (Z6) behaves a certain way when viewed.

https://wikifunctions.beta.wmflabs.org/wiki/de/Z10212 does indeed have these issues, hence this task, yes. :-)

Another question that would arise from my perspective is whether these languages are meant to be limited to the content area or not. The example mentions uselang=de as equivalent; however, this modifies both the interface language and the localised content area, whereas typically in MediaWiki when routes other than "uselang" are used, the language is (and is expected to be) under the control of the user preferences. And thus when I view Z6/fr, would I be able to access the French localised content such that the skin interface remains in my own chosen language?

We're intentionally collapsing content and interface languages; given that our concept of a "language" is a superset of MW's, it's not plausible for us to support both at launch, but maybe in a year or so we might want to distinguish these for logged-out users.

Toward the frontend there is indeed action=history as a potential concern where "reality" would leak through the abstraction. Other examples might be: Talk page associations (is Talk:Z6/fr a separate wikitext page from Talk:Z6?), page deletion, page protection, and e.g. community gadgets and Parsoid/VisualEditor operating on URLs with certain expectations of what the wgArticlePath route means and does.

We really don't care about existing gadgets/editors, as they're irrelevant to our content type. And no, as you can trivially see from our code, this adjustment is exclusively for NS0 in our content type. So this shouldn't break anything around related content (which all get the 'correct' links and behaviour as expected). Local gadgets will be written around the users' expectation of how pages work, much like how Wikidata NS0 pages are quite unlike wikitext pages.

In a nutshell, I think MediaWiki works best when the article path ("pretty" URLs for page views) is correlated only with a valid title, because existing URL/Title parsers in the MediaWiki ecosystem will recognise these as such, and operate on them with certain expectations.

Ack.

But, with a very minor change to your proposed URL pattern, this can all be categorically avoided. For example, the localised view mode could be served from a Special page like Special:ViewEntity/:id/:lang and canonically served (without redirect!) from a nice URL like wikifunctions.org/view/Z6/fr. Would that satisfy your needs? I would suggest taking this approach as a start, and then identifying specific product use cases where something isn't right. For example, you can use a core hook to change the canonical URL <link> for Z-pages to be /view/Z6/en so that search engines and other crawlers mainly promote those links, or even redirect /wiki/Z6 entirely to the /view/ URL form when the request is a page view (for everyone, or possibly logged-out requests only).

This would also give you full control over the /view/* prefix in robots.txt, separate from the non-content part of the wiki without much risk or complexity.

But in return we'd need to re-implement everything in Vector/Vector-2022/Minerva/Timeless/etc. around adding edit/history/etc. buttons to these fake content pages? I don't think this is a plausible way forward, unfortunately. Was this part of your consideration?

[Also this feels an awful lot like wgActionPaths, but I suppose you can't keep a good idea down.]

Caching

By limiting the localised nature explicitly to a special page and its associated URL pattern, we avoid doubt and uncertainty leaking into all other parts of the wikifunctions.org domain and the MediaWiki platform it runs on.

When issuing redirects, the main things to look out for are that all aspects that influence the redirect are specified in the Vary response header, and that at least some non-zero amount of public CDN caching is permitted, in order for our traffic layer to remain effective in absorbing non-malicious traffic. Whether the site is expecting large amounts of traffic at this point is not a concern for me, but unless it runs from a dedicated appserver cluster, I expect the risk to be too high for SRE as the domain would become an easy DDoS vector even through non-malicious crawling. As such, I would suggest allowing most of MediaWiki to run effectively and non-variantly, and exposing the custom localised functionality through a limited URL path. Even if that path becomes the default way for the site to be seen and interacted with, it'll logically separate it in the backend, and naturally avoid cache conflicts and misunderstandings. This is because it would not be subject to the hooks and cache constraints that are placed by MediaWiki core's production configuration for article views. Instead, it gets to define its own contracts and constraints from its SpecialPage-based pageview backend.

That SpecialPage should probably render fairly quickly and have a non-zero caching opportunity, but as you say, that's something that can be improved as you go, with a fairly low risk during the initial launch.

Yes, perhaps.

Summary

In summary, I think the biggest risk factor for Wikifunctions is to innovate in the critical path of areas that are not essential to its purpose, and to break existing contracts in ways that are not currently supported by the platform. This inherently creates more unknowns and risks than even I can reasonably identify and help support; especially in areas of caching, latency, and security.

Broadly speaking my recommendations are:

  • fully satisfy existing core contracts and expectations, especially around what can vary when.
  • isolate new innovation and features that have different expectations, from new endpoints, through a primitive in MediaWiki that supports the same or broader level of flexibility.
  • override existing handlers as narrowly and specifically as possible.

In the case of localised page views, SpecialPages are the lowest common denominator, effectively making zero guarantees or promises and allowing you to define the exact rules you need. And you can utilize narrow hooks for canonical URLs or page view redirects to expose this new functionality as needed.

I'll have a play around to see if this way forward can work, but I fear it won't.

Change 932025 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/extensions/WikiLambda@master] [WIP] Implement Special:ViewZObject to provide a Special page version of NS0

https://gerrit.wikimedia.org/r/932025

Change 940244 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/core@master] docker: Update apache2 image to one with a /view rewrite rule

https://gerrit.wikimedia.org/r/940244

Change 940245 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/puppet@production] apache: Add 'view_urls' rewrite for /view URLs, enable on Beta Cluster

https://gerrit.wikimedia.org/r/940245

Change 940246 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/puppet@production] apache: Enable view_urls on wikifunctions.org

https://gerrit.wikimedia.org/r/940246

Change 940244 merged by jenkins-bot:

[mediawiki/core@master] docker: Update apache2 image to one with a /view rewrite rule

https://gerrit.wikimedia.org/r/940244

Change 932025 merged by jenkins-bot:

[mediawiki/extensions/WikiLambda@master] Implement Special:ViewObject to provide a Special page version of NS0

https://gerrit.wikimedia.org/r/932025

Change 940245 merged by Alexandros Kosiaris:

[operations/puppet@production] apache: Add 'view_urls' rewrite for /view URLs, enable on Beta Cluster

https://gerrit.wikimedia.org/r/940245

Change 940246 merged by Alexandros Kosiaris:

[operations/puppet@production] apache: Enable view_urls on wikifunctions.org

https://gerrit.wikimedia.org/r/940246

Change 941979 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/puppet@production] apache: Actually enable view_urls on wikifunctions.org

https://gerrit.wikimedia.org/r/941979

Change 941979 merged by Alexandros Kosiaris:

[operations/puppet@production] apache: Actually enable view_urls on wikifunctions.org

https://gerrit.wikimedia.org/r/941979

Krinkle subscribed.
DVrandecic claimed this task.