New Service Request: Wikidata Termbox SSR
Open, Stalled, Normal · Public

Description

Description: https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
Timeline: 2019-01-31 would be fantastic. 2019-02-28 would be great. We will start to get worried if we don't have the service by end of March 2019.
Diagram:


Technologies: nodejs
Point person: @WMDE-leszek, @Addshore as a backup (especially deployment topics)

Source code: https://github.com/wmde/wikibase-termbox; a move to Gerrit is to happen in early Jan 2019 at the latest.

Load Details

The initial responsibility of this service will be the rendering of the term box for wikidata items and properties for mobile web views.
Currently wikidata.org gets no more than 80k mobile web requests per day (including cached pages and non-item/property pages).
If we were to assume all of these requests were actually to item and property pages that were not cached, this SSR service would be hit 55 times per minute.
In reality some of these page views are not to item or property pages, and some will be cached, so we are looking at no more than 1 call per second.
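A quick sanity check of the arithmetic above, using the figures from the task description (worst case: every mobile request hits the SSR service):

```typescript
// Worst-case load estimate: all 80k daily mobile requests reach the service.
const requestsPerDay = 80_000;
const perMinute = requestsPerDay / (24 * 60);   // minutes per day = 1440
const perSecond = requestsPerDay / (24 * 3600); // seconds per day = 86400

console.log(perMinute.toFixed(1), perSecond.toFixed(2)); // "55.6 0.93"
```

This confirms the ~55/minute and under-1/second figures in the description.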

Restricted Application added a subscriber: Aklapper. · View Herald Transcript · Dec 18 2018, 9:28 AM
mobrovac moved this task from Inbox to Radar on the TechCom board.
mobrovac added a project: serviceops.
Joe added a subscriber: Joe. · Dec 18 2018, 11:11 AM

Looking at the attached diagrams, it seems that the flow of a request is as follows:

  • page gets requested to MediaWiki
  • MW sends a request to the rendering service
  • the rendering service sends request(s) to mediawiki via api.php to fetch the data, and sends back the rendered termbox to MediaWiki

wouldn't it be faster, more efficient and less error-prone to modify the workflow as follows:

  • page gets requested to MediaWiki
  • MW collects all the data needed for rendering the termbox, and sends a request to the rendering service
  • The rendering service sends back the HTML to MediaWiki

in this scenario, the termbox SSR becomes a pure lambda-like service that just transforms structured data to HTML.
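A sketch of that lambda-like shape (all names and markup here are invented for illustration; real code would also need HTML escaping):

```typescript
// Hypothetical shape of the data MediaWiki would collect and send along.
interface TermboxData {
  entityId: string;
  labels: Record<string, string>; // language code -> label
}

// Pure, lambda-like service core: structured data in, HTML out.
// No calls back to MediaWiki, no storage access.
function renderTermbox(data: TermboxData): string {
  const rows = Object.entries(data.labels)
    .map(([lang, label]) => `<li lang="${lang}">${label}</li>`)
    .join('');
  // NOTE: a real implementation must escape labels before interpolation.
  return `<ul class="termbox" data-entity="${data.entityId}">${rows}</ul>`;
}

console.log(renderTermbox({ entityId: 'Q64', labels: { en: 'Berlin', de: 'Berlin' } }));
```

Because the function depends only on its input, it is trivial to test, cache, and scale horizontally.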

Addshore moved this task from Unsorted 💣 to Watching 👀 on the User-Addshore board.
Joe added a comment. · Dec 18 2018, 11:15 AM

Also: it is stated in https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service that "In case of no configured server-side rendering service or a malfunctioning of it, the client-side code will act as a fallback". This is somewhat the reverse of what is usually done with workers, but I see two advantages from our point of view:

  1. Gentler degradation of the functionality for non-js users
  2. Better page load time as we would cache the termbox on our edge systems

so I am overall ok with the idea. We should still take care to instruct the edge caches not to cache any page that lacks server-side rendering.

As a further optimisation of both the architecture as well as parsing and load times, MW/Wikibase could populate a (hidden?) tag in the DOM with all the info needed to generate the termbox. Then, if JS is enabled (w/ possibly ServiceWorkers), the client simply generates it. If a no-JS client gets the page, this could be done using an iFrame by requesting the specific termbox from the server via a cacheable URI. As a result, the service would need to do less work, the client would load the page faster and the architecture would be simpler.

That said, I'm not a UI/front-end expert, so I defer to others whether using iframes is doable or desirable.
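mobrovac's hidden-tag idea could be sketched like this (the tag id and helper names are invented; this is not how Wikibase actually embeds data):

```typescript
// MW/Wikibase embeds the termbox data as JSON in the page, so a
// JS-capable client can render the termbox without another request.
function embedTermboxData(data: object): string {
  // Escape '<' so the payload cannot close the script tag early.
  const json = JSON.stringify(data).replace(/</g, '\\u003c');
  return `<script type="application/json" id="termbox-data">${json}</script>`;
}

// What the client-side code would do: read the tag back and parse it.
function extractTermboxData(html: string): object {
  const match = html.match(/<script[^>]*id="termbox-data"[^>]*>(.*?)<\/script>/s);
  if (!match) throw new Error('no embedded termbox data');
  return JSON.parse(match[1]);
}

const page = embedTermboxData({ labels: { en: 'Berlin' } });
console.log(extractTermboxData(page));
```

In a browser the extraction would simply be `JSON.parse(document.getElementById('termbox-data').textContent)`; the string-based version above just makes the round trip visible.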

daniel added a subscriber: daniel. · Dec 18 2018, 2:52 PM

@mobrovac Please note that the term box is shown based on user preferences (languages spoken), the initially served DOM however needs to be the same for all users, so it can be cached. Also note that the language specific data that goes into the term box has to be loaded from the wikibase entity. So the only way to make it work as you suggested would be to always send all the terms in all languages, which, for some items, would be quite a bit of data.

@mobrovac Please note that the term box is shown based on user preferences (languages spoken), the initially served DOM however needs to be the same for all users, so it can be cached. Also note that the language specific data that goes into the term box has to be loaded from the wikibase entity. So the only way to make it work as you suggested would be to always send all the terms in all languages, which, for some items, would be quite a bit of data.

Oh right, the languages. For caching we can vary on the accept-language header, so that at least the most requested languages can be served from cache. But you are correct, shipping all the data may become unwieldy. I would still prefer the client making the extra request, though, since it still keeps the architecture saner and faster than doing it prematurely internally.
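As a toy sketch of what "vary on accept-language" means at the edge (purely illustrative; not how the WMF edge is actually configured):

```typescript
// The edge cache keys the object on URL plus a normalized language, so
// each popular language gets its own cached copy of the page.
function edgeCacheKey(url: string, acceptLanguage: string): string {
  // Naive normalization: take the first language tag, lowercased.
  const primary = acceptLanguage.split(',')[0].trim().toLowerCase();
  return `${url}|${primary}`;
}

console.log(edgeCacheKey('https://www.wikidata.org/wiki/Q64', 'de-DE,de;q=0.9'));
```

Without normalization the cache would fragment badly, since browsers send many distinct Accept-Language strings for the same effective language.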

Joe added a comment. · Dec 18 2018, 8:11 PM

@mobrovac Please note that the term box is shown based on user preferences (languages spoken), the initially served DOM however needs to be the same for all users, so it can be cached. Also note that the language specific data that goes into the term box has to be loaded from the wikibase entity. So the only way to make it work as you suggested would be to always send all the terms in all languages, which, for some items, would be quite a bit of data.

Oh right, the languages. For caching we can vary on the accept-language header, so that at least the most requested languages can be served from cache. But you are correct, shipping all the data may become unwieldy. I would still prefer the client making the extra request, though, since it still keeps the architecture saner and faster than doing it prematurely internally.

Varying the cache on accept-language is doable, but I don't think we do it right now, and I don't think it applies in this case anyway:

  • if the user is logged in, their request is not cached, and we base the language shown on their preferences
  • if the user is not logged in and wants to choose a language for the interface different from the default, they have to add &uselang=... to their request, thus making it a different url to cache.

This is at least the behaviour I expect, and if that holds true we don't need to send data about all languages for every request either.

Using accept-language is not an option, at least not the accept-language from the browser. The relevant list of languages comes from user preferences and the Babel extension. This also means that it typically contains more than one language, and there are many, many combinations to consider.

Also: we have no purging mechanism for content that varies on language. We'd need xkey support first.

We have been exploring all this when we first implemented term box. There is no good way to cache it, really.

But keep in mind that wikidata doesn't have many readers; the data gets used indirectly. So we don't have to optimize for the reading case as much as we would for Wikipedia.

I agree with Joe that it would be better to have the service be internal, and be called from MW. It doesn't have to be that way, but it's preferable because:

  • we would not expose a new endpoint
  • we should in general avoid (more) services calling MediaWiki, because:
    • PHP has high startup time, and also for reasons of general hygiene of the architecture
    • we don't want MW and external services calling each other, back and forth
  • "pure functional" services that do not interact with storage are easier to reason about, and easier to run and maintain.

However, these are general considerations. I see nothing that would totally block the architecture as proposed, if there are good reasons for doing it this way.

Joe added a comment. · Dec 19 2018, 8:55 AM

I agree with Joe that it would be better to have the service be internal, and be called from MW. It doesn't have to be that way, but it's preferable because:

  • we would not expose a new endpoint
  • we should in general avoid (more) services calling MediaWiki, because:
    • PHP has high startup time, and also for reasons of general hygiene of the architecture
    • we don't want MW and external services calling each other, back and forth
  • "pure functional" services that do not interact with storage are easier to reason about, and easier to run and maintain.

    However, these are general considerations. I see nothing that would totally block the architecture as proposed, if there are good reasons for doing it this way.

Well, I consider calling the MW api from a service called by MediaWiki an antipattern that we should absolutely avoid.

So either of these options is viable IMHO:

  1. MediaWiki can send to the service all the data it needs to generate the termbox, without further calls to the api
  2. The rendering is done in the browser, with the server-side service acting as a public-facing fallback for browsers not supporting workers

I don't want to introduce another circular dependency between MediaWiki and another service.

Is the technical difficulty in doing so that the request has been set up for $wiki, while we need information from wikidata?

Well, I consider calling the MW api from a service called by MediaWiki an antipattern that we should absolutely avoid.

Oh, I got that wrong - I thought the service would be public facing and called directly from the client! But apparently it's not; it would be hidden behind api.php. So my first point, "we would not expose a new endpoint", is invalid. Instead, it should read "We should not introduce a service that is called by MediaWiki, and itself calls MediaWiki."

Addshore moved this task from incoming to in progress on the Wikidata board. · Dec 19 2018, 12:43 PM

In https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service we see that "There is a server-side and the client-side variant of the code, which are distributions of the same implementation." Looking at the current repository, we see that it shares data models (https://github.com/wmde/wikibase-termbox/tree/master/src/datamodel) but that there is common, client-only, and server-only data-access logic. I think architecture decisions hinge on this separation and the reasons for it.

In the simplest case, this code would be almost identical client and server-side. No matter where it's running, nodejs or the browser, it would request data, receive it, and render it as html. Mediawiki and wikibase would be responsible for compiling that data in a nice way. The flow would be:

BROWSER --get-interface--> NODEJS --get-data--> APIs --return-data--> NODEJS --return-interface--> BROWSER

As I understand it, in the proposed architecture, mediawiki sits in between the browser and node to prevent exposing another public endpoint for the node service. Is my understanding correct?

Now, I started looking through the code and it looks like there's an effort to keep server and client logic as common as possible, with factories and interfaces and nice patterns, but there are still differences. It looks like there's good reason for this, but how would the client be able to act as a fallback for the server as proposed?

By the way the code looks good and I'm glad to see this work going forward.

Krinkle edited projects, added TechCom-RFC; removed TechCom. · Dec 20 2018, 6:32 AM
Krinkle moved this task from Inbox to Backlog on the TechCom-RFC board. · Dec 20 2018, 6:54 AM
Krinkle added a subscriber: Krinkle.

This task proposes a significant change to software architecture and should follow the RFC process. Tagging it as such.

I've also triaged it and after reading the description and linked wikipage, I believe the following should be clarified before TechCom can be effective in gathering and processing input from relevant stakeholders. Specifically:

  • A description of the "termbox" feature, how it currently works technically in production, and what its current requirements are.
  • A brief statement of what problem this proposal would solve (e.g. additional requirements you want to solve but currently can't).

If you'd like input or feedback on anything from TechCom at any point, feel free to move it back to the "Inbox" column on the TechCom-RFC workboard. You can also use the "Request IRC meeting" column to request an office hour on IRC about this RFC.

"We should not introduce a service that is called by MediaWiki, and itself calls MediaWiki."

Slightly OT, but a +1000 YES to this. Been there, seen that antipattern; it's a mess to reason about. The coupling of the two components makes it near impossible to test/benchmark/debug the interactions. It is also a mess to untangle and fix once it is identified.

@Milimetric wrote:

In the simplest case, this code would be almost identical client and server-side. No matter where it's running, nodejs or the browser, it would request data, receive it, and render it as html.

I think you are right, that does make sense. Then the rendering service should sit between the client and MediaWiki, and not be called by MediaWiki. But that means it cannot be used for serving the default rendering of the page from index.php.

If we have these two requirements:

  1. use vue.js based rendering when serving page content from index.php
  2. the mechanism for accessing Wikibase content from vue.js based code should be the same on client and server

...then I see no way to avoid the "PHP calls JS calls PHP" issue in general.

But for the case at hand, there might be a workaround: the PHP code that renders the (Wikibase Entity) page content already knows what data will be needed for the rendering. It can send it to the rendering service along with the request to render. The vue.js code would then need to have a "fake repo request" facility that would just use data that was passed in with the original request, and would fail (or at least warn) when trying to load any additional content by calling the actual MediaWiki API. I think that solution would still be fairly clean, and would perform better than calling back to the MediaWiki API all the time.
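daniel's "fake repo request" facility might look roughly like this (all interface and class names are invented for illustration):

```typescript
// The data-access interface the vue.js rendering code programs against.
interface EntityRepo {
  getLabel(entityId: string, lang: string): string | undefined;
}

// Server-side implementation that answers only from the data bundle
// passed in with the render request, and warns instead of calling the
// MediaWiki API when something is missing.
class PrefilledEntityRepo implements EntityRepo {
  constructor(private bundle: Record<string, Record<string, string>>) {}

  getLabel(entityId: string, lang: string): string | undefined {
    const labels = this.bundle[entityId];
    if (!labels || !(lang in labels)) {
      // Would normally trigger an API call; here we only warn.
      console.warn(`missing data for ${entityId}/${lang}; not calling the API`);
      return undefined;
    }
    return labels[lang];
  }
}

const repo = new PrefilledEntityRepo({ Q64: { en: 'Berlin' } });
console.log(repo.getLabel('Q64', 'en')); // Berlin
```

The warning makes gaps in the pre-sent data visible during development, so the PHP side can be taught to include whatever the template actually needs.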

Thinking further into the future, the dilemma could perhaps also be solved by separating MediaWiki's storage layout entirely from the application layer; this way PHP and JS code can use the same storage layer API for retrieving page content. But this is just a thought for the future - it would take a while, and I'm not even sure it's a good idea.

But for the case at hand, there might be a workaround: the PHP code that renders the (Wikibase Entity) page content already knows what data will be needed for the rendering. It can send it to the rendering service along with the request to render. The vue.js code would then need to have a "fake repo request" facility that would just use data that was passed in with the original request, and would fail (or at least warn) when trying to load any additional content by calling the actual MediaWiki API. I think that solution would still be fairly clean, and would perform better than calling back to the MediaWiki API all the time.

When rendering the page, index.php knows the exact data that needs to be rendered already, correct? If so, it can send that to the client, and then based on whether JS/SW is available or not, the client either renders it or sends a request to the service which sits behind Varnish. Or am I missing something.

Thinking further into the future, the dilemma could perhaps also be solved by separating MediaWiki's storage layout entirely from the application layer; this way PHP and JS code can use the same storage layer API for retrieving page content. But this is just a thought for the future - it would take a while, and I'm not even sure it's a good idea.

Somewhat OT for this task, but I would actually argue that the democratisation of the storage layer is a good thing, as it allows different entities to focus on what they are supposed to achieve. Naturally, this implies the storage solution is scalable, secure, etc.

When rendering the page, index.php knows the exact data that needs to be rendered already, correct? If so, it can send that to the client, and then based on whether JS/SW is available or not, the client either renders it or sends a request to the service which sits behind Varnish. Or am I missing something.

That works, but defeats the purpose. The idea is to present a default rendering to clients that don't have JS enabled (or no sufficiently current JS support). That rendering should be generated by the same vue.js code that does the rendering on the client.

Clients that do have JS support will ask the API for different data (based on user language settings, for logged in users) and will re-render the term box based on that. No server side rendering required.

That works, but defeats the purpose. The idea is to present a default rendering to clients that don't have JS enabled (or no sufficiently current JS support). That rendering should be generated by the same vue.js code that does the rendering on the client.

But the same principle applies, regardless of whether the req is sent by the client or not. If index.php has all the data to render it, then that means that it can send it to the service directly without the need for the service to call MW back (now, whether the client will be used as a proxy is a different issue).

Clients that do have JS support will ask the API for different data (based on user language settings, for logged in users) and will re-render the term box based on that. No server side rendering required.

This confuses me. You first need to render the page on the server before you know whether the client supports JS/SW or not, so it will need to be rendered on the server irrespective of the client's capabilities in case where MW calls the service directly before handing out the page.

When rendering the page, index.php knows the exact data that needs to be rendered already, correct?

I just had a brief chat with @Jakob_WMDE and @Pablo-WMDE about this. For the current use case ("term box"), this would be easy, but for the anticipated generalized use case ("render entire wikibase entity") this is non-trivial: the template doesn't just need the entity data itself, but also the labels of referenced items, data types of properties, localized names of units, etc. Only the JS code really knows what it needs, and even it does not have that knowledge in a central place.

daniel added a comment (edited). · Dec 20 2018, 1:57 PM

You first need to render the page on the server before you know whether the client supports JS/SW or not, so it will need to be rendered on the server irrespective of the client's capabilities in case where MW calls the service directly before handing out the page.

Yes, that is correct, my statement was imprecise: there is no need for server side rendering of the personalized term box; that is done on the client. Only the general rendering with the default languages needs to be rendered on the server. This is also the version that is cacheable, intended to be included in the output of index.php.

(Actually, this is an assumption, it may not be true. Currently, we do render the personalized term box on the server for logged in users, and merge it into the output by substituting a placeholder; I assume that the plan is to not do that, and drop support for a personalized term box for people without JS. But I did not confirm this assumption)

Addshore updated the task description. · Dec 20 2018, 4:14 PM

There sure has been a fair amount of discussion on this ticket!

So I have created an updated interaction diagram showing a few more details of the overall flow (and updated the description).

This highlights the various levels of caching.

  • Calls for entity data will be going via varnish, and Special:EntityData is cacheable
  • Other calls to the mw / wb api will not be varnish cached, but will be cached and reused by the service itself, probably with a TTL of 1 minute.
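A minimal sketch of such a short-TTL, service-side cache (illustrative only, not the actual implementation):

```typescript
// Keeps API responses for a short TTL and reuses them across requests.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expires: number }>();
  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const e = this.entries.get(key);
    if (!e || e.expires < Date.now()) return undefined; // missing or stale
    return e.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}

const apiCache = new TtlCache<string>(60_000); // 1 minute, per the task
apiCache.set('siteinfo', '{"lang":"en"}');
console.log(apiCache.get('siteinfo'));
```

With a 1-minute TTL, repeated renders within that window reuse the cached API response instead of hitting MediaWiki again.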

Load Details (now included in the description)

The initial responsibility of this service will be the rendering of the term box for wikidata items and properties for mobile web views.
Currently wikidata.org gets no more than 80k mobile web requests per day (including cached pages and non-item/property pages).
If we were to assume all of these requests were actually to item and property pages that were not cached, this SSR service would be hit 55 times per minute.
In reality some of these page views are not to item or property pages, and some will be cached, so we are looking at no more than 1 call per second.

Replies to specific points, with notes from @Jakob_WMDE & @Pablo-WMDE

Oh, I got that wrong - I thought the service would be public facing and called directly from the client! But apparently it's not; it would be hidden behind api.php. So my first point, "we would not expose a new endpoint", is invalid. Instead, it should read "We should not introduce a service that is called by MediaWiki, and itself calls MediaWiki."

The initial deployment will not even have a proxy to the service via api.php, it will be entirely internal.
No part of the SSR service is public facing.

As I understand it, in the proposed architecture, mediawiki sits in between the browser and node to prevent exposing another public endpoint for the node service. Is my understanding correct?

Yes.

Now, I started looking through the code and it looks like there's an effort to keep server and client logic as common as possible, with factories and interfaces and nice patterns, but there are still differences. It looks like there's good reason for this, but how would the client be able to act as a fallback for the server as proposed?

This is an exceptional case and only works for users with JS enabled.
"Client" here would be the client-side JS, served by Wikibase.
In case there is no response from the SSR service, the fallback would be CSR-only, i.e. it will only work for people with JS enabled.
This addresses the rare scenario that there is no working SSR service configured (in installations run by third parties, not WMF) and still offers termbox features, at the cost of not being accessible to users with JS disabled.
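The SSR-with-CSR-fallback decision described above could be sketched roughly like this (names and markup invented; the real logic lives in Wikibase's PHP output handling):

```typescript
// Render server-side when an SSR endpoint is configured and healthy;
// otherwise emit a placeholder that client-side JS renders into.
async function termboxHtml(ssr: (() => Promise<string>) | null): Promise<string> {
  if (ssr) {
    try {
      return await ssr(); // normal case: markup from the SSR service
    } catch {
      // SSR misbehaving: fall through to the CSR-only path
    }
  }
  // Placeholder hydrated by client-side JS; no-JS users get no termbox here.
  return '<div class="wikibase-termbox" data-csr-fallback></div>';
}
```

This makes the trade-off explicit: the fallback keeps the feature working, but only for clients that run JS.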

If index.php has all the data to render it, then that means that it can send it to the service directly without the need for the service to call MW back (now, whether the client will be used as a proxy is a different issue).

The "termbox" is more of an application than a template.
Only it knows which data it needs - actively "sending" data to it requires knowledge of which information is needed.
While seemingly trivial in the beginning this will, as the application grows, become a burden in maintenance - and potentially in performance if data data that has become obsolete is sent "just to be sure".

(Actually, this is an assumption, it may not be true. Currently, we do render the personalized term box on the server for logged in users, and merge it into the output by substituting a placeholder; I assume that the plan is to not do that, and drop support for a personalized term box for people without JS. But I did not confirm this assumption)

Personalized = user language on top, then "languages most likely to be spoken" by user. We still strive for this behavior.
Cacheability will have to be taken into account.

Now, I started looking through the code and it looks like there's an effort to keep server and client logic as common as possible, with factories and interfaces and nice patterns, but there are still differences. It looks like there's good reason for this, but how would the client be able to act as a fallback for the server as proposed?

This is an exceptional case and only works for users with JS enabled.
"Client" here would be the client-side JS, served by Wikibase.
In case there is no response from the SSR service, the fallback would be CSR-only, i.e. it will only work for people with JS enabled.
This addresses the rare scenario that there is no working SSR service configured (in installations run by third parties, not WMF) and still offers termbox features, at the cost of not being accessible to users with JS disabled.

My question here was more, how can the client render everything it needs, when some of the logic, for example for data-access, is only in the src/server/data-access folder? In other words, if the client functionality is a super-set of the server, I would expect only common and client folders, and the server to use logic out of the common folder. I'm asking both for the termbox right now and plans for this service in the long-run.

My question here was more, how can the client render everything it needs, when some of the logic, for example for data-access, is only in the src/server/data-access folder? In other words, if the client functionality is a super-set of the server, I would expect only common and client folders, and the server to use logic out of the common folder. I'm asking both for the termbox right now and plans for this service in the long-run.

I hope I understand the question correctly, but the idea is that e.g. src/server/data-access and src/client/data-access both implement client/server-specific functionality of the same data-access interfaces. CSR gets all its information from the window environment (e.g. mw.config, existing Wikibase JS services, etc), whereas the SSR requests the data from the API. The client functionality is not a super-set of the server. They share most of the code, and what they cannot share is hidden behind interfaces and implemented in src/client and src/server respectively.
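A rough sketch of the split Jakob describes (interface and class names invented, not the actual termbox code): one data-access interface, one client implementation reading from the page environment, one server implementation fetching from the API, and shared rendering code that depends only on the interface.

```typescript
// Shared data-access interface (would live under src/common).
interface LanguageRepo {
  getName(langCode: string): Promise<string>;
}

// src/client flavor: reads data already present in the browser
// environment (mw.config-like data, existing Wikibase JS services).
class ClientLanguageRepo implements LanguageRepo {
  constructor(private env: Record<string, string>) {}
  async getName(code: string) { return this.env[code]; }
}

// src/server flavor: would fetch the same data from the MediaWiki API.
class ServerLanguageRepo implements LanguageRepo {
  constructor(private fetchFromApi: (code: string) => Promise<string>) {}
  async getName(code: string) { return this.fetchFromApi(code); }
}

// Shared rendering code: identical on client and server.
async function renderLanguageLabel(repo: LanguageRepo, code: string) {
  return `<span lang="${code}">${await repo.getName(code)}</span>`;
}
```

Neither side is a superset of the other; they are two implementations of the same contract, which is why the rendered output can be identical.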

Thanks @Jakob_WMDE, I think we're saying the same thing in slightly different terms, and it's because I'm not being precise. It's ok for the client and server to have different implementations, but you're saying they have the same capabilities, right? I was thinking that for the client to be able to render everything the server does, plus handle interactivity and other features in the future, its capabilities would have to be a superset of the server's. So there's no html that could only be rendered by the server, right? It seems that way; components and interfaces are all shared.

If so, great. If not, it feels important to understand what happens when the server's not there and the client doesn't have some of the server-specific functionality.

So then for me the main question remains around the request flow that everyone else is discussing. From @Addshore's response it sounds like you considered this option and decided against it:

  • the request to index.php is conditionally routed directly to the SSR service. In our world, the SSR service is there, so we configure it in Varnish, it returns html, and Vue takes over client-side. For other mediawiki installations, index.php knows to render a basic version of the html which pulls in the Vue.js modules. Once this loads in the browser, it renders the interface.

If this was rejected, I'm just curious, what were the expected problems? One potential optimization with that approach could be, if SSR is enabled, you send a smaller client-side module that doesn't include data-access which would only ever happen once, on page-load. So you could put that in a client-fallback folder or something, include it in the version served by index.php but exclude it from the version served by SSR.

(by the way, the build is nicely separated for client vs. server here https://github.com/wmde/wikibase-termbox/blob/faf3dcca602da9f2287e8be9acaa1023144b23d7/vue.config.js#L14, where webpack analyzes the code and includes only what's required by the particular environment it's building against)

the request to index.php is conditionally routed directly to the SSR service. In our world, the SSR service is there, so we configure it in Varnish, it returns html, and Vue takes over client-side. For other mediawiki installations, index.php knows to render a basic version of the html which pulls in the Vue.js modules. Once this loads in the browser, it renders the interface.

@Milimetric I am not sure if I get the suggestion right, but it sounds almost like what we are proposing, with the only difference that it would include some server-side rendering of the Wikidata parts of the page in PHP ("rendering a basic version of HTML").
We've decided not to do this because one of our goals is to have a single implementation of the rendering logic. That's why the rendering code is in JS/node. Having a second implementation of rendering in PHP, including rendering vue templates in PHP etc., is a solution we didn't want to lock ourselves into, as it is basically asking for trouble (forgetting to update the second place when updating the other is bound to happen at some point, for example).
Or maybe you meant that, without the SSR in place, the server will only render some kind of placeholder, and once vue js is loaded on the client, the UI will be rendered. This would mean a less pleasant user experience (UI "appearing" later), but would still provide the full experience to the user, as long as they have JS enabled in their browser. If this is your suggestion, I am happy to confirm that this is the approach we're planning here.

I believe answers from WMDE staff above have already been touching on this topic, but to explicitly mention them, let me try to answer it:

"We should not introduce a service that is called by MediaWiki, and itself calls MediaWiki."

The intention of introducing the service is not to have a service that calls MediaWiki. As discussed above, the service needs to ask for some data, and this data shall be provided by some API. Currently, the only API that can provide data about Wikidata items etc. is indeed wrapped in MediaWiki. Should there be another, more light-weight API providing access to the same data, the service would most likely be using it.
The implementation, as can be seen in the linked source code, is not at all bound to the APIs it talks to being MediaWiki ones.
If people here are suggesting, between the lines, that there should be some lighter API to access Wikidata data, I can only second those ideas. Introducing such API(s) seemed too much of an endeavour to take on together with introducing the new front-end solution. We preferred to take one step at a time.

Joe added a comment. · Dec 21 2018, 10:31 AM

Let me state it again: the SSR service should not need to call the mediawiki api. It should accept all the information needed to render the termbox in the call from mediawiki.

So we should have something like:

  • Mediawiki makes a POST request to SSR, sending the entity data, the language information *and* all the messages
  • The service transforms said data into HTML, sends it back to mediawiki

This has several advantages over the proposed solution:

  • Only one RPC call, instead of several (at least 3 AIUI)
  • No need for costly change propagation to get cache invalidation for the service
  • No need for caching in the service, even
  • Performance is going to be much better given we don't have to instantiate the mediawiki request context several times
  • The design is more resilient as the service will not depend on any backend in order to work

I would consider switching to this model a necessary condition for a production deployment.

@Addshore @WMDE-leszek do you see any reason why this wouldn't work?
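A sketch of what such a self-contained render request could look like (all field names are invented for illustration; this is not the actual termbox API):

```typescript
// Everything the render needs travels in one POST body: entity data,
// the languages to display, and messages pre-resolved by MediaWiki.
interface TermboxRenderRequest {
  entity: { id: string; labels: Record<string, string> };
  languages: string[];              // display order
  messages: Record<string, string>; // already localized by MediaWiki
}

// The service is then a pure transform over the request body: no
// callbacks to the API, no change propagation, no service-side cache.
function renderFromRequest(req: TermboxRenderRequest): string {
  const items = req.languages
    .map((lang) => `<li lang="${lang}">${req.entity.labels[lang] ?? ''}</li>`)
    .join('');
  // NOTE: a real implementation must escape labels and messages.
  return `<section aria-label="${req.messages['heading'] ?? ''}"><ul>${items}</ul></section>`;
}
```

Under this model the only failure mode the service can exhibit is a bad transform, which is exactly the resilience property Joe lists above.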

Joe added a comment. · Dec 21 2018, 10:36 AM

The "termbox" is more of an application than a template.
Only it knows which data it needs - actively "sending" data to it requires knowledge of which information is needed.
While seemingly trivial in the beginning this will, as the application grows, become a burden in maintenance - and potentially in performance if data data that has become obsolete is sent "just to be sure".

An application is something that provides a public-facing, standalone functionality. This doesn't seem to be the case.

The only difference here would be moving some business logic to the MediaWiki extension rather than keeping it inside the service.

The maintenance burden is there if we keep building circular dependency loops between services; it will only be moved onto the people running the service in production rather than onto deployment coordination.

Joe added a comment.Dec 21 2018, 10:40 AM

Also, if we're going to build microservices, I'd like to not see applications that "grow", at least in terms of what they can do. A microservice should do one thing and do it well. In this case, it's using data from mediawiki to render an HTML fragment; unless you want to make it do something different, the thing that might change is what data it needs to use.

Joe added a comment.Dec 21 2018, 10:48 AM

The intention of introducing the service is not to have a service that calls MediaWiki. As discussed above, the service needs to ask for some data, and this data shall be provided by some API. Currently, the only API that can provide data about Wikidata items etc. is indeed wrapped in MediaWiki. Should there be another, more lightweight API providing access to the same data, the service would most likely be using it.

Apart from the fact that the MediaWiki API is very performant, I don't think that "the data shall be provided by some API". The data might as well be provided by the caller. I can assure you it's absolutely impossible to get better performance than being fed the data by your caller, vs. having to retrieve it with several RPC calls, no matter how lightweight (which I guess means "performant") the API you call might be.

To avoid misunderstandings: I was not questioning MediaWiki's action API being performant. By "lightweight" I was referring to the "PHP has high startup time" point @daniel made above as one of the reasons why no service should call the MW API.

Joe added a comment.Dec 21 2018, 11:12 AM

To avoid misunderstandings: I was not questioning MediaWiki's action API being performant. By "lightweight" I was referring to the "PHP has high startup time" point @daniel made above as one of the reasons why no service should call the MW API.

I read @daniel's comment as "why do that costly operation several times instead of doing it once", which makes sense to me. Also sorry if I came across a bit strong, but there has been a lot of FUD spread on the topic of MW API performance, and I wanted to dispel that myth :)

@WMDE-leszek ok, we're on the same page, except for the crazy part of my proposal. I was saying directly routed to the SSR service as in, without ever hitting mediawiki and spinning up the mediawiki context. So this would expose SSR publicly. The fallback stub HTML generated by mediawiki would work as you understood. Was this ruled out, routing directly to SSR?

@Addshore can you go more in depth about why the SSR service is the only one that knows what data it needs? Would it be possible to factor out the code that compiles that data and implement it anywhere else, or is it for some reason tightly coupled with the interface rendering?

daniel added a comment.EditedDec 21 2018, 3:15 PM

@Joe said:

the SSR service should not need to call the mediawiki api. It should accept all the information needed to render the termbox in the call from mediawiki.

After talking to the Wikidata folks, I realized that this is not easy at all. It requires the calling code to know what data is needed, and that depends on implementation details buried in several places of the rendering code, possibly different for entity types defined by different extensions, etc. For the term box alone, "send the entity" will work. For rendering more (e.g. all the statements), this will not work.

@Milimetric said:

I was saying directly routed to SSR service as in, without ever hitting mediawiki and spinning up the mediawiki context.

That doesn't work; the goal is to deliver a full page to the client, with all the skin chrome. This is for index.php serving a full page.

So, overall, it seems like the solution proposed by the Wikidata team is the only one viable at this time. I'm not very happy about this, but I don't really see an alternative. Doing It Right (tm) would require MediaWiki to have either the presentation layer or the storage layer split off. That'll have to wait a couple of years.

The one viable improvement that was raised would require ESI (or a similar mechanism): the call to index.php would return HTML that contains a placeholder for the termbox, which would be resolved at the edge by a callback to the SSR service, which could then call back to MediaWiki as needed, without a circular dependency. But IIRC, @Joe does not like ESI either...

CDanis added a subscriber: CDanis.Dec 21 2018, 3:21 PM

So, overall, it seems like the solution proposed by the Wikidata team is the only one viable at this time. I'm not very happy about this, but I don't really see an alternative. [..]

I'll look into this RFC in more detail at a later time, but at a glance, this does not seem fair.

As currently presented, it appears this RFC is lacking a problem statement. It isn't solving a product need, user need, or technological need. Rather, it starts out on the assumption that we're going to have UI code in production (based on Vue.js) written in a way that contains too much business logic in its templating code.

If we're talking about a new approach for Wikidata front-end, I think it makes sense to generalise the acceptance criteria to the larger problem being solved. If we keep the above (seemingly artificial) restriction in place then, in my opinion, there is no room for an RFC conversation to take place.

@Krinkle said:

Rather, it starts out on the assumption that we're going to have UI code in production (based on Vue.js) written in a way that contains too much business logic in its templating code.

That is correct: the Wikidata team made the determination that they want to use vue.js for the frontend. The idea is to use a standard framework that allows rich interactions, and to use the same code base for user interaction on the client and for rendering the initial static view. This is opposed to the current situation, where the static rendering is done in PHP and interaction is implemented in JS, with static templates shared between both mechanisms. This has proven extremely cumbersome and inflexible; it was already a problem when I was still the tech lead of the Wikidata team. Using vue.js with some sort of server-side execution mechanism has been proposed and discussed by the Wikidata team for about two or three years, at hackathons and summits. We have also discussed it at TechCom, though not as an RFC I think. In essence, they were always told to go ahead. Going back on that now doesn't seem right to me, and I don't see a viable alternative (other than React perhaps, which would pose the exact same problem).

I agree that it would be good to have the motivation for using vue.js documented, along with the alternatives considered and trade-offs evaluated. But this does not come out of the blue. This has been in the pipeline for years, and every effort was made to communicate with various WMF teams about this effort.

I agree that we can't go back on decisions that are 3 years in the making. But I do like Timo's point that we should state the problem. Here's an attempt:

"Implementing interaction in JS on top of static html rendered by PHP is inflexible and complicated. It leads to code that is hard to maintain, test, or improve. This is because two different code bases which require developers with different skill sets have to coordinate any changes."

Feel free to adapt that to better describe the problem ^. @Krinkle, if something like that was made part of the RfC, would that be a good step?

herron triaged this task as Normal priority.Jan 2 2019, 9:06 PM

Thanks everyone for the comments so far. This ticket in its current state is definitely not a ready RFC, you're right. We're going to turn it into one/create a separate RFC ticket in the upcoming days.
As preparation, we're going to have a little chat with @Joe on Monday, to talk about our plans and see which elements of our plan are particularly unclear/problematic. Things are clear in our heads, but this does not mean it is all obvious to other people :) Interested CET-timezone people are welcome to join of course.
This talk is of course not meant as a replacement for the RFC review process.

fsero added a subscriber: fsero.Jan 3 2019, 5:00 PM
daniel added a comment.Jan 8 2019, 2:58 PM

Did the chat with @Joe happen? What was the outcome?

jijiki added a subscriber: jijiki.Jan 8 2019, 3:51 PM

It did (today, not on Monday though). The outcome is, I hope, that @Joe and @akosiaris now have a better understanding of what we have in mind. What we talked about (Wikibase front-end architecture) is also going to be turned into an RFC in the next 24 hours.

WMDE-leszek changed the task status from Open to Stalled.

I've submitted an RFC about the whole concept of Wikibase front end changes as T213318. I've taken the liberty to subscribe all people who were kind enough to comment on this task to the RFC.
This ticket was intended as the "pure" service request, hence removing the TechCom-RFC tag. Also marking it as stalled for now, to focus on the RFC ticket first, as the service request has little point without our general approach being discussed first.

I've tried to read all of it and maybe I've missed something, but I am still not sure what added value having such a separate service gives us. We are creating a pretty complex pattern of interaction between MediaWiki and an outside service, and I am not sure why this service is better than just having code inside MediaWiki to do the same. Is it caching? But we can do caching inside MediaWiki. Is it the ability to partially render HTML? But we can have a partial-render API inside MediaWiki (and as I understand, that's how it is going to be called by the front-end UI anyway?). Is it supplying data for an SPA written in JavaScript doing the rendering on the client? But the MediaWiki API is completely capable of returning JSON as well as returning HTML. I am not sure I really understand - what is the added value of the external service here?

If the idea is to have a stateless (QID, language) -> HTML renderer, with the expectation that the HTML would be highly cacheable - is it actually that cacheable? And if so, is creating an external service really the best way to cache it?

Please excuse me if I missed some important part - there's a lot of text to read :) If there's an answer for this already, please feel free to point me to it.

@Smalyshev No reason to be sorry for asking the right questions!
If we truly wanted* to boil the reason down to one sentence: The reason this is a dedicated service is the language it is written in (typescript), which was chosen because it allows us to create an implementation which can be compiled/transpiled to work on both server and client - something not possible with PHP.
Please see T213318 for (hopefully) more pointers in that regard (ctrl+f "avoid redundant implementations").

(* with the known consequences in complex discussions)

The reason this is a dedicated service is the language it is written in (typescript), which was chosen because it allows us to create an implementation which can be compiled/transpiled to work on both server and client - something not possible with PHP.

For the concrete use case of the Wikidata term box: the term box is an interactive UI element, but the initial rendering should happen on the server (to avoid jumping and delays, and also to provide a view for clients with no JS). If both renderings should use the same code, we need to somehow run the JS code on the server.
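The "same code on both sides" idea can be sketched like this in TypeScript. This is an illustrative stand-in for a Vue.js component, with hypothetical names; the point is that one rendering function feeds both the server and client entry points:

```typescript
// One shared rendering function (a stand-in for a Vue.js component;
// names here are illustrative, not the actual termbox code).
function termboxHtml(labels: Record<string, string>, languages: string[]): string {
  return languages
    .map(lang => `<div class="term" data-lang="${lang}">${labels[lang] ?? ''}</div>`)
    .join('');
}

// Server entry point (SSR): produce the initial static markup,
// readable even by clients with no JS.
function renderOnServer(labels: Record<string, string>, languages: string[]): string {
  return `<div id="termbox">${termboxHtml(labels, languages)}</div>`;
}

// Client entry point: the same function re-renders after user
// interaction, so server and client markup cannot drift apart.
function renderOnClient(container: { innerHTML: string },
                        labels: Record<string, string>,
                        languages: string[]): void {
  container.innerHTML = termboxHtml(labels, languages);
}

const ssr = renderOnServer({ en: 'Douglas Adams' }, ['en']);
console.log(ssr);
```

With a shared PHP/JS implementation this dual use is not possible, which is the motivation for running the JS on a node service.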

Lea_WMDE moved this task from Backlog to Other on the Wikidata-Termbox-Hike board.Mon, Feb 11, 8:49 AM

I get the idea of server-side HTML rendering to avoid delays. But I am kinda questioning whether the advantage of splitting code out of PHP into a separate service is worth all the complexities that follow from that.

Maybe there's an easier way to achieve the same benefit - e.g. using some kind of template engine that exists in both JS and PHP, so we can be reasonably sure that if the template is the same, the output is the same. We're not talking Wikitext-class complexity here; we'd be controlling the templates, and the box itself seems to be pretty basic HTML. So I wonder whether it's worth such complications just so it can always run in JavaScript...
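A toy sketch of that shared-template idea. A real setup would use an engine with implementations in both languages (e.g. Mustache, which has JS and PHP implementations); the renderer below is a minimal stand-in, not a real template library:

```typescript
// A Mustache-style template with {{placeholders}} that a JS engine and
// a PHP engine could both evaluate identically, given the same data.
const termboxTemplate = '<div class="term"><b>{{language}}</b>: {{label}}</div>';

// Minimal stand-in renderer: substitute each {{name}} with vars[name].
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => vars[name] ?? '');
}

const out = renderTemplate(termboxTemplate, { language: 'en', label: 'Douglas Adams' });
console.log(out); // <div class="term"><b>en</b>: Douglas Adams</div>
```

The trade-off the thread discusses: this keeps rendering logic duplicable across PHP and JS only as long as the termbox stays a dumb template; once it carries application logic (what to load, what to call), the shared-template guarantee no longer covers that logic.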

@Smalyshev I believe the approach we are suggesting really makes a difference when thinking beyond just rendering a template for a particular part of an Item page. I can definitely be blamed for not making it clear enough in the description of this task that we are NOT intending to build a template renderer, but a UI application which determines what data to load (from what APIs), what actions (APIs) to call etc., and also what to render. It should also be noted that our ultimate goal is not just to change the way this particular element of the item page (i.e. the "termbox") is handled in the Wikibase front end code; it is about the Wikibase front end as a whole, with the termbox being the first step only.

The overall approach/concept is more precisely (I hope) described in the RFC ticket T213318, which might also be a better place to raise questions like yours.

Thanks, T213318 makes it a bit clearer, though not 100% clear which parts stay in PHP and which parts move to JS. Would it also be true that there is no way to render Wikibase content (even without editing) on non-JavaScript browsers? The SPA approach in T213318 suggests that any Wikibase interaction - including merely displaying http://www.wikidata.org/wiki/Q42 - requires the SPA being started? Or do we keep maintaining a parallel PHP renderer alongside the SPA/JS renderer?

daniel added a comment.EditedThu, Feb 14, 7:28 AM

@Smalyshev My understanding (which may be dated or incomplete) is this: there would be no PHP rendering; JS rendering would need to happen either in the browser or on the server. If the server supports SSR (e.g. for Wikimedia projects), you can read with a non-JS browser. If the server does not support SSR (e.g. on shared hosting), then you need a JS-enabled browser to read.

WMDE-leszek added a comment.EditedThu, Feb 14, 8:48 AM

We do not intend to maintain the "proper" UI logic in PHP. The SSR service will render the page on the server side, and the result will (via MediaWiki etc.) make it to the reader's browser. We do want Wikidata readers and editors to have access to data even if they decide to disable JS in their browser, hence the request for this service.

Addition: we might well end up doing some "rudimentary" rendering in PHP in case there is no node SSR service, or it is unavailable, to avoid having a completely blank page before the JS loads. But as we are going to migrate gradually, this is not going to be an issue from the very start, as we only "replace" one part of the item/property page in the first step.