Page MenuHomePhabricator

Improve GuidedTour performance
Open, LowPublic

Description

Its code base not kept up with current best practices. I consider it a prerequisite for any new use/deployment of the extension, for it to first have an active steward, be updated to current best practices, and specifically to have a significantly better startup performance footprint for regular page views.

Its currently structured in a way that requires each tour to be registered as its own top-level module bundle, which does not scale (incompatible with current guidelines).

Some ideas:

  • one way would be to have the tour code go into one big module, like messageposter. you'd have an extension registry attribute like "GuidedTourModule", and then the guidedtour extension takes those, merges them, and registers them as one file module. This would mean 'scripts', no packageFile module since those cannot easily be merged as simple PHP arrays. see MessagePosterModule for an example of that
  • Another way would be to expand the configuration to take a module name and property name. This would be much simpler, but only solves the problem if the extension isn't perf-sensitive and has another module you can throw the code into. E.g. for CheckUser, you could throw the code in a random module, and then tell guided tour to get it from require('ext.checkUser.stuff').tourA with some JSON declaration
  • Similar to #2, we could invert the way GT works and make individual extensions responsible for loading the tours instead of making GT do it. E.g. centralnotice special page or hooks would queue what they need, and their own code would load and/or construct whatever they need to do, like you would for any other JS on the page.
  • Solve T225842 first, and make these private modules, automatically merged by RL into one "ext.guidedTours.tours" bundle

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

one way would be to have the tour code go into one big module, like messageposter. you'd have an extension registry attribute like "GuidedTourModule", and then the guidedtour extension takes those, merges them, and registers them as one file module. This would mean 'scripts', no packageFile module since those cannot easily be merged as simple PHP arrays. see MessagePosterModule for an example of that

Wouldn't this mean that, any time you wanted to launch a single tour, the javascript for all registered tours would need to be sent to the client, since it would also be a part of the same module? This isn't as big of a deal for message poster, since its just the core version and flow's, that are currently available, but, if we had a single module for all deployed tours, it would include

  • from the GuidedTour extension: firstedit, test, onshow, uprightdownleft
  • from GrowthExperiments: helppanel, homepage_mentor, homepage_welcome, homepage_discovery
  • from GettingStarted: gettingstartedtasktoolbar, gettingstartedtasktoolbarve
  • from CheckUser: checkuserinvestigateform, checkuserinvestigate
  • from Flow: flowOptIn
  • from WikimediaMessages: RcFiltersIntro, WlFiltersIntro, RcFiltersHighlight

one way would be to have the tour code go into one big module, like messageposter. you'd have an extension registry attribute like "GuidedTourModule", and then the guidedtour extension takes those, merges them, and registers them as one file module. This would mean 'scripts', no packageFile module since those cannot easily be merged as simple PHP arrays. see MessagePosterModule for an example of that

Wouldn't this mean that, any time you wanted to launch a single tour, the javascript for all registered tours would need to be sent to the client, since it would also be a part of the same module? This isn't as big of a deal for message poster, since its just the core version and flow's, that are currently available, but, if we had a single module for all deployed tours, it would include

  • from the GuidedTour extension: firstedit, test, onshow, uprightdownleft
  • from GrowthExperiments: helppanel, homepage_mentor, homepage_welcome, homepage_discovery
  • from GettingStarted: gettingstartedtasktoolbar, gettingstartedtasktoolbarve
  • from CheckUser: checkuserinvestigateform, checkuserinvestigate
  • from Flow: flowOptIn
  • from WikimediaMessages: RcFiltersIntro, WlFiltersIntro, RcFiltersHighlight

It depends on their complexity and size. They don't have go into a single bundle per-se. The QUnitTestModule, for example, also follows the above principle, but results in one combined bundle per extension. It's an implementation detail.

I also note that it might actually be fine to have all these in one bundle. From what I recall, guided tour definitions generally aren't particularly complex. It would be a small module even with all the above combined, if we don't include their dependencies. Those can be lazy-loaded (and the tours are lazy-loaded anyway, so this wouldn't make much difference), by declaring those inside the file using mw.loader.using() thus only distributing the code in one bundle.

Another approach I mentioned before that might be worth exploring is to bundle tours with the feature they are for. E.g. Flow tour can be part of an existing Flow module. It doesn't need to be connected to GuidedTours at all (central or not).

The main work that needs to happen for this task is to decouple how tours are registered on the server-side. Currently the design of this registration pattern mandates the existence of a separate module bundle. Once that is decoupled, we can go in many different directions, possibly multiple based on locally-made decisions for that is optimal for those teams and features.

Change 713514 had a related patch set uploaded (by DannyS712; author: DannyS712):

[mediawiki/extensions/GuidedTour@master] Add custom ResourceLoader module to bundle multiple tours together

https://gerrit.wikimedia.org/r/713514

Change 713515 had a related patch set uploaded (by DannyS712; author: DannyS712):

[mediawiki/extensions/CheckUser@master] Use new GuidedTour bundle feature

https://gerrit.wikimedia.org/r/713515

one way would be to have the tour code go into one big module, like messageposter. you'd have an extension registry attribute like "GuidedTourModule", and then the guidedtour extension takes those, merges them, and registers them as one file module. This would mean 'scripts', no packageFile module since those cannot easily be merged as simple PHP arrays. see MessagePosterModule for an example of that

It wasn't easy, but I figured out how to do it with package files, which will make it easier to use tours that rely on multiple files. Patch for GuidedTour uploaded, as well as CheckUser to demonstrate it working

Is Performance-Team willing to review?

We can review/validate the performance improvement and help with the bundling logic if needed. However I'd recommend we involve Growth team as owners of this and not make a major change without them deciding on when and how to roll it out plus any guidance on testing/roadmapping etc. Also if there are people on their team interested in learning more about this, then the review would be a good knowledge sharing oppertunity.

kostajh moved this task from Inbox to Needs Discussion on the Growth-Team board.
kostajh subscribed.

Growth-Team can discuss it, but I'm not sure that we'll be able to give it a lot of attention with everything else we have going on.

@DannyS712 one thing we're concerned about is the potential to break existing tours, though hopefully we would get early warnings of that during QA and then during the train rollout. Do you have any other thoughts on that? I think we can give some time to code review but it might be a bit slower cycle, with the other things we have going on.

cc @phuedx who also has some comments from the patch in progress, AIUI.

@DannyS712 one thing we're concerned about is the potential to break existing tours, though hopefully we would get early warnings of that during QA and then during the train rollout. Do you have any other thoughts on that? I think we can give some time to code review but it might be a bit slower cycle, with the other things we have going on.

cc @phuedx who also has some comments from the patch in progress, AIUI.

This should not break any existing tours, demonstrated by leaving the ext.guidedTour.tour.uprightdownleft tour as its own module registered via extension.json and you can test and see it still works. This simply adds an extra way to register a tour that does not need a dedicated module for each tour.

@DannyS712 one thing we're concerned about is the potential to break existing tours, though hopefully we would get early warnings of that during QA and then during the train rollout. Do you have any other thoughts on that? I think we can give some time to code review but it might be a bit slower cycle, with the other things we have going on.

cc @phuedx who also has some comments from the patch in progress, AIUI.

This should not break any existing tours, demonstrated by leaving the ext.guidedTour.tour.uprightdownleft tour as its own module registered via extension.json and you can test and see it still works. This simply adds an extra way to register a tour that does not need a dedicated module for each tour.

Thanks. We'll plan to get to reviewing this once our Image-Suggestions project wraps up.

While I was looking for queries in common requests, it turned out that GT makes db queries in all load.php calls which is coming from this: https://gerrit.wikimedia.org/g/mediawiki/extensions/GuidedTour/+/0a2a5eb1e772b3afefeb505a402070170a5ec1c6/includes/GuidedTourHooks.php#40

	 */
	public static function onResourceLoaderGetConfigVars( &$vars ) {
		$pageName = wfMessage( 'guidedtour-help-guider-url' )
			->inContentLanguage()->plain();
		$pageTitle = Title::newFromText( $pageName );
		if ( $pageTitle !== null && $pageTitle->exists() ) {
			$vars['wgGuidedTourHelpGuiderUrl'] = $pageName;
		}
		return true;
	}

Either wrapping this with an hour long APCu cache or removing $pageTitle->exists() would help a lot. I let the team decide on this but this should be a rather good low-hanging fruit making js calls much faster. XHGui call: https://performance.wikimedia.org/xhgui/run/symbol?id=61d70087f6d157dc50ec5a00&symbol=GuidedTourHooks%3A%3AonResourceLoaderGetConfigVars

Bundling every single tour in existence (possibly with dependencies and whatnot) in a single package will scale poorly. I'd recommend an alternate approach.

There are three distinct use cases here:

  1. Tours triggered via the 'tour' query parameter or the cookie GuidedTours uses to keep track of in-progress tours, in which case GuidedTours needs to know how to load the tour.
  2. Tours triggered by server-side application code of the extension the tour belongs to.
  3. Tours triggered by client-side application code of the extension the tour belongs to.

In cases 2 and 3 the application might have a more effective way of loading the tour (e.g. it might want to bundle it together with the feature the tour is about, or in the client-side case it might want to preload it so it can be triggered more instantly).

So I think the goal is to have a logical tour identifier which can 1) trigger an already-loaded tour 2) optionally can be associated with a loading mechanism (that is more granular than requiring tours to be standalone modules). The straightforward way to that, IMO, is to add

  • An extension.json attribute which maps logical tour identifiers to module names. This would only be needed for tours making use of the query parameter or the cookie.
  • A registration system where a JS callback can be associated with a logical tour identifier. (For B/C, when nothing is registered, it would fall back to the logic that mw.guidedTour.internal.loadTour() currently uses.) Typically registration would be done in the index.js file of the module that contains the tour, and the callback would consist of requiring the tour file.
  • A mechanism for server-side code to queue a tour to be started (via the registration system) on the client side.

In the first use case, the server side GuidedTours logic would use the tour name => module name map to load the appropriate RL module (for B/C, it would fall back to using the tour name as a module name if nothing is registered) and queue it for execution. In the second use case, extension code in e.g. BeforePageDisplay would do the same. In the third, client-side logic would just invoke the registered callback directly, and it would be the calling code's responsibility to make sure the module is loaded. (If it wants to avoid that, e.g. it wants to export the tour so that client-side code in a different extension can trigger it, and doesn't want to expose internal details such as module structure, loading the module can always be made part of the registration callback.)

The registration system would only include tours which have been loaded, and the tour name => module name map would not need to be exported to the client side, so this would have zero overhead for which do not involve showing tours, and would not be opinionated about tour code organization at all. A cross-extension shard tour bundle managed by GuidedTours could be easily implemented on top of this with a second extension.json attribute, if there are extensions which e.g. don't have client-side code at all and don't want to create a module for just the tour.

While I was looking for queries in common requests, it turned out that GT makes db queries in all load.php calls which is coming from this: https://gerrit.wikimedia.org/g/mediawiki/extensions/GuidedTour/+/0a2a5eb1e772b3afefeb505a402070170a5ec1c6/includes/GuidedTourHooks.php#40

Same issue as T298751#7615404: either LinkCache is broken, in case we have bigger problems (but I'm sure we'd notice that) or this will only make a memcached call (which might still be worth avoiding on every load.php?modules=startup call, but much less of a problem).

That said, I'm not sure if the hook in question is used at all. The doc comment says it's needed because of T27349: ResourceLoader: Support loading of messages in parsed formats (e.g. parsed, incontentlanguage, ..) but the message is parsed with plain() so there should be no problem doing it on the server side. And the tour it is used in seems like some kind of demo?

Nevermind, the variable is used for in-content-language message lookup. I suppose the code was written back when RL did not support callbacks?

wgGuidedTourHelpGuiderUrl is used in a couple on-wiki tours but the only one which isn't mindless copy-paste from the example tours is de:MediaWiki:Guidedtour-tour-test2.js, which seems like a test tour. So I don't think it will be missed.

Urbanecm_WMF changed the task status from Open to In Progress.Jan 31 2022, 11:34 AM

Waiting for feedback on the patch, no activity in a while so removing from our current sprint board.

Pppery changed the task status from In Progress to Open.May 10 2024, 1:45 AM