To avoid loading the Wikispeech JS for users who do not wish to use it, we should create an extremely small JS component (in a separate file) which simply adds an interactive element (such as a button). Once the user interacts with it, we load the full JS and start the segmenting.
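A minimal sketch of the stub's core logic, assuming the full code is loaded through a caller-supplied function. `createLazyLoader` and `loadFullModule` are hypothetical names. In MediaWiki the supplied function would probably wrap `mw.loader.using()` with whatever module name the extension registers, which is an assumption here.

```javascript
// Hypothetical helper: returns a handler that triggers the full load
// on the first interaction and reuses that result afterwards.
function createLazyLoader(loadFullModule) {
  let started = false;
  let result;
  return function activate() {
    if (!started) {
      started = true;
      result = loadFullModule();
    }
    return result;
  };
}

// Possible wiring in the browser (module name is an assumption):
// const activate = createLazyLoader(() => mw.loader.using('ext.wikispeech'));
// button.addEventListener('click', activate);
```

Guarding against repeated activation keeps a double click from starting the load, and later the segmenting, twice.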
While the design and accessibility of the interactive component will require more thought, the basic logic should be possible to implement straight away.
This smaller JS should still respect T243393 ("Only load Javascript when it may be used") and only load when needed (whitelisted namespace, supported page language and current revision).
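The three conditions could be checked along these lines. This is only a sketch: `shouldOfferWikispeech` and the shape of its `page` and `settings` arguments are hypothetical, and in practice the values would presumably come from `mw.config` (for example `wgNamespaceNumber`, `wgPageContentLanguage`, `wgRevisionId` and `wgCurRevisionId`) and from the extension's own configuration.

```javascript
// Hypothetical check: true only when the page is in a whitelisted
// namespace, has a supported page language and shows the current revision.
function shouldOfferWikispeech(page, settings) {
  const namespaceAllowed = settings.namespaces.includes(page.namespace);
  const languageSupported = settings.languages.includes(page.language);
  const isCurrentRevision = page.revisionId === page.currentRevisionId;
  return namespaceAllowed && languageSupported && isCurrentRevision;
}

// Example values (made up for illustration):
const exampleSettings = { namespaces: [0], languages: ['sv', 'en'] };
const examplePage = {
  namespace: 0,
  language: 'sv',
  revisionId: 12,
  currentRevisionId: 12
};
shouldOfferWikispeech(examplePage, exampleSettings); // true
```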
The "always enable" user setting should bypass the interactive component and load the full JS straight away. Conversely, the visual elements of the interactive component should at the very least support being hidden by a minimal CSS addition to your user CSS (or a gadget).
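The bypass could look roughly like this. `startWikispeech` and its `alwaysEnabled`, `activate` and `showButton` options are hypothetical names; how the "always enable" preference is actually read (presumably via `mw.user.options`) and what the preference is called are left open.

```javascript
// Hypothetical entry point for the small stub.
// options.alwaysEnabled: value of the "always enable" user setting
// options.activate:      loads the full JS and starts the segmenting
// options.showButton:    adds the interactive component, wired to activate
function startWikispeech(options) {
  if (options.alwaysEnabled) {
    // Skip the interactive component and load straight away.
    options.activate();
    return 'loaded';
  }
  options.showButton(options.activate);
  return 'button';
}
```

If the interactive component is given a single stable class or id, hiding it from user CSS or a gadget should only take a one-line `display: none` rule.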