
Add book search functionality to mainspace books in Wikisource
Open, Needs Triage, Public, Feature

Description

Refer to the Search in Books proposal in the Community Wishlist 2022.

The plan is to start off by implementing a basic OOUI search form in the indicator section, with a popup that lets users configure specific options (such as regex search, plain-text search, search by Index, and search by book).

We can explore adding more options and exposing more complex search options later.
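
As a rough illustration of how the popup's options might map onto a search backend, here is a minimal sketch that assumes the form builds a CirrusSearch query string. The function name `buildBookSearchQuery` and its option names (`regex`, `plainText`, `scope`, `indexPage`, `bookTitle`) are hypothetical. The `insource:` and `prefix:` keywords belong to CirrusSearch itself. The assumption that Page: pages of a scan are named `Page:<file>/<n>` holds for multi-page files such as DjVu and PDF.

```javascript
/**
 * Hypothetical helper: turn the options chosen in the proposed search popup
 * into a CirrusSearch query string. Option names are invented for this
 * sketch; `insource:` and `prefix:` are CirrusSearch keywords.
 * `prefix:` is appended last because it consumes the rest of the query.
 */
function buildBookSearchQuery( term, options ) {
	let query;
	if ( options.regex ) {
		// Regex over page wikitext; a "/" inside the pattern is escaped as "\/".
		query = 'insource:/' + term.replace( /\//g, '\\/' ) + '/';
	} else if ( options.plainText ) {
		// Exact phrase in the wikitext; stray double quotes are dropped.
		query = 'insource:"' + term.replace( /"/g, '' ) + '"';
	} else {
		// Ordinary full-text search.
		query = term;
	}
	if ( options.scope === 'index' && options.indexPage ) {
		// Assumes the Page: pages of a multi-page scan are "Page:<file>/<n>".
		query += ' prefix:Page:' + options.indexPage.replace( /^Index:/, '' ) + '/';
	} else if ( options.scope === 'book' && options.bookTitle ) {
		// Pages whose titles start with the book title (root page and subpages).
		query += ' prefix:' + options.bookTitle;
	}
	return query;
}
```

For example, `buildBookSearchQuery( 'whale', { scope: 'book', bookTitle: 'Moby-Dick' } )` would yield `whale prefix:Moby-Dick`. The resulting query could then be passed to Special:Search or the search API.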

Event Timeline

Aklapper renamed this task from "Add book search functionality to manispace books in Wikisource" to "Add book search functionality to mainspace books in Wikisource". (May 30 2022, 9:36 AM)

Change 801372 had a related patch set uploaded (by Sohom Datta; author: Sohom Datta):

[mediawiki/extensions/ProofreadPage@master] Replace proofreadpage_source_href with prpSourceIndexPage and build source link client side.

https://gerrit.wikimedia.org/r/801372

Change 801392 had a related patch set uploaded (by Sohom Datta; author: Sohom Datta):

[mediawiki/extensions/Wikisource@master] Add book search to Wikisource mainspace pages

https://gerrit.wikimedia.org/r/801392

Change 801372 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Replace proofreadpage_source_href with prpSourceIndexPage and build source link client side.

https://gerrit.wikimedia.org/r/801372

This change removes the proofreadpage_source_href JS variable from mw.config and replaces it with a new variable, prpSourceIndexPage. proofreadpage_source_href used to contain an HTML link to the source Index: page of a mainspace page, so a user script or JS feature that needed the actual Index: page name had to parse that HTML to extract it. prpSourceIndexPage contains only the name of the associated source Index: page, eliminating the need for any HTML parsing. Any user scripts that use proofreadpage_source_href should migrate to prpSourceIndexPage.
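
To make the difference concrete, here is a minimal sketch of what a script must do to get the Index: page name from each variable. The helper name `getSourceIndexPage` is hypothetical. The legacy HTML is assumed to look like `<a href="/wiki/Index:...">...</a>`, which implies a `/wiki/$1` article path; that exact format is also an assumption. `config` can be `mw.config` or, outside MediaWiki, a plain `Map`, since both expose `get()`.

```javascript
/**
 * Hypothetical helper: return the source Index: page name for a mainspace
 * page, or null. `config` is anything with a get( key ) method
 * (mw.config in a gadget, a Map in tests).
 */
function getSourceIndexPage( config ) {
	// New variable: already the bare page name, nothing to parse.
	const name = config.get( 'prpSourceIndexPage' );
	if ( name ) {
		return name;
	}
	// Legacy variable: an HTML <a> element, so the title has to be dug out
	// of its href (assumed "/wiki/<title>" form; fragile by nature).
	const html = config.get( 'proofreadpage_source_href' );
	if ( !html ) {
		return null;
	}
	const match = /href="\/wiki\/([^"?#]+)"/.exec( html );
	if ( !match ) {
		return null;
	}
	// Undo URL encoding and turn underscores back into spaces.
	return decodeURIComponent( match[ 1 ] ).replace( /_/g, ' ' );
}
```

The first branch is all a migrated script needs. The second branch shows the HTML parsing that the new variable makes unnecessary.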

> This change removes the proofreadpage_source_href JS variable from mw.config and replaces it with a new variable prpSourceIndexPage.

This leaves no sane migration path, since it creates a bright-line cutoff, which is unnecessary because these are two separate variables. Either both should be left in indefinitely, or they should co-exist for some reasonable amount of time so scripts can be migrated smoothly.

Whether to leave the existing variable indefinitely or temporarily depends on whether any scripts actually use the "HTMLness" of it for anything. Like you, I suspect most scripts just transform it back into a page name, but it's absolutely not unthinkable that some script somewhere actually wants the HTMLy version for some reason.
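
The co-existence option can be sketched as a hypothetical on-wiki shim that recreates the legacy value from prpSourceIndexPage, so that older scripts keep working while they migrate. The function name `backfillSourceHref`, its parameters, and the exact HTML it produces are all assumptions. In a real gadget, mw.util.getUrl() and mw.html.element() would replace the hand-rolled URL building and escaping shown here.

```javascript
// Minimal HTML escaping for attribute values and text content.
function escapeHtml( text ) {
	return text
		.replace( /&/g, '&amp;' )
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' )
		.replace( /"/g, '&quot;' );
}

/**
 * Hypothetical shim: if prpSourceIndexPage is set but the legacy
 * proofreadpage_source_href is not, rebuild the legacy HTML link.
 * `config` needs get/set (mw.config, or a Map in tests);
 * `articlePath` mirrors wgArticlePath, e.g. "/wiki/$1".
 * Returns true if the legacy value was written.
 */
function backfillSourceHref( config, articlePath, label, tooltip ) {
	const indexPage = config.get( 'prpSourceIndexPage' );
	// Nothing to do without a source page, or if the legacy value still exists.
	if ( !indexPage || config.get( 'proofreadpage_source_href' ) ) {
		return false;
	}
	// Rough stand-in for mw.util.getUrl(): spaces to underscores, then
	// percent-encode while keeping ":" and "/" readable.
	const path = encodeURIComponent( indexPage.replace( / /g, '_' ) )
		.replace( /%3A/g, ':' )
		.replace( /%2F/g, '/' );
	const href = articlePath.replace( '$1', path );
	config.set( 'proofreadpage_source_href',
		'<a href="' + escapeHtml( href ) + '" title="' + escapeHtml( tooltip ) + '">' +
		escapeHtml( label ) + '</a>' );
	return true;
}
```

Such a shim must run before any legacy script reads the variable. Ordering like that is easier to guarantee by keeping the old variable server-side during a transition period.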

Incidentally, this change seems to have some bearing on T53980 (from 2013!), though I'm not yet sure in what way (does it make it easier? harder? fix it as an incidental effect? make on-wiki workarounds easier?).

> Incidentally, this change seems to have some bearing on T53980 (from 2013!), though I'm not yet sure in what way (does it make it easier? harder? fix it as an incidental effect? make on-wiki workarounds easier?).

I don't think this change will affect the Translation: namespace at all, since we filter for main namespace pages before adding these variables.

> This change removes the proofreadpage_source_href JS variable from mw.config and replaces it with a new variable prpSourceIndexPage.

> This leaves no sane migration path, since it creates a bright-line cutoff, which is unnecessary because these are two separate variables. Either both should be left in indefinitely, or they should co-exist for some reasonable amount of time so scripts can be migrated smoothly.

> Whether to leave the existing variable indefinitely or temporarily depends on whether any scripts actually use the "HTMLness" of it for anything. Like you, I suspect most scripts just transform it back into a page name, but it's absolutely not unthinkable that some script somewhere actually wants the HTMLy version for some reason.

It should be fairly easy to reconstruct an HTML link from prpSourceIndexPage. We use the following code in a JS module in ProofreadPage to handle this scenario:

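	// Prefer prpSourceIndexPage (the bare Index: page name) and build the
	// link client side; fall back to the legacy proofreadpage_source_href
	// (a ready-made HTML link) when the new variable is absent.
	// Assumes the mediawiki.util module and the proofreadpage_source and
	// proofreadpage_source_message messages are loaded.
	var urlLink;
	var sourceIndexPage = mw.config.get( 'prpSourceIndexPage' );
	if ( sourceIndexPage ) {
		// mw.html.element() escapes the attribute values and text content
		urlLink = mw.html.element( 'a',
			{
				href: mw.util.getUrl( sourceIndexPage ),
				title: mw.msg( 'proofreadpage_source_message' )
			},
			mw.msg( 'proofreadpage_source' ) );
	} else {
		urlLink = mw.config.get( 'proofreadpage_source_href' );
	}
	// urlLink now holds an HTML string (or null); use it as needed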

Ok, I've checked all usages on enWS (and noWS, incidentally) and apart from user scripts for inactive users, there is only one remaining instance. I've notified the user, but I suspect the script isn't actually in use currently (it hasn't been modified since 2011). In other words, enWS should be fine when the train rolls around tomorrow. But I'm still a little worried about the other projects.

> I don't think this change will affect the Translation: namespace at all, since we filter for main namespace pages before adding these variables.

Hmm. But why? The config var is potentially relevant in all namespaces (getting the Index: from the Page: namespace would seem a common enough need), and as per T53980 the Translation: namespace is a content namespace entirely equivalent to ns:0.

> Ok, I've checked all usages on enWS (and noWS, incidentally) and apart from user scripts for inactive users, there is only one remaining instance. I've notified the user, but I suspect the script isn't actually in use currently (it hasn't been modified since 2011). In other words, enWS should be fine when the train rolls around tomorrow. But I'm still a little worried about the other projects.

Global search appears to be down right now, but almost all on-wiki uses of the variable are in the PageNumbers.js script (which appears to be tagged as historical) and in some user scripts.

> I don't think this change will affect the Translation: namespace at all, since we filter for main namespace pages before adding these variables.

> Hmm. But why? The config var is potentially relevant in all namespaces (getting the Index: from the Page: namespace would seem a common enough need), and as per T53980 the Translation: namespace is a content namespace entirely equivalent to ns:0.

We do add a lot of variables to the Page: namespace in other code paths of ProofreadPage. For the Page: namespace, prpIndexTitle is the variable you want to use to get the Index: page (there are a bunch of others as well; I don't know whether they are documented on-wiki). With respect to the Translation: namespace, I'm not sure why the source link and values are not added, but it seems like it should be a pretty easy fix.
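
With two differently named variables, a script that wants the Index: page on both Page: and mainspace pages has to branch on the namespace. Here is a minimal sketch, assuming the Page: namespace is reported as "Page" by the standard wgCanonicalNamespace config variable (and "" for mainspace). The helper name `getIndexTitle` is hypothetical. Whether prpIndexTitle includes the "Index:" prefix is not stated here, so the values are returned as-is.

```javascript
/**
 * Hypothetical helper: return the Index: page for the current page, or
 * null. `config` is mw.config in a gadget, or a Map in tests.
 */
function getIndexTitle( config ) {
	switch ( config.get( 'wgCanonicalNamespace' ) ) {
		case 'Page':
			// Page: namespace: ProofreadPage sets prpIndexTitle.
			return config.get( 'prpIndexTitle' ) || null;
		case '':
			// Main namespace: prpSourceIndexPage, added by this change.
			return config.get( 'prpSourceIndexPage' ) || null;
		default:
			// Other namespaces, e.g. Translation:, where these values
			// are not currently added.
			return null;
	}
}
```

A single config variable name for both cases would remove the need for this branch.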

> … For the Page: namespace, prpIndexTitle is the variable you want to use to get the Index: page …

Then why use a different config var name in mainspace?