
Support Index: page(s) based on external base URL, as opposed to being based on local (or Commons hosted) media
Open, Needs Triage · Public · Feature

Description

Feature summary (what you would like to be able to do and where):

On English Wikisource there is a user-side script that allows hi-res scans (currently from HathiTrust / Internet Archive) to be used for proofreading instead of a locally hosted file.

See: https://en.wikisource.org/wiki/User:Inductiveload/jump_to_file#Loading_high-res_images

What is desired is the same functionality as this script provides, without the need for the given media (PDF/DjVu etc.) to be locally hosted. The user would specify a URL "type" in the provided UI, together with a base URL, and the two would be used to access the pages of a specific work.
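As a rough illustration of the URL "type" plus base URL idea, here is a minimal sketch in Python. The type names (`path`, `query`), the template shapes, and the `scans.example.org` host are all hypothetical and are not taken from the user script or from any real hosting site; an actual implementation would need one template per supported institution.

```python
from string import Template

# Hypothetical URL "types": each maps to a pattern for turning a base URL
# and a page number into the address of a single page image.
# Real hosting sites use their own schemes; these are placeholders.
URL_TEMPLATES = {
    "path": Template("${base}/${page}.jpg"),
    "query": Template("${base}?page=${page}"),
}


def page_image_url(url_type: str, base_url: str, page: int) -> str:
    """Build the image URL for one page of a remotely hosted work."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    try:
        template = URL_TEMPLATES[url_type]
    except KeyError:
        raise ValueError(f"unknown URL type: {url_type!r}") from None
    # Drop a trailing slash so the template does not produce "//".
    return template.substitute(base=base_url.rstrip("/"), page=page)


if __name__ == "__main__":
    print(page_image_url("path", "https://scans.example.org/work123/", 5))
    print(page_image_url("query", "https://scans.example.org/work123", 5))
```

Keeping the templates in a lookup table means a new URL scheme could be supported by adding one entry rather than changing the page-loading logic.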

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

Transcription of documents and works hosted on sites like Internet Archive / HathiTrust / National Library of Scotland.

Faster transcription potential for GLAM institutions that do not have the resources to perform bulk uploads on Commons, but have a 'common' URL scheme for resources they host directly. If URL-derived Index pages were supported, such institutions would be able to set up such Index pages directly on Wikisource (albeit linking to scans they host, as opposed to locally hosted ones).

Benefits (why should this be implemented?):

  • Use of existing high quality digitisations without the need of an intermediate step (and potential loss of scan quality).
  • Faster creation of initial Index pages for works at relevant Wikisource by GLAM partners.

Event Timeline

Restricted Application added a subscriber: Aklapper.

That user script page also says explicitly that "Some parts of this script make requests to third parties, which can leak your personal information, such as IP and user-agent, as well as the work/page you are looking at".
So I personally do not think that dropping the intermediate step is a good idea at all...

Soda subscribed.

> That user script page also says explicitly that "Some parts of this script make requests to third parties, which can leak your personal information, such as IP and user-agent, as well as the work/page you are looking at".
> So I personally do not think that dropping the intermediate step is a good idea at all...

There was some discussion about this off-wiki. While this isn't implementable as currently presented, something like it is doable: a user inputs a URL into an external tool (on Toolforge), which then generates an Index: page and provides a URL to a script that optionally loads images from off-wiki sources. It's just that ProofreadPage is probably not where this functionality is going to be hosted.
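The first step of such an external tool, turning a user-supplied URL into a source and an identifier, could look roughly like the sketch below. It is only an illustration: the function name and the returned dictionary are hypothetical, and the two URL shapes it recognises (`archive.org/details/<identifier>` and `babel.hathitrust.org/cgi/pt?id=<id>`) are assumptions about commonly seen public URLs, not a complete or authoritative list.

```python
from urllib.parse import urlparse, parse_qs


def identify_source(url: str) -> dict:
    """Guess the hosting site and work identifier from a user-entered URL.

    Hypothetical helper: the recognised URL shapes are assumptions.
    """
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]

    # Assumed Internet Archive shape: /details/<identifier>
    if host in ("archive.org", "www.archive.org"):
        if len(segments) >= 2 and segments[0] == "details":
            return {"source": "internet-archive", "identifier": segments[1]}

    # Assumed HathiTrust shape: /cgi/pt?id=<identifier>
    if host == "babel.hathitrust.org":
        ids = parse_qs(parts.query).get("id")
        if ids:
            return {"source": "hathitrust", "identifier": ids[0]}

    raise ValueError(f"unrecognised source URL: {url}")


if __name__ == "__main__":
    print(identify_source("https://archive.org/details/exampleitem00auth"))
```

The identifier returned here is what a later step would use to name the generated Index: page and to build the per-page image URLs.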

There is a tool called IA-upload, which works with IA-style identifiers to upload to Commons. Could a tool like that be extended to also generate a Wikisource Index page once the Commons upload has succeeded?

Also, the IA-upload tool should be extended to support a wider range of upload options, such as Full View HathiTrust and Google Books scans.