
Store data concerning discrepancies between PDF/DjVu files on Commons and the same work's hi-res scans at an external source
Open, Low, Public

Description

Recently a script was written that attempts to match up the Page:s for a given Index:/File: with the original or hi-res scans available for that work at an external site, such as Internet Archive or Hathi Trust.

However, in a small number of instances, the file/scans at an external source and those on Commons have different layouts, because files on Commons (or Wikisource) have been patched locally to account for absent pages, or for pages whose scans were too unreliable for accurate transcription/proofreading.

This means that the script used at Wikisource can sometimes retrieve what is perceived to be the 'wrong' hi-res scan for the transcription of a given Page:.

Ideally, information concerning these layout discrepancies should be recorded somehow, on a per-file basis, so that Page:s and scans can potentially be "re-matched" by knowing where the layout of the Commons file (and of the Wikisource Index) differs from that of the external source.

I gave an example schema in a comment here: https://en.wikisource.org/w/index.php?title=Index%3ARuffhead_-_The_Statutes_at_Large_-_vol_8.djvu&type=revision&diff=10959161&oldid=10614535
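
Independently of the schema in that comment, here is a minimal Python sketch of how per-file discrepancy data might be recorded and used for re-matching. It is an illustration under stated assumptions, not the format used by any existing script: the `LayoutAdjustment` record, its field names, the `external_position` helper and the example page numbers are all hypothetical. It assumes each discrepancy can be described as a run of Commons page positions that either is shifted by a fixed offset relative to the external scans, or has no external counterpart at all (a locally patched-in page).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LayoutAdjustment:
    """One run of Commons page positions that differs from the external scans.

    All names here are hypothetical and chosen only for illustration.
    """
    first: int             # first Commons page position in the run (1-based)
    last: int              # last Commons page position in the run (inclusive)
    offset: Optional[int]  # added to the Commons position to get the external
                           # position; None means the page was patched in
                           # locally and has no external counterpart


def external_position(commons_pos: int,
                      adjustments: Sequence[LayoutAdjustment]) -> Optional[int]:
    """Return the external scan position for a Commons page position.

    Positions not covered by any adjustment are assumed to match one-to-one.
    """
    for adj in adjustments:
        if adj.first <= commons_pos <= adj.last:
            return None if adj.offset is None else commons_pos + adj.offset
    return commons_pos


# Hypothetical 500-page file: a missing page was patched in at position 10,
# so every later Commons page sits one position after its external scan.
EXAMPLE = [
    LayoutAdjustment(first=10, last=10, offset=None),
    LayoutAdjustment(first=11, last=500, offset=-1),
]
```

With data like this stored per file, a script could call `external_position(11, EXAMPLE)` before fetching a hi-res scan and skip the lookup entirely when the result is `None`.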

Event Timeline

Restricted Application added a subscriber: Aklapper.

@ShakespeareFan00: Hi, just checking, did Inductiveload agree that this task should be on their workboard? :)

This means that the script used at Wikisource

Which script? URL welcome.

ShakespeareFan00 changed the task status from Open to Stalled. Mar 5 2021, 9:52 AM

The two scripts concerned:
https://en.wikisource.org/wiki/User:Inductiveload/Jump_to_file.js
https://en.wikisource.org/wiki/User:Inductiveload/page_carousel.js

I'm also closing this ticket, because it relates to a script on a specific wiki, and because the maintainer of the workboard seemingly stated in subsequent discussions that the requested functionality is unlikely to be implemented in the medium term.

Inductiveload claimed this task.
Inductiveload triaged this task as Low priority.

@Aklapper this is OK in this instance, thank you, though :-)

@ShakespeareFan00 I'm re-opening as low priority, because the request is valid, but T276042 is the first goal.