
IA Upload: Permit duplicate IA identifier if of a different format
Open, Needs Triage · Public

Description

https://commons.wikimedia.org/wiki/File:Duke_University_Libraries_(IA_carysnewitinerar01cary).pdf has over-compressed scans.

For Wikisource purposes, a good-quality scan is needed, and in places the PDF scan of this is NOT reliably intelligible.

So I would like to use https://iw.toolforge.org/ia-upload to attempt a regeneration of the file using the JP2 scans (into either PDF or DjVu).

The tool, however, won't let me do this because a file with the relevant IA identifier already exists (albeit as a PDF).

The tool should allow me to regenerate the relevant scan regardless of the presence of the existing PDF. (The warning, whilst appreciated, is not helpful if it prevents me from doing something that was a deliberate choice to resolve a specific technical issue.)

Event Timeline

Restricted Application added a project: Community-Tech. · Dec 6 2020, 9:51 AM
ShakespeareFan00 renamed this task from Presence of PDf file with given filename or IA identifer blocks attempts to regenerate file to Presence of PDf file with given filename or IA identifer blocks attempts to regenerate alternate file for the associated identifier. · Dec 6 2020, 9:53 AM
ShakespeareFan00 updated the task description.
Reedy renamed this task from Presence of PDf file with given filename or IA identifer blocks attempts to regenerate alternate file for the associated identifier. to Presence of PDF file with given filename or IA identifer blocks attempts to regenerate alternate file for the associated identifier. · Dec 6 2020, 3:25 PM
Shooke added a subscriber: Shooke. · Dec 29 2020, 11:24 PM

I had not seen that you had reported the same problem (I reported https://phabricator.wikimedia.org/T270928). It turns out that the user Fæ (a bot?) has uploaded PDF files from the Internet Archive (https://commons.wikimedia.org/wiki/Special:Contributions/F%C3%A6), and this blocks DjVu uploads with this tool, ignoring the fact that the PDF versions are poorer than the DjVu files. DjVu files are the priority for Wikisource for OCR and transcription.

Samwilson added a subscriber: Inductiveload.
Samwilson added a subscriber: Samwilson.

I've merged the two tasks above into this one (the first that was created). They're not all identical, but I think they can be fixed together by changing the tool to just show a prominent warning when an IA identifier is found to already be on Commons, instead of prohibiting the upload. This would mean additional PDFs or DjVus could be uploaded. Does that sound okay?
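
To make the proposed behaviour concrete, here is a minimal sketch in Python. It is not the tool's actual code; the function name `existing_identifier_warnings` and its parameters are hypothetical, and it assumes the tool already has a list of Commons filenames found for the identifier. The point is only that the check returns warnings to display and never blocks the upload.

```python
from typing import List


def existing_identifier_warnings(
    identifier: str, requested_format: str, existing_files: List[str]
) -> List[str]:
    """Return warnings to display; the upload itself is never blocked.

    Hypothetical helper: `existing_files` is assumed to be the list of
    Commons filenames already found for this IA identifier.
    """
    requested_ext = "." + requested_format.lower()
    warnings = []
    for name in existing_files:
        if name.lower().endswith(requested_ext):
            # Same identifier and same format: still only a warning.
            warnings.append(
                f"'{name}' already exists for {identifier} in the same format."
            )
        else:
            # Same identifier, different format (e.g. PDF exists, DjVu requested).
            warnings.append(
                f"'{name}' already exists for {identifier} in a different format."
            )
    return warnings


if __name__ == "__main__":
    found = ["Duke University Libraries (IA carysnewitinerar01cary).pdf"]
    for warning in existing_identifier_warnings(
        "carysnewitinerar01cary", "djvu", found
    ):
        print("Warning:", warning)
```

With this shape, the caller would show any returned warnings prominently and then let the user continue with the upload if they choose to.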

Samwilson removed a project: Wikisource.

As a precursor to this, it's probably worth updating some dependencies: https://github.com/wikisource/ia-upload/pull/48

Samwilson renamed this task from Presence of PDF file with given filename or IA identifer blocks attempts to regenerate alternate file for the associated identifier to IA Upload: Permit duplicate IA identifier if of a different format. · Mon, Feb 22, 11:44 PM