
Server-side upload request: Catalog of Copyright Entries volumes from Internet Archive
Closed, Resolved, Public

Description

Owing to some long-standing issues with uploading large PDFs, I am reluctantly requesting that the remaining volumes of the "Catalog of Copyright Entries" still to be uploaded to Commons be uploaded server-side. Uploading them manually does not seem feasible: I have encountered repeated errors in trying to do so, as has another contributor. I do not like having to request that these uploads be done server-side, but if it is impossible to do so reliably with the user-side tools...

The relevant category on Commons is https://commons.wikimedia.org/wiki/Category:Catalogs_of_Copyright_Entries
with the Internet Archive collection at https://archive.org/details/copyrightrecords?&sort=-date&page=9 and an RSS feed at https://archive.org/services/collection-rss.php?collection=copyrightrecords
(The complete collection has 674 items, although a number are already present on Commons.)

It was the color PDF versions that were being uploaded (as opposed to the DjVu or JPEG scans).

Fae had already uploaded a number of these, but noted some upload issues in T254459 and T255238. Fae also had some "brute-force" scripts that were being used to try to get the collection "mirrored". The naming of uploaded files in the category is indicated by the volumes already uploaded, as is the information provided on their file information pages.

There was also a script on PAWS - https://paws-public.wmflabs.org/paws-public/User:AntiCompositeBot/CatCoprEntries.ipynb
which could be adapted to generate a list of the volumes from the IA collection that are still to be uploaded.
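As a rough illustration of how such a list could be derived, here is a minimal sketch. It is not the logic of the PAWS notebook, which is not reproduced here. It assumes that uploaded files carry the IA identifier in their title as `(IA <identifier>).pdf` (the naming convention proposed later in this task), and that the IA identifiers and the Commons file titles have already been fetched into plain lists. The function names and the sample identifiers are hypothetical.

```python
import re

# Assumption: uploaded files end in "(IA <identifier>).pdf".
IA_SUFFIX = re.compile(r"\(IA ([^()\s]+)\)\.pdf$")


def identifier_from_title(title):
    """Return the IA identifier embedded in a Commons file title, or None."""
    match = IA_SUFFIX.search(title)
    return match.group(1) if match else None


def remaining_identifiers(ia_identifiers, commons_titles):
    """Return the IA identifiers that no Commons title refers to, in input order."""
    uploaded = {identifier_from_title(title) for title in commons_titles}
    return [ident for ident in ia_identifiers if ident not in uploaded]


# Hypothetical sample data, for illustration only.
titles = [
    "File:Catalog of Copyright Entries 1966 (IA example_id_1966).pdf",
    "File:Unrelated scan.pdf",
]
todo = remaining_identifiers(["example_id_1966", "example_id_1923"], titles)
```

Files in the category that do not follow the naming pattern are simply ignored here, so their volumes would show up as still to be uploaded and would need a manual check.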

(It would also, of course, be greatly appreciated if the underlying issues with the client-side upload of large files were resolved, so that there is less need to make server-side upload requests of this nature in the future.)

Event Timeline

Restricted Application added subscribers: Masumrezarock100, Aklapper.

Fae may be able to advise on which volumes are still to be uploaded, as they have access to the logs of their own scripts.

So that there is some standardisation of naming and metadata, here is a proposed convention (I hope the use of template-style syntax for parameters is acceptable). Is this detailed enough?

Naming/title of files:

File:{{{IA title field}}} (IA {{{IA identifer}}}).pdf
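A minimal sketch of how an upload script might assemble a title in this pattern from the two IA metadata fields. The function name and the sample values are hypothetical. The clean-up step is also an assumption rather than part of the request: it collapses whitespace and replaces the characters MediaWiki does not allow in page titles (`# < > [ ] | { }`) with spaces.

```python
import re

# Characters MediaWiki does not allow in page titles.
FORBIDDEN = re.compile(r"[#<>\[\]|{}]")


def commons_file_title(ia_title, ia_identifier):
    """Build 'File:<IA title> (IA <IA identifier>).pdf'."""
    # Assumption: forbidden characters become spaces, then runs of
    # whitespace are collapsed to a single space.
    cleaned = FORBIDDEN.sub(" ", ia_title)
    cleaned = " ".join(cleaned.split())
    return f"File:{cleaned} (IA {ia_identifier.strip()}).pdf"


# Hypothetical sample values, for illustration only.
title = commons_file_title(
    "Catalog of  Copyright Entries 1966 [Books]", "example_id_1966"
)
```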

Sample markup for File Description page:

== {{int:filedesc}} ==
{{information
|description = {{en|1=
:'''Title''': {{{IA title field}}}
:'''Description''': The Catalogs of Copyright Entries (CCEs) are published compilations of copyright registration records cataloged from July 1891 through December 1977.  These entries alone may not reflect the complete Copyright Office record pertaining to a particular work.  Contact the U.S. Copyright Office for information about any additional records that may exist.  Although limited word searching of this volume is available, users are also advised to refer to the indexes included in each volume or in an accompanying volume when searching for a particular work.
:'''Topics''': {{{IA Topics field}}}
:'''Identity''': {{{IA identifer}}}
}}
|author = U.S. Govt. Print. Off.
|date = 1966
|source = 
: Gallery: https://archive.org/details/{{{IA identifer}}}
: File: https://archive.org/download/{{{IA identifer}}}/{{{IA identifer}}}.pdf
}}
== {{int:license-header}} ==
{{PD-USGov}}

[[Category:Catalogs of Copyright Entries|{{{year}}}.{{{Part number}}}-{{{group number}}}]]

The part and group numbers for the Third Series volumes are the Part number and Group number as they appear on the title pages of the relevant volumes. Additional categorisation disambiguators can be applied by adding month or No. information after the {{{group number}}}, prefixed with a period (full stop).

Use:

[[Category:Catalogs of Copyright Entries|{{{year}}}.{{{quarter}}}]]

to categorise Original Series (to 1906) volumes instead.
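The two sort-key patterns above could be generated along these lines. This is only a sketch of the convention as described in this task: the function names are hypothetical, the sample values are made up, and the optional `extra` argument stands for the month or No. disambiguator appended after a period.

```python
CATEGORY = "Catalogs of Copyright Entries"


def third_series_sort_key(year, part, group, extra=None):
    """Sort key '<year>.<part>-<group>', optionally followed by '.<extra>'."""
    key = f"{year}.{part}-{group}"
    if extra:
        # Month or No. information, prefixed with a period (full stop).
        key += f".{extra}"
    return key


def original_series_sort_key(year, quarter):
    """Sort key '<year>.<quarter>' for Original Series (to 1906) volumes."""
    return f"{year}.{quarter}"


def category_link(sort_key):
    """Wrap a sort key in the category link used on the file description page."""
    return f"[[Category:{CATEGORY}|{sort_key}]]"


# Hypothetical sample values, for illustration only.
third = category_link(third_series_sort_key(1966, 1, 2, "Jan-Jun"))
original = category_link(original_series_sort_key(1905, 3))
```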

Approximately 120 volumes remain to be uploaded, mostly pre-1950.

Update: All the files in the relevant category should now be uploaded, so closing out this request, but if additional volumes or bad uploads come to light, feel free to re-open.

ShakespeareFan00 triaged this task as Lowest priority.