
Upload photos from Elsinga collection at the Alkmaar archives (+/- 10,000 images)
Closed, ResolvedPublic

Assigned To
Authored By
Ecritures
Oct 20 2019, 10:40 PM
Referenced Files
F34458657: Regionaal Archief Alkmaar template Pattypan .txt
May 18 2021, 8:23 AM
F31314941: afbeelding.png
Nov 27 2019, 2:12 PM
F31202332: results-00000.xls
Nov 24 2019, 7:58 AM
F31133346: parse.py
Nov 22 2019, 5:54 PM
F31133137: download_T235995.sh
Nov 22 2019, 2:05 PM
Tokens
"Yellow Medal" token, awarded by Ecritures.

Description

The CC0 license for the Elsinga collection is stated in the disclaimer on their website: https://www.regionaalarchiefalkmaar.nl/disclaimer?fbclid=IwAR248LwdG9Ecq3micqEqcJwJj3i4AlzmsVVR0b6Plur5tpC4CUu1EKvhNq4

Please use this URL in the source field of your Pattypan upload to refer to the CC0 license.
You can also add the banner {{Elsinga Collection}} to the source field of uploads.

Example record from the set: https://commons.wikimedia.org/wiki/File:Gezicht_op_Grote-_of_Sint_Laurenskerk,_Alkmaar,_Regionaal_Archief_Alkmaar_RAA011002951.jpg

Event Timeline

Ecritures triaged this task as Medium priority. Oct 20 2019, 10:40 PM
Ecritures renamed this task from Upload photos from collection Alkmaar archives to Upload photos from Elsinga collection at the Alkmaar archives (+/- 10.000 images). Oct 21 2019, 9:37 AM
Ecritures updated the task description.
Ecritures removed a subscriber: Aklapper.

Uploading the whole lot is not a task for right now anyway, so it does not belong on my plate.
As far as I know, Ecritures was going to take care of the downloading. Nice that Ecritures suddenly assigned this to me, but that is not how it works.

There is now a set of 200 records at https://maior.memorix.nl/api/oai/raa/key/Elsinga/?verb=ListRecords&metadataPrefix=ese . I can chop that into four, but it does not come close to the 10,000 images.

Incidentally, these images date from the 1980s and 1990s. Are they freely available? I have my doubts about that.

@RonnieV, the CC0 license is stated in their disclaimer on the website. I stated this URL in the main task (=this one) so the URL can be added/used during the Pattypan upload.

@RonnieV A batch size of 500 images for the uploads in the practical sessions seems reasonable. It would indeed be nice if you added the batches of 500 images plus their metadata to each of the subtasks of this parent task. There is another subtask (one you created yourself, if I am correct) for smaller batches (50?) to be created for use in a Commons workshop.

@Ecritures, the file mentioned above, which you call 'metadata' (it's just the data belonging to the specific picture), only contains 200 pictures, not the 10,000+ you said it would contain.
I can make four sets of 50 items each, but that is it. Please give me a file containing all 10k+ records.

@RonnieV: each response is limited to 200 records and includes a resumptionToken. If you issue subsequent requests using the resumptionToken value, you will get the next batch. This API does in fact contain all the records (it states completeListSize="12089").
And you are correct: I use the word metadata for the data that are 'attached to' the picture and that provide the info on the creator, source, licence, ID number, etc.
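
To illustrate the paging described above, here is a minimal Python sketch (not the script used for this task). It assumes the endpoint follows the standard OAI-PMH convention: the first request carries metadataPrefix, and every follow-up request carries only verb and resumptionToken. The function names and the `fetch` callable are hypothetical; `fetch` stands for whatever HTTP client returns the response body for a URL.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://maior.memorix.nl/api/oai/raa/key/Elsinga/"


def first_url(base_url=BASE_URL, metadata_prefix="ese"):
    """Build the initial ListRecords request."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    )
    return f"{base_url}?{query}"


def next_url(response_xml, base_url=BASE_URL):
    """Return the URL for the next batch, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    # A missing or empty resumptionToken marks the final batch.
    if token is None or not (token.text or "").strip():
        return None
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "resumptionToken": token.text.strip()}
    )
    return f"{base_url}?{query}"


def harvest(fetch, base_url=BASE_URL):
    """Yield every response body, following resumption tokens until done.

    `fetch` is a caller-supplied function (hypothetical) mapping URL -> XML text.
    """
    url = first_url(base_url)
    while url is not None:
        body = fetch(url)
        yield body
        url = next_url(body, base_url)
```

With 200 records per response and completeListSize="12089", a full harvest would take 61 requests.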

Reedy renamed this task from Upload photos from Elsinga collection at the Alkmaar archives (+/- 10.000 images) to Upload photos from Elsinga collection at the Alkmaar archives (+/- 10,000 images). Nov 22 2019, 9:59 AM

Working on this. First step is to download the data. Attached is a script to scrape a generic OAI-PMH endpoint and save every batch in a uniquely named file. The basis of the script comes from https://wiki.lyrasis.org/display/DSPACE/OAI+XML+cache+warmup by Ivan Masar, with improvements by me.
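
The attached script is a shell script; as a rough Python illustration of the "uniquely named file per batch" idea only, something like the following could be used. The function name, the `results-NNNNN.xml` naming pattern, and the output directory are assumptions for this sketch, not taken from download_T235995.sh.

```python
from pathlib import Path


def save_batches(batches, out_dir, prefix="results"):
    """Write each batch (bytes) to its own zero-padded, numbered file.

    Returns the list of paths written, in order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, payload in enumerate(batches):
        # The running index keeps names unique and sortable: results-00000.xml, ...
        path = out / f"{prefix}-{index:05d}.xml"
        path.write_bytes(payload)
        paths.append(path)
    return paths
```

Keeping one file per response makes it possible to re-run the conversion step without hitting the endpoint again.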

Script to convert the data downloaded by download_T235995.sh to CSV files with 500 items each, including some cleanup of descriptions.
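
For readers without access to the attachment, the general shape of such a conversion can be sketched as follows. This is not the contents of parse.py: the column names, the whitespace-only cleanup rule, and the function names are all assumptions made for illustration.

```python
import csv
import re

# Hypothetical column set; the real CSV columns depend on the Pattypan template.
FIELDNAMES = ("identifier", "title", "description", "image_url")


def clean_description(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text or "").strip()


def chunked(items, size=500):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_csv(rows, handle, fieldnames=FIELDNAMES):
    """Write one batch of record dicts to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        cleaned = dict(row)
        cleaned["description"] = clean_description(row.get("description"))
        writer.writerow(cleaned)
```

Splitting 12,089 records this way gives 25 files: 24 with 500 records each and a final one with 89.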

Update: We ran into a problem uploading with Pattypan: the upload process won't start, or gets stuck at 1/500. I have tried uploading from home with Pattypan (with a whitelisted user account), with the same result. I will try once more from home this afternoon, and otherwise ask Yarl. Thanks all! To be continued!
Example .csv:

@Yarl: see above, could you please help out? We have been trying to upload batches of 500 records from CSV with a URL for the image instead of a local file, but the upload won't start (see screenshot, in Dutch). I have been using my GWT-whitelisted account (User:SIryn, see also this list). The domain is also whitelisted, @Reedy checked this.

Can you help us figure out what the problem is?

Thanks so much for helping out!

afbeelding.png (718×1 px, 57 KB)

This open task is tagged with Wiki-Techstorm-2019 which took place a while ago. Tasks shouldn't be lingering - Please either set the status of this task to resolved/declined, or add an active project tag so this task can be found when looking at an active (not: 2019) project - thanks a lot!

No reply to previous comment; declining as this task is related to an event that took place in 2019.