Please upload 16 large TIFFs
URL: http://www.ub.unibas.ch/digi/wikicommons/out/gt1gb2load.tar (46 GB)
Username: Basel University Library
Thank you very much
Andreas Bigger on behalf of Basel University Library
Sorry, my mistake. This URL is restricted to the IP of the GLAM WikiToolset.
Use instead: http://www.ub.unibas.ch/digi/a100/diverse_projekte/gt1gb2load.tar
Unfortunately Wikimedia's internal network proxy responds with 403 when I request this URL... Will try to find out why
The file is too big for the proxy to handle. It doesn't handle files 1GB or larger. But I shouldn't have to proxy this file via my own laptop to get it onto a mediawiki host...
However, bast1001 allows external downloads without needing to go via url-downloader (I checked bast2001 and hooft too but they're both tiny), so:
For this archive, we could get ops to set up rsyncd on terbium (generally the host used for server-side uploads, but too small for this whole archive at once), download the file to bast1001, extract the files and rsync what we can over to terbium, upload, delete uploaded files from terbium, rsync the next batch and repeat until all are done.
Alternatively, we could get ops to set up rsyncd to tin (not generally used for server-side uploads, so probably temporarily only), download the file to bast1001, rsync to tin, extract and upload each file.
(and then remove the extra files so we're not taking up disk space on bast1001/tin/terbium/wherever indefinitely)
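To make the batch idea concrete, here is a rough sketch of how the extracted files could be grouped so that each rsync round fits into whatever free space the staging host has. The function name, the example file names and the 5 GiB capacity are all made up for illustration; the real numbers would come from `df` on terbium.

```python
GIB = 1024 ** 3


def plan_batches(files, capacity_bytes):
    """Group (name, size_bytes) pairs, in order, into batches that each fit in capacity_bytes.

    Hypothetical helper: a simple greedy split, not an optimal packing.
    """
    batches = []
    current, used = [], 0
    for name, size in files:
        if size > capacity_bytes:
            # A single file larger than the staging space can never be transferred.
            raise ValueError(f"{name} ({size} bytes) does not fit on the staging host")
        if used + size > capacity_bytes:
            # Close the current batch and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


# Illustrative sizes only, not the real archive contents.
example = [("a.tif", 3 * GIB), ("b.tif", 2 * GIB), ("c.tif", 4 * GIB), ("d.tif", 1 * GIB)]
print(plan_batches(example, capacity_bytes=5 * GIB))
```

Each batch would then be one rsync, upload, delete cycle.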
We could ask @Basel_University_Library to split the archive up, but that wouldn't really solve any issues, just remove the extract step.
Or can we download it on labs, split it into parts, make the parts accessible over the web, download them on tin and piece it all back together and extract?
trying to break this down:
issue:
status:
problem 1 - disk space
problem 2 - http proxy issue
Hmm, I downloaded a 29 gigs OSM dump not so long ago without problems with curl -O -x webproxy.eqiad.wmnet:8080 <url>
... and the difference is that url-downloader was used as the proxy, and it has maximum_object_size 1010 MB (squid config on chromium), while webproxy.eqiad is on carbon and does not have that same limit.
Downloaded and extracted. This one is too big I'm afraid:
-rw-r--r-- 1 krenair wikidev 16G Sep 9 11:26 UBBasel_Map_1568_Kartenslg_AA_26-48.tif
-rw-r--r-- 1 krenair wikidev 1.6K Sep 9 11:12 UBBasel_Map_1568_Kartenslg_AA_26-48.tif.txt
I think the rest should be fine.
Oh, sorry, I misremembered the limit - it's 4GB rather than 5GB. That also rules out this one:
-rw-r--r-- 1 krenair wikidev 4.5G Sep 9 11:18 UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_2.tif
-rw-r--r-- 1 krenair wikidev 2.2K Sep 9 11:12 UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_2.tif.txt
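For future batches, the size check could be done up front rather than file by file. A minimal sketch, assuming the 4GB limit means 4 GiB and is inclusive (neither is confirmed here); `tif_sizes` and `split_by_upload_limit` are hypothetical helper names, and the sizes in the example are rounded from the `ls -h` output above.

```python
import os

GIB = 1024 ** 3


def tif_sizes(directory):
    """Return (name, size_bytes) for every .tif file directly inside directory."""
    entries = sorted(os.scandir(directory), key=lambda e: e.name)
    return [(e.name, e.stat().st_size) for e in entries
            if e.is_file() and e.name.endswith(".tif")]


def split_by_upload_limit(files, limit_bytes=4 * GIB):
    """Split (name, size_bytes) pairs into (uploadable, too_big) name lists.

    Assumption: a file of exactly limit_bytes still counts as uploadable.
    """
    uploadable, too_big = [], []
    for name, size in files:
        if size <= limit_bytes:
            uploadable.append(name)
        else:
            too_big.append(name)
    return uploadable, too_big


# Approximate sizes taken from the listings in this thread.
listing = [
    ("UBBasel_Map_1568_Kartenslg_AA_26-48.tif", 16 * GIB),
    ("UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_2.tif", int(4.5 * GIB)),
    ("UBBasel_Map_1556_Kartenslg_Schw_A_1a.tif", int(3.2 * GIB)),
]
ok, rejected = split_by_upload_limit(listing)
print("uploadable:", ok)
print("too big:", rejected)
```

On a real directory this would be called as `split_by_upload_limit(tif_sizes("."))`.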
/bin/bash: line 1: 20109 Killed '/usr/bin/tiffinfo' '/home/krenair/upload-T111941/out/try2/UBBasel_Map_1700-1799_VB_A2-2-120a.tif' 2>&1
Resulted in a 0x0 but 2.02GB file. Retried with no luck. Any ideas @aaron?
These are done:
-rw-r--r-- 1 krenair wikidev 1.1G Sep 9 11:27 UBBasel_Map_1556_Kartenslg_AA_86-89.tif
-rw-r--r-- 1 krenair wikidev 3.2G Sep 9 11:29 UBBasel_Map_1556_Kartenslg_Schw_A_1a.tif
-rw-r--r-- 1 krenair wikidev 1.4G Sep 9 11:26 UBBasel_Map_1564_Kartenslg_AA_110-113.tif
-rw-r--r-- 1 krenair wikidev 1.4G Sep 9 11:20 UBBasel_Map_1564_Kartenslg_AA_6-7.tif
-rw-r--r-- 1 krenair wikidev 1.4G Sep 9 11:26 UBBasel_Map_1567_Kartenslg_AA_98-99.tif
-rw-r--r-- 1 krenair wikidev 1.4G Sep 9 11:28 UBBasel_Map_1568_Kartenslg_Schw_Ca_1.tif
-rw-r--r-- 1 krenair wikidev 1.6G Sep 9 11:21 UBBasel_Map_1569_Kartenslg_AA_3-5.tif
-rw-r--r-- 1 krenair wikidev 1.4G Sep 9 11:27 UBBasel_Map_1572_Kartenslg_AA_119-120.tif
-rw-r--r-- 1 krenair wikidev 2.3G Sep 9 11:21 UBBasel_Map_1572_Kartenslg_AA_8-10.tif
-rw-r--r-- 1 krenair wikidev 1.2G Sep 9 11:20 UBBasel_Map_18uu-1615_Kartenslg_Schw_Ml_4e.tif
-rw-r--r-- 1 krenair wikidev 2.3G Sep 9 11:28 UBBasel_Map_Bayern_Niederbayern_Oberbayern_1579_Kartenslg_Mappe_246-76.tif
-rw-r--r-- 1 krenair wikidev 3.0G Sep 9 11:18 UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_3.tif
-rw-r--r-- 1 krenair wikidev 2.8G Sep 9 11:19 UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_4.tif
(files were moved around a bit)
krenair@tin:~$ time tiffinfo upload-T111941/broken/UBBasel_Map_1700-1799_VB_A2-2-120a.tif >/dev/null
TIFFReadDirectory: Warning, upload-T111941/broken/UBBasel_Map_1700-1799_VB_A2-2-120a.tif: wrong data type 7 for "RichTIFFIPTC"; tag ignored.
TIFFReadDirectory: Warning, upload-T111941/broken/UBBasel_Map_1700-1799_VB_A2-2-120a.tif: unknown field with tag 37724 (0x935c) encountered.
real 1m59.473s
user 1m57.899s
sys 0m1.368s
krenair@tin:~$
Try running it through
vips tiffsave UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_2.tif out.tif --compression deflate
That should probably get it below the 4GB limit without any loss of picture detail (some EXIF-like metadata might be stripped).
Figured out a way to get the file to an imagescaler (so vips is installed). That command actually gets it below the 1GB limit at which we would run server-side uploads. I'm not entirely convinced though:
krenair@mw2086:~$ ls -alh *.tif
-rw-rw-r-- 1 krenair wikidev 597M Sep 14 13:30 out.tif
-rw-rw-r-- 1 krenair wikidev 4.5G Sep 9 11:18 UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_2.tif
4.5GB -> 0.6GB? really?
And:
-rw-r--r-- 1 mwdeploy mwdeploy 16G Sep 14 13:52 UBBasel_Map_1568_Kartenslg_AA_26-48.tif
-rw-rw-r-- 1 mwdeploy mwdeploy 239M Sep 14 13:56 compressed_UBBasel_Map_1568_Kartenslg_AA_26-48.tif
:|
Sorry for all the inconvenience caused ...
UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_2.tif is a multitiff. vips seems to keep the first page and throw the rest away ... Not really a solution.
I propose you ignore this file and UBBasel_Map_1568_Kartenslg_AA_26-48.tif. I will think of a better solution to get them in.
I will also check what seems to be wrong with UBBasel_Map_1700-1799_VB_A2-2-120a.tif.
I recreated the UBBasel_Map_1700-1799_VB_A2-2-120a.tif
New URL: http://www.ub.unibas.ch/digi/a100/diverse_projekte/UBBasel_Map_1700-1799_VB_A2-2-120a.tif
(it now comes without the warnings and is a bit smaller, but seems to be OK)
I found a different command that should work on multipage tiffs
tiffcp -c zip:p9 infile.tif outfile.tif
I believe tiffcp is part of libtiff.
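Since the vips attempt silently dropped pages, it may be worth comparing page counts before and after any recompression. Below is a minimal sketch that walks the top-level IFD chain of a classic TIFF or BigTIFF and counts the directories. The function names are hypothetical, it does not look at SubIFDs, and it treats every top-level directory as a page (so embedded thumbnails or reduced-resolution copies would be counted too).

```python
import io
import struct


def count_tiff_pages(f):
    """Count top-level IFDs in a TIFF/BigTIFF opened as a seekable binary file object."""
    f.seek(0)
    header = f.read(16)
    if header[:2] == b"II":
        endian = "<"  # little-endian
    elif header[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic = struct.unpack(endian + "H", header[2:4])[0]
    if magic == 42:  # classic TIFF: 2-byte entry count, 12-byte entries, 4-byte offsets
        count_fmt, entry_size, offset_fmt = "H", 12, "I"
        offset = struct.unpack(endian + "I", header[4:8])[0]
    elif magic == 43:  # BigTIFF: 8-byte entry count, 20-byte entries, 8-byte offsets
        count_fmt, entry_size, offset_fmt = "Q", 20, "Q"
        offset = struct.unpack(endian + "Q", header[8:16])[0]
    else:
        raise ValueError("unknown TIFF magic number")
    pages = 0
    seen = set()
    while offset != 0:
        if offset in seen:
            raise ValueError("loop in IFD chain")
        seen.add(offset)
        f.seek(offset)
        count_size = struct.calcsize(endian + count_fmt)
        (entries,) = struct.unpack(endian + count_fmt, f.read(count_size))
        f.seek(entries * entry_size, 1)  # skip the directory entries themselves
        offset_size = struct.calcsize(endian + offset_fmt)
        (offset,) = struct.unpack(endian + offset_fmt, f.read(offset_size))
        pages += 1
    return pages


def count_tiff_pages_in_bytes(data):
    """Convenience wrapper for in-memory data, mainly useful for testing."""
    return count_tiff_pages(io.BytesIO(data))
```

Usage would be something like `with open("infile.tif", "rb") as f: print(count_tiff_pages(f))`, run on both the input and the output of the compression step; the two numbers should match.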
@Bawolff - yes, we used tiffcp to create the multitiffs in the first place. I will see what it can do for UBBasel_Map_1568_Kartenslg_AA_26-48.tif.
For UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_2.tif, I have also prepared a smaller version (I threw out two pages that were not really that important).
New URL: http://www.ub.unibas.ch/digi/a100/diverse_projekte/UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_2.tif (3 GB now)
krenair@tin:~/upload-T111941/working$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --overwrite --user=Basel_University_Library .
Import Images
UBBasel_Map_1700-1799_VB_A2-2-120a.tif exists, overwriting...done.
Found: 1
Overwritten: 1
krenair@tin:~/upload-T111941/working$
krenair@tin:~/upload-T111941/working$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Basel_University_Library .
Import Images
Importing UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_2.tif...done.
Found: 1
Added: 1
Anything else to do? I think the only thing missing is the 16GB file UBBasel_Map_1568_Kartenslg_AA_26-48.tif, but it sounds like that's not going to be possible.
Nope for UBBasel_Map_1568_Kartenslg_AA_26-48.tif - it's definitely too big, sorry.
So it's done. Great job! Thanks to all!
btw, be advised that due to quirks in how we render TIFF thumbnails, the size limit above which we don't display thumbnails is much higher for the first page than for the other pages, so we might not display later pages of some of your really big files.
Thanks for the information.