Server-side upload request for Geagea
Closed, Declined (Public)

Description

Please upload the following file(s) to Wikimedia Commons:

I am requesting a mass upload because Pattypan is broken. There are 36,544 files to upload.

URL of the files: https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly (one-week period)

I split the text file into 4 files, with 9,136 file descriptions in each.

I tried to upload the files here but the upload failed, so I uploaded them to Google Drive:

https://drive.google.com/file/d/1bUJwqQKBu1ZDt1_6uoByHuK3nzAK32YS/view?usp=sharing
https://drive.google.com/file/d/1-0ELGmU-w4zppNjS8VeacjwYoN-D3Ebl/view?usp=sharing
https://drive.google.com/file/d/1Ze4f9pnxepzqFtPft_Oiq9wAVCC0HlBY/view?usp=sharing
https://drive.google.com/file/d/1PayR2uYA1REPCDtdmc-ZF9kpP-Vw1elv/view?usp=sharing

My username is Geagea. Thank you.

Event Timeline

Please advise if the data is needed in any other format.

The descriptions need to be in individual files that match the filenames to be uploaded, not in multiple large files that each contain many descriptions. Otherwise the uploader has to split them.

https://www.mediawiki.org/wiki/Manual:ImportImages.php
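
As a rough illustration of the "one description file per upload" layout described above, here is a minimal Python sketch. It assumes the combined description files have already been parsed into a mapping from target filename to wikitext (the format of the combined files is not shown in this task, so that parsing step is left out). The function name `write_description_files`, the `txt` extension, and the `<filename>.<ext>` naming pattern are assumptions for illustration; the naming the import script actually expects should be checked against the manual page linked above.

```python
from pathlib import Path


def write_description_files(descriptions, out_dir, ext="txt"):
    """Write one description file per upload target.

    descriptions: mapping of target filename (e.g. "Foo.jpg") to wikitext.
    Each description is written as "<filename>.<ext>" inside out_dir
    (an assumed naming pattern, not confirmed by the task).
    Returns the list of paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, wikitext in descriptions.items():
        # Refuse names containing path separators so nothing lands outside out_dir.
        if "/" in filename or "\\" in filename:
            raise ValueError(f"unexpected path separator in {filename!r}")
        path = out / f"{filename}.{ext}"
        path.write_text(wikitext, encoding="utf-8")
        written.append(path)
    return written
```

Keeping the parsing of the large files separate from the writing step means the same function can be reused whatever the combined format turns out to be.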

Does server-side upload not use Python?

Where will the source URL be taken from? Is it at all possible to upload from URL?

You could request the URL to be allowed for server side upload if you wanted.

I'm not sure what you mean by "source URL".

When these requests happen, the files (images, videos, or whatever, together with their descriptions) are downloaded to one of the MediaWiki "maintenance" servers and then uploaded to Commons from there.

No, server-side upload does not use Python.

https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly

The files have different names at the source site than they should have on Commons. They may also be in a different path at the source site, or even on different sites. What is the procedure to get them to the MediaWiki maintenance servers and apply the file name mapping?

The script doesn't appear to support that, so the options are either to rename the files on the source system before the request is processed, or to upload them as is and rename them on the wiki afterwards.

Please give me a day or two to prepare this accordingly.

I can rename them to the names they should have on Commons, but the renamed files will be on my computer. I can use the Excel file to generate command lines such as: copy xyz.jpg c:\aaa\bbb\abc.jpg.
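
The spreadsheet-driven copy described above could also be done in one pass with a short script instead of generated command lines. This is only a sketch under stated assumptions: the mapping is exported from the spreadsheet as a two-column CSV (source name, target name), and the function name `copy_with_new_names` and the directory arguments are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def copy_with_new_names(mapping_csv, source_dir, target_dir):
    """Copy each source file into target_dir under its intended Commons name.

    mapping_csv: CSV with two columns per row, source name then target name
    (an assumed layout, e.g. exported from the spreadsheet).
    Returns the list of source names that were not found.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    with open(mapping_csv, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            source_name, target_name = row[0], row[1]
            source = source_dir / source_name
            if not source.is_file():
                missing.append(source_name)
                continue
            # copy2 leaves the original in place and preserves timestamps
            shutil.copy2(source, target_dir / target_name)
    return missing
```

Copying rather than renaming keeps the originals untouched, and the returned list makes it easy to spot rows in the mapping that have no matching file.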

Please see http://gulu.net/T297783_v1.tar.bz2 with the included A.sh for the download. Does that make sense?

It does.

It seems the CLI doesn't like the SSL cert too much...

Local:

% wget https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly/d-724.jpg
--2021-12-19 20:24:26--  https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly/d-724.jpg
Resolving ftp.nli.org.il (ftp.nli.org.il)... 192.114.7.21
Connecting to ftp.nli.org.il (ftp.nli.org.il)|192.114.7.21|:443... connected.
ERROR: cannot verify ftp.nli.org.il's certificate, issued by ‘CN=Go Daddy Secure Certificate Authority - G2,OU=http://certs.godaddy.com/repository/,O=GoDaddy.com\\, Inc.,L=Scottsdale,ST=Arizona,C=US’:
  Unable to locally verify the issuer's authority.
To connect to ftp.nli.org.il insecurely, use `--no-check-certificate'.

WMF Servers:

reedy@mwmaint1002:/tmp/uploads/T297783$ wget https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly/d-724.jpg
--2021-12-19 20:27:19--  https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly/d-724.jpg
Resolving webproxy.eqiad.wmnet (webproxy.eqiad.wmnet)... 2620:0:861:1:208:80:154:32, 208.80.154.32
Connecting to webproxy.eqiad.wmnet (webproxy.eqiad.wmnet)|2620:0:861:1:208:80:154:32|:8080... connected.
ERROR: The certificate of ‘ftp.nli.org.il’ is not trusted.
ERROR: The certificate of ‘ftp.nli.org.il’ doesn't have a known issuer.

FF on my macbook is fine with it though.

Obviously we can make it just ignore the certs using --no-check-certificate and download the files, but that doesn't always feel good :)

If we look at https://www.ssllabs.com/ssltest/analyze.html?d=ftp.nli.org.il I think we can see the problem

Screenshot 2021-12-19 at 20.31.40.png (1×2 px, 176 KB)

I'm not so fussed about it still supporting TLS 1.0/1.1, but the incomplete chain stuff isn't great to see.

If someone knows any of the server admins of ftp.nli.org.il, it might be worth giving them a heads-up so they can fix it.

Reedy changed the task status from Open to In Progress. Dec 19 2021, 8:37 PM
Reedy claimed this task.
Reedy triaged this task as Low priority.

What do you need from ftp.nli.org.il? I can contact them.

They should fix their SSL config so their webserver isn't presenting an incomplete certificate chain :). They might want to fix some of the other issues highlighted by https://www.ssllabs.com/ssltest/analyze.html?d=ftp.nli.org.il too.

OK, I have already written to them. They will answer in the morning, I suppose.

So doing wget --no-check-certificate 'https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly/d-724.jpg' (for example) doesn't result in an image...

And browsing to it also doesn't result in an image...

Are these download links actually correct?

According to them, this is an FTP protocol, which is different from cloud services. Anyway, he will upload the files to OneDrive. I will update when it is ready.

According to them this is a FTP protocol which is different from cloud services.

What does that even mean?

Anyway, this is the link to OneDrive:

That does not seem to be a direct (!) download link but seems to require clicking things in some web browser.

Screenshot from 2021-12-21 10-09-52.png (132×1 px, 33 KB)

I just tried downloading two files, and they again came down as HTML files, not JPG.

reedy@mwmaint1002:/tmp/uploads/T297783$ wget --no-check-certificate  'https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly/d-724.jpg' && mv 'd-724.jpg' 'Geulah camp, Aden (997008136892705171).jpg'
--2021-12-21 13:19:52--  https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly/d-724.jpg
Resolving webproxy.eqiad.wmnet (webproxy.eqiad.wmnet)... 2620:0:861:1:208:80:154:32, 208.80.154.32
Connecting to webproxy.eqiad.wmnet (webproxy.eqiad.wmnet)|2620:0:861:1:208:80:154:32|:8080... connected.
Proxy request sent, awaiting response... 200 Ok
Length: 4657 (4.5K) [text/html]
Saving to: ‘d-724.jpg’

d-724.jpg                                     100%[==============================================================================================>]   4.55K  --.-KB/s    in 0s      

2021-12-21 13:19:53 (51.6 MB/s) - ‘d-724.jpg’ saved [4657/4657]

reedy@mwmaint1002:/tmp/uploads/T297783$ wget --no-check-certificate  'https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly/d-723.jpg' && mv 'd-723.jpg' 'Geulah camp, Aden (997008136892805171).jpg'
--2021-12-21 13:19:54--  https://ftp.nli.org.il/public/folder/7ysBpgJ9_U_EN_wdKIK_AQ/Wiki%20Orly/d-723.jpg
Resolving webproxy.eqiad.wmnet (webproxy.eqiad.wmnet)... 2620:0:861:1:208:80:154:32, 208.80.154.32
Connecting to webproxy.eqiad.wmnet (webproxy.eqiad.wmnet)|2620:0:861:1:208:80:154:32|:8080... connected.
Proxy request sent, awaiting response... 200 Ok
Length: 4657 (4.5K) [text/html]
Saving to: ‘d-723.jpg’

d-723.jpg                                     100%[==============================================================================================>]   4.55K  --.-KB/s    in 0s      

2021-12-21 13:19:55 (94.9 MB/s) - ‘d-723.jpg’ saved [4657/4657]

reedy@mwmaint1002:/tmp/uploads/T297783$ sha1sum *.jpg
3580625eb4a0af4b2c65c3aacf400720a6395337  Geulah camp, Aden (997008136892705171).jpg
3580625eb4a0af4b2c65c3aacf400720a6395337  Geulah camp, Aden (997008136892805171).jpg
reedy@mwmaint1002:/tmp/uploads/T297783$ cat "Geulah camp, Aden (997008136892705171).jpg"
<!DOCTYPE html>
<html lang="en">
	<head>
		<meta charset="utf-8" />
		<meta http-equiv="X-UA-Compatible" content="IE=edge" />
		<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0" />
        <meta name="mobile-web-app-capable" content="yes">
		<title>National Library of Israel Web Client</title>
				
		<link rel="shortcut icon" href="/favicon.ico" />
		<link rel="icon" sizes="196x196" href="/images/android-icon-196x196.png" />
        <link rel="apple-touch-icon" sizes="114x114" href="/images/apple-icon-114x114.png" />
        <link rel="apple-touch-icon" sizes="144x144" href="/images/apple-icon-144x144.png" /> 
               
		<!-- Bootstrap core CSS -->
		<link rel="stylesheet" href="/css/bootstrap.min.css" />
		<link rel="stylesheet" href="/css/bootstrap-datetimepicker.min.css" />
		
		<link rel="stylesheet" href="/custom/css/default-theme.css" />
        
		<link rel="stylesheet" href="/css/common-3.0.css" />
		
		
		<link rel="stylesheet" href="/css/login-3.0.css" />

		<!--[if gte IE 9]>
			<style type="text/css">
				.gradient { filter: none; }
			</style>
		<![endif]-->

		<!--[if lte IE 9]>
			<link rel="stylesheet" href="/css/common-3.0-ie8.css" />
		<![endif]-->	

		<!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->
		<!--[if lt IE 9]>
			<script src="/js/html5shiv.min.js"></script>
			<script src="/js/respond.min.js"></script>
		<![endif]-->	
	
		<script src="/js/jquery-1.11.2.min.js"></script> 
		<script src="/js/bootstrap.min.js"></script> 
		<script src="/js/functions-3.0.js"></script>
		
		
		
		<!-- Javascript language file -->
		<script src="/js/lang-1/en-us.js"></script> 
		
	</head> 
	<body>
   
    <!-- Wrap all page content here -->
	<div id="wrap">
    
		<!-- Fixed navbar -->
		<div class="navbar navbar-default navbar-static-top">
            <div class="container-fluid">
              <div class="navbar-header">
                <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
					<span class="sr-only">Toggle navigation</span>
					<span class="icon-bar"></span>
					<span class="icon-bar"></span>
					<span class="icon-bar"></span>
                </button>
                <a class="navbar-brand" href="/"><img class="hidden-xs" src="/images/logo/logo1.png" width="230" height="70" alt="logo"><span class="visible-xs-inline">National Library of Israel</span></a>
              </div>
              <div class="collapse navbar-collapse">
				<div class="navbar-nav navbar-right">
                    <div class="navbar-text navbar-right" style="margin: 10px 10px 10px 5px">
                            <small>Not currently logged in
                            </small>
                    </div>
                    <div class="clearfix"></div>
                    <ul class="nav nav-tabs navbar-right" style="border-bottom: none">
                        <li class="nav_home_item active"><a href="/">Home</a></li>
                        <li class=""><a href="/account?action=display">Account</a></li>
                        
                    </ul>
                    <div class="clearfix"></div>
                </div>
				<div class="clearfix"></div>
              </div><!--/.nav-collapse -->
            </div>
		</div>

        
        <div class="page-header">
            <div class="container-fluid">
                <h1 class="page-title"><span class="glyphicon glyphicon-globe"></span>&nbsp;&nbsp;Public</h1>
            </div>
        </div>
        
		<!-- Begin page content -->
		<div class="container-fluid">
            <div id="status-container">
                
                <div id="status" class="alert alert-dismissable alert-success">
                    <button type="button" class="close" data-dismiss="alert" aria-hidden="true">&times;</button>
                
                
                <div><span class="glyphicon glyphicon-exclamation-sign"></span>&nbsp;Unknown file path</div>
                
                </div>
                
            </div>
	
	
			
	
	
			
		

		</div>
	</div>

	<div id="footer">
		<div class="container-fluid">
			<p class="text-muted credit">
				<a href="/">Home</a>
				&nbsp;<span class="sep_footer_nav">|</span><a href="/account?action=display">Account</a>
				
                &nbsp;<span class="sep_footer_nav">|</span><a href="#" id="_context_help_"><span class="glyphicon glyphicon-question-sign"></span> Help</a>
			</p>
		</div>
	</div>

	</body>
</html>
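
The manual check in the transcript above (identical SHA-1 sums, HTML content in a file named .jpg) can be automated before anything is uploaded. The following is a minimal Python sketch, not part of the actual server-side upload process: it flags files whose first bytes are not the JPEG signature and groups files that share a SHA-1 digest. The function names `looks_like_jpeg` and `audit_downloads` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# JPEG files start with the bytes FF D8 FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(path):
    """Return True if the file starts with the JPEG signature."""
    with open(path, "rb") as handle:
        return handle.read(3) == JPEG_MAGIC


def audit_downloads(directory):
    """Inspect every *.jpg file in a directory.

    Returns (not_jpeg, duplicates): the names of files that do not start
    with the JPEG signature, and lists of names sharing one SHA-1 digest.
    """
    not_jpeg = []
    by_hash = defaultdict(list)
    for path in sorted(Path(directory).glob("*.jpg")):
        if not looks_like_jpeg(path):
            not_jpeg.append(path.name)
        digest = hashlib.sha1(path.read_bytes()).hexdigest()
        by_hash[digest].append(path.name)
    duplicates = [names for names in by_hash.values() if len(names) > 1]
    return not_jpeg, duplicates
```

Run against a download directory like the one above, both files would be expected to show up in `not_jpeg` and together in one `duplicates` group, since the server returned the same error page for each request.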

The OneDrive link also asks me to log in.

Reedy changed the task status from In Progress to Stalled. Dec 21 2021, 6:27 PM