
Allow upload-by-URL from upload.wikimedia.org
Open, Normal, Public

Description

This might seem ridiculous at first glance, but it would be incredibly useful for writing Commons transfer scripts (similar in concept to CommonsHelper, but calling the API from JavaScript).

It may be as simple as adding upload.wikimedia.org to $wgCopyUploadsDomains in InitialiseSettings.php. However, I don't know if the server configuration will allow this to work straight away.
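As a rough illustration of what such a domain allowlist does, here is a minimal Python sketch. It is not MediaWiki's actual PHP implementation: the function name `is_allowed_host` and the list contents are hypothetical, and only exact host matches are handled. Passing a check like this would only be the first step; whether the servers can actually fetch the file is a separate question.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist standing in for $wgCopyUploadsDomains.
COPY_UPLOADS_DOMAINS = ["upload.wikimedia.org"]


def is_allowed_host(url, allowed=COPY_UPLOADS_DOMAINS):
    """Return True if the URL's host exactly matches an allowlisted domain."""
    host = (urlsplit(url).hostname or "").lower()
    return host in {domain.lower() for domain in allowed}
```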

See also T22512.

Details

Reference
bz42473

Event Timeline

bzimport raised the priority of this task from to Normal.Nov 22 2014, 1:01 AM
bzimport set Reference to bz42473.
bzimport added a subscriber: Unknown Object (MLST).
TTO created this task.Nov 27 2012, 11:12 AM
TTO added a comment.Nov 27 2012, 11:14 AM

See also my comment at bug 14919 comment 5

Adding dependency to tracking bug 37883 (Wikimedia Commons features).

Adding reedy as CC to get feedback on the server configuration issue.

See also bug 20512

[feature => severity "enhancement"]

TTO added a comment.Dec 7 2012, 6:48 AM

Adding Ryan; he's the man, apparently

One issue with this is that the proxy server currently handling upload-by-URL requests can't do HTTPS. So we would either need to fix that bug, or give some warning that HTTPS requests will error out.

Is there already a bug "add HTTPS capability to the proxy server"?

If so, please add a dependency.

Could we now enable this feature or is there another blocker?

Reedy added a comment.Jan 9 2013, 6:50 PM

(In reply to comment #9)

Could we now enable this feature or is there another blocker?

I guess it should be enabled on testwiki and confirmed to work first...

TTO added a comment.Feb 2 2013, 10:54 PM

Could someone please go ahead and enable this on testwiki?

Reedy added a comment.Feb 2 2013, 11:03 PM

(In reply to comment #11)

Could someone please go ahead and enable this on testwiki?

https://gerrit.wikimedia.org/r/47299

TTO added a comment.Feb 3 2013, 2:25 AM

Thanks, however it doesn't seem to work for me. I ran a test from test2wiki (this was easier because my JS code is set up for CORS):

HTTP POST to http://test.wikipedia.org/w/api.php

action=upload
filename=0.28589522187660577.png
text=this is a test file
comment=upload comment
token=<VALID EDIT TOKEN>
url=http%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Ftest2%2F5%2F53%2F0.28589522187660577.png
ignorewarnings=true
format=json
origin=http%3A%2F%2Ftest2.wikipedia.org

This is the response:

{"servedby":"srv193","error":{"code":"http-bad-status","info":"Error fetching file from remote source","0":"403","1":"Forbidden"}}
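For anyone wanting to reproduce this from a script, the request body and the error handling can be sketched in Python roughly as follows. The helper names `build_upload_by_url_body` and `describe_error` are made up for illustration, nothing is sent over the network, and the token is whatever valid edit token the caller already holds.

```python
import json
from urllib.parse import urlencode


def build_upload_by_url_body(filename, source_url, token, origin):
    """Encode the POST body for an action=upload request that copies from a URL."""
    params = {
        "action": "upload",
        "filename": filename,
        "text": "this is a test file",
        "comment": "upload comment",
        "token": token,
        "url": source_url,
        "ignorewarnings": "true",
        "format": "json",
        "origin": origin,
    }
    return urlencode(params)


def describe_error(response_text):
    """Summarise an API error response, or return None if there is no error."""
    error = json.loads(response_text).get("error")
    if error is None:
        return None
    # In the response above, the remote HTTP status sits under the keys "0" and "1".
    status = " ".join(str(error[key]) for key in ("0", "1") if key in error)
    summary = f"{error['code']}: {error['info']}"
    return f"{summary} ({status})" if status else summary
```

Feeding the response above to `describe_error` makes it clear that the 403 comes from the remote fetch, not from the API's own permission checks.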

Reedy added a comment.Feb 3 2013, 2:36 AM

(In reply to comment #13)

{"servedby":"srv193","error":{"code":"http-bad-status","info":"Error fetching
file from remote source","0":"403","1":"Forbidden"}}

acl to-wikimedia dst 208.80.152.0/22
acl to-wikimedia dst 91.198.174.0/24
acl to-wikimedia dst 10.0.0.0/16
acl to-wikimedia dst 10.64.0.0/16

# Do not allow any fetches from our own IP ranges

http_access deny to-wikimedia

I'm not sure if the answer is to make squid serve those requests, or to add a list of sites that shouldn't use $wgCopyUploadProxy.

I suspect it's a question for ops whether they're OK with letting the proxy read from the cluster.
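To make the effect of that ACL concrete, here is a small Python sketch of the same destination check. It only mirrors the four ranges quoted above; `proxy_would_deny` is a hypothetical helper, and any addresses passed to it are illustrative rather than the real addresses of upload.wikimedia.org.

```python
import ipaddress

# Destination ranges copied from the "to-wikimedia" ACL quoted above.
TO_WIKIMEDIA = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "208.80.152.0/22",
        "91.198.174.0/24",
        "10.0.0.0/16",
        "10.64.0.0/16",
    )
]


def proxy_would_deny(dest_ip):
    """Return True if dest_ip falls in a range matched by 'http_access deny to-wikimedia'."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in network for network in TO_WIKIMEDIA)
```

Any destination inside those ranges is refused by the proxy, which would explain the 403 for a file hosted on the cluster itself.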

faidon added a comment.Feb 3 2013, 9:31 AM

No, an upload-by-url proxy is the wrong way to do it. If we want to copy files within the upload.wm.org realm, then we should use efficient server-side copies (e.g. Swift's X-Copy-From header), not go through the application servers and upload-by-URL proxies.

Moreover, copying files internally seems wrong to me in general. It's probably okay if it's a limited use case, but if it's something that's going to get popular, then some other way of referencing the same file multiple times should be found, rather than having the same contents copied over and over in the media storage backends.
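For readers unfamiliar with the Swift feature mentioned here: a server-side copy is a PUT to the destination object with an X-Copy-From header naming the source and an empty body, so the file contents never leave the storage cluster. A minimal Python sketch that only assembles such a request follows. The storage URL, container names and token are placeholders, and `swift_copy_request` is a hypothetical helper, not part of any Wikimedia code.

```python
from urllib.parse import quote


def swift_copy_request(storage_url, src_container, src_object,
                       dst_container, dst_object, auth_token):
    """Describe a Swift server-side copy as a (method, url, headers) tuple."""
    headers = {
        "X-Auth-Token": auth_token,
        # Source object, given as /container/object.
        "X-Copy-From": "/" + quote(f"{src_container}/{src_object}"),
        # The request carries no body; Swift copies the data internally.
        "Content-Length": "0",
    }
    url = f"{storage_url.rstrip('/')}/{quote(dst_container)}/{quote(dst_object)}"
    return "PUT", url, headers
```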

TTO added a comment.Feb 3 2013, 10:26 AM

Maybe so. However, Commons transfer has always been done by a download-upload process (this is what CommonsHelper on toolserver does, for example). Fixing this bug would allow this tried-and-true approach to continue at a faster rate. Or, we could wait an indefinite amount of time for the file storage backend to be complexified, convoluted, etc...

TTO added a comment.May 5 2013, 7:59 AM

(In reply to comment #14)

Suspect that's a question for ops whether they're ok with letting the proxy
read from the cluster..

Were ops ever contacted about this?

(In reply to comment #17)

Were ops ever contacted about this?

See answer in comment 15 by Faidon.

TTO added a comment.May 7 2013, 10:04 AM

(In reply to comment #18)

See answer in comment 15 by Faidon.

My bad, I didn't realise Faidon was part of the ops team.

It seems we've reached a stalemate: ops is refusing to fulfil the request, but no alternative is being suggested.

(In reply to comment #15)

It's probably
okay if it's a limited use case, but if it's something that's going to get
popular

Just so you are aware, Faidon... I daresay hundreds of thousands of files have already been copied from WMF wikis to Commons, leading already to massive duplication on the servers. So this process is already rather popular, and this bug is a way to streamline the process.

To be clear, I would welcome an alternative internal approach, or a rationalisation of the file storage backend, but I don't see those things happening anytime soon. Going ahead and reconfiguring the proxy can be done now (as far as I can tell) and would make the process as it already exists a lot simpler.

[CC'ing Fabrice as this covers Uploading/Multimedia]

TTO added a comment.Mar 20 2014, 11:10 AM
*** Bug 62820 has been marked as a duplicate of this bug. ***

RfC is running at Commons: https://commons.wikimedia.org/wiki/Commons:Requests_for_comment/Allow_transferring_files_from_other_Wikimedia_Wikis_server_side

I didn't conceal that it may never be implemented, *but* I hope that strong consensus and some of the community's comments will motivate the people responsible to reconsider their position. The way files are currently transferred likely puts more load on the WMF servers than if the proxies were allowed to fetch from WMF directly.

Status update: On [[Commons:Commons:Requests for comment/Allow transferring files from other Wikimedia Wikis server side]], we have a unanimous consensus.

(In reply to Faidon Liambotis from comment #15)

No, an upload-by-url proxy is the wrong way to do it. If we want to copy
files within the upload.wm.org realm, then we should use efficient
server-side copies (e.g. Swift's X-Copy-From header), not go through the
application servers and upload-by-URL proxies.
Moreover, copying files internally seems wrong to me in general. It's
probably okay if it's a limited use case, but if it's something that's going
to get popular, then some other way of multiple reference to the same file
should be found, rather than having the same contents copied over and over
in the media storage backends.

Actually, we already do that with manual bots and tools that transfer media from local Wikimedia wikis to Commons once they have been cleared as freely licensed or in the public domain.

So I offer to enable it, as it won't create more copies than we currently have, and then open a new bug to work on a better solution.

tomasz removed a project: Shell.Feb 23 2015, 8:01 PM
tomasz set Security to None.

So I offer to enable it, as it won't create more copies than we currently have, and then open a new bug to work on a better solution.

That would be great, indeed. Can you enable that now?

FDMS added a subscriber: FDMS.Feb 26 2015, 1:12 PM
Steinsplitter moved this task from Incoming to Uploading on the Commons board.Mar 11 2015, 12:55 PM
Steinsplitter added a subscriber: Steinsplitter.EditedApr 10 2015, 2:34 PM

So I offer to enable it, as it won't create more copies than we currently have, and then open a new bug to work on a better solution.

@Dereckson Just asking about the status :-). I read the discussion again and it looks like it is now possible to enable this. Or not? Does it need some special config? There is also T78167. Thanks in advance.

It would need the ACLs in the squid config for url-downloader.wikimedia.org to be changed. Someone (@csteipp?) would probably need to assess the security risk of such a change.

Restricted Application added subscribers: Matanya, Aklapper.Nov 15 2015, 8:35 AM
Meno25 removed a subscriber: Meno25.Feb 22 2016, 5:38 PM
Restricted Application added a subscriber: JEumerus.Feb 22 2016, 5:38 PM
Yann added a comment.Mar 19 2016, 11:02 PM

Stale for nearly a year. Any news about this?

It sounds like someone needs to create a new ticket out of T44473#1198327, assign it to Ops and Security, and add it as a blocker to this bug.

Dereckson updated the task description.Mar 23 2016, 2:55 AM
Restricted Application added a subscriber: Poyekhali.Aug 21 2016, 1:04 PM

No, an upload-by-url proxy is the wrong way to do it. If we want to copy files within the upload.wm.org realm, then we should use efficient server-side copies (e.g. Swift's X-Copy-From header), not go through the application servers and upload-by-URL proxies.
Moreover, copying files internally seems wrong to me in general. It's probably okay if it's a limited use case, but if it's something that's going to get popular, then some other way of multiple reference to the same file should be found, rather than having the same contents copied over and over in the media storage backends.

So, assuming that @faidon's comment still stands, what is the way forward here?

How about having a config variable that gives a regex which converts URLs to mwstore:// virtual URLs? Then, if MW sees a URL matching that regex, instead of doing an HTTP request to copy the file, it will do an internal Swift copy.

Part of the problem is that the Upload class is very stiff and difficult to modify. However, I think this is doable.
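To give a feel for the idea, the URL-to-virtual-URL mapping could look something like the Python sketch below. Everything in it is an assumption made for illustration: the regex, the backend name `example-backend` and the container naming scheme are hypothetical, and the real mapping would have to match how the file backend is actually configured.

```python
import re

# Hypothetical pattern; in practice it would come from a config variable.
UPLOAD_URL_RE = re.compile(
    r"^https?://upload\.wikimedia\.org/(?P<site>[^/]+)/(?P<wiki>[^/]+)/(?P<path>.+)$"
)


def to_virtual_url(url, backend="example-backend"):
    """Map a matching upload URL to an mwstore:// path, or return None."""
    match = UPLOAD_URL_RE.match(url)
    if match is None:
        # Not an internal URL: the caller would fall back to a normal HTTP fetch.
        return None
    # Hypothetical container naming scheme, for illustration only.
    container = f"{match['site']}-{match['wiki']}-public"
    return f"mwstore://{backend}/{container}/{match['path']}"
```

A URL that does not match simply takes the existing upload-by-URL path, so external sources would be unaffected.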

Dzahn changed the status of subtask T142991: Enable "upload by url" feature at zhwiki from Open to Stalled.Dec 21 2016, 9:14 PM

So, assuming that @faidon's comment still stands, what is the way forward here?
How about having a config variable that gives a regex which converts URLs to mwstore:// virtual URLs? Then, if MW sees a URL matching that regex, instead of doing an HTTP request to copy the file, it will do an internal Swift copy.

@faidon: Any opinion on that approach?

See T140462 and T190716. Has this problem been solved in another way?

Looking at the description of this ticket: FileImporter cannot be called from scripts right now; it is just a special page.
I don't think there is a ticket for this.

tomasz removed a subscriber: tomasz.Sat, Jun 29, 9:06 PM