Add artsdatabanken.no to the wgCopyUploadsDomains allowlist of Wikimedia Commons
Closed, Resolved (Public)

Description

Please add the following domain to the wgCopyUploadsDomains allowlist, so that I can upload media files from that domain. I have provided at least 3 example URLs to media files.

artsdatabanken.no

https://artsdatabanken.no/Files/20166
https://artsdatabanken.no/Files/20148/Henrichia_Pertusa_gruppen_03_Espen_Rekdal.jpg
https://www.artsdatabanken.no/Files/21799/Caprella_mutica_24-01-10_1.JPG
https://artsdatabanken.no/Files/18278/Pteraster_militaris_02_Espen_Rekdal.jpg

The media files are freely licensed, as you can see:
https://artsdatabanken.no/Pages/F18278
https://www.artsdatabanken.no/Pages/F21799

Event Timeline

Change 640813 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Add artsdatabanken.no to the wgCopyUploadsDomains allowlist of Wikimedia Commons

https://gerrit.wikimedia.org/r/640813

Urbanecm moved this task from Backlog to To deploy on the Wikimedia-Site-requests board.
Urbanecm added a subscriber: Urbanecm.
Urbanecm triaged this task as Medium priority. (Nov 12 2020, 2:01 PM)
Urbanecm moved this task from Backlog to To deploy on the User-Urbanecm board.

Change 640813 merged by jenkins-bot:
[operations/mediawiki-config@master] Add artsdatabanken.no to the wgCopyUploadsDomains allowlist of Wikimedia Commons

https://gerrit.wikimedia.org/r/640813

Mentioned in SAL (#wikimedia-operations) [2020-11-12T19:08:15Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 3ce18e6f63abe060c05c40239b651086f65a1a33: Add artsdatabanken.no to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T267784) (duration: 01m 00s)

@Urbanecm, I managed to upload a first batch, including this file: https://commons.wikimedia.org/wiki/File:Marthasterias_glacialis_03_Espen_Rekdal.jpg
However, on my first attempt I got a message like the following:
"The media file URL could not be evaluated. The URL delivers the content in a way that is not yet handled by this extension or there was an HTTP request issue. URL given was "https://www.artsdatabanken.no/Files/21799/Caprella_mutica_24-01-10_1.JPG". HTTP request error "There was a problem during the HTTP request: 404 Not Found".

I then managed to do the uploads by replacing "https" with "http" within the image URLs. Now, however, I have another issue: if I use "https" I still get the message above, and if I replace "https" with "http" I get the following message:
"Copy uploads are not available from this domain."

If that helps, here is an example of the XML file I use, shown here with 3 records:
<?xml version="1.0" encoding="UTF-8"?>

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">

<record> <articleCitation> Erling Svensen / Ocean Photo </articleCitation> <license> https://creativecommons.org/licenses/by/4.0/ </license> <source> https://www.artsdatabanken.no/Pages/F21799 </source> <description> Caprella mutica Schurin, 1935 </description> <url> https://www.artsdatabanken.no/Files/21799/Caprella_mutica_24-01-10_1.JPG </url> <title> Caprella_mutica_24-01-10_1 </title> <taxon01> Caprella mutica </taxon01> </record>
<record> <articleCitation> Erling Svensen / Ocean Photo </articleCitation> <license> https://creativecommons.org/licenses/by/4.0/ </license> <source> https://www.artsdatabanken.no/Pages/F21800 </source> <description> Caprella mutica Schurin, 1935 </description> <url> https://www.artsdatabanken.no/Files/21800/Caprella_mutica_24-01-10_2.JPG </url> <title> Caprella_mutica_24-01-10_2 </title> <taxon01> Caprella mutica </taxon01> </record>
<record> <articleCitation> Matz Berggren / Göteborgs Universitet </articleCitation> <license> https://creativecommons.org/licenses/by-sa/4.0/ </license> <source> https://www.artsdatabanken.no/Pages/F33703 </source> <description> Caprella mutica Schurin, 1935 </description> <url> https://www.artsdatabanken.no/Files/33703/caprella-mutica_mf.jpg </url> <title> caprella-mutica_mf </title> <taxon01> Caprella mutica </taxon01> </record>

</metadata>

I'm also unable to manually upload a single file such as https://www.artsdatabanken.no/Files/21907/Tubulanus_annulatus_14-02-10_1.JPG
using the tool
https://commons.wikimedia.org/w/index.php?title=Special:Upload&uploadformstyle=basic
I get the same message: "Copy uploads are not available from this domain"

That's because I whitelisted only artsdatabanken.no, not also www.artsdatabanken.no. I'll fix that in a minute.
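
To illustrate why an entry for artsdatabanken.no alone would not cover www.artsdatabanken.no, here is a minimal sketch of host allowlist matching. It assumes that an entry matches only the exact host name and that a `*` label stands for exactly one host label. The function `is_allowed_host` is hypothetical and written for this example; it is not MediaWiki's code and may differ from how wgCopyUploadsDomains is actually evaluated.

```python
from urllib.parse import urlsplit


def is_allowed_host(url, allowlist):
    """Return True if the URL's host matches an allowlist entry.

    Assumption for illustration only: an entry matches the exact host,
    and a '*' label in an entry matches exactly one host label.
    """
    host = (urlsplit(url).hostname or "").lower()
    host_labels = host.split(".")
    for entry in allowlist:
        entry_labels = entry.lower().split(".")
        # A different number of labels can never match under this rule.
        if len(entry_labels) != len(host_labels):
            continue
        if all(e == "*" or e == h for e, h in zip(entry_labels, host_labels)):
            return True
    return False


url = "https://www.artsdatabanken.no/Files/21799/Caprella_mutica_24-01-10_1.JPG"
print(is_allowed_host(url, ["artsdatabanken.no"]))                         # False
print(is_allowed_host(url, ["artsdatabanken.no", "*.artsdatabanken.no"]))  # True
```

Under these assumptions the scheme (http or https) plays no part in the match; only the host name does.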

Change 644826 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Add all subdomains of artsdatabanken.no to the wgCopyUploadsDomains allowlist for commonswiki

https://gerrit.wikimedia.org/r/644826

Okay, not in a minute, but hopefully today :D. As I said previously, the workaround is to remove the www. part. Whether it's http or https shouldn't matter, but I would recommend using https, as it's encrypted.

Change 644826 merged by jenkins-bot:
[operations/mediawiki-config@master] Add all subdomains of artsdatabanken.no to the wgCopyUploadsDomains allowlist for commonswiki

https://gerrit.wikimedia.org/r/644826

Mentioned in SAL (#wikimedia-operations) [2020-12-02T17:58:03Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 24da542256f7c4cc955365ccd9739354f7162cc5: Add all subdomains of artsdatabanken.no to the wgCopyUploadsDomains allowlist for commonswiki (T267784) (duration: 01m 06s)

Okay, I just fixed it. Should work now!

@Urbanecm, I still have the message:
The media file URL could not be evaluated. The URL delivers the content in a way that is not yet handled by this extension or there was an HTTP request issue. URL given was "https://www.artsdatabanken.no/Files/21799/Caprella_mutica_24-01-10_1.JPG". HTTP request error "There was a problem during the HTTP request: 404 Not Found".

I have no idea.

I managed to upload that image with no issues. Can you please show me how you tried to upload the image?

@Urbanecm I tried with GWToolset, using an XML file just like the example above, except that the example above contains only the first three records instead of a bit more than one hundred.

Thanks. That's quite an interesting case. I tried it myself on Commons beta, with the same result. I then copied the URL into my browser, and it worked correctly. Then I tried to download it with wget, which was also successful. However, sending a HEAD request does indeed return 404 Not Found, see:

urbanecm@LAPTOP-A3BHKQ07  ~
$ curl -I https://www.artsdatabanken.no/Files/21799/Caprella_mutica_24-01-10_1.JPG
HTTP/1.1 404 Not Found
Content-Length: 1245
Content-Type: text/html
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Access-Control-Allow-Origin: *
Date: Thu, 03 Dec 2020 17:20:40 GMT
Strict-Transport-Security: max-age=16070400

urbanecm@LAPTOP-A3BHKQ07  ~
$

It seems that their webserver is badly configured and responds to some or all HEAD requests with 404 Not Found. I'd suggest contacting artsdatabanken.no directly.
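
The same comparison can be scripted, which may be handy when reporting the problem to the site. The following is a minimal sketch using only Python's standard library; the helper names `status_for` and `compare_head_and_get` are made up for this example and are not part of GWToolset or MediaWiki.

```python
import urllib.error
import urllib.request


def status_for(url, method, timeout=10):
    """Return the HTTP status code the server gives for `method` on `url`."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return err.code


def compare_head_and_get(url):
    """Request the same URL with HEAD and with GET and return both statuses."""
    return {"HEAD": status_for(url, "HEAD"), "GET": status_for(url, "GET")}
```

Calling `compare_head_and_get` with one of the image URLs above returns a dictionary with one status per method; a server behaving as described in this comment would show a 404 for HEAD next to a successful status for GET.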

Ok thanks for the investigation, I will see what I can do. Thanks again.

Just for the record, note that it is strange that I succeeded in uploading a significant number of files (the files listed in the 18 and 20 November 2020 sections of https://commons.wikimedia.org/wiki/User:FredD/Echinoderm_news/2020_November_16-30) when I replaced "https" with "http".

No need to investigate further; I'm just writing this comment to leave a trace.