Add images.collections.yale.edu to the wgCopyUploadsDomains allowlist of Wikimedia Commons
Closed, Resolved, Public

Description

The images from various collections of Yale University and its affiliated museums are shared under CC0. Could this domain be added to the list of domains from which direct uploads are allowed?

Here is an example of a CC0 image: https://images.collections.yale.edu/iiif/2/ypm:63b47ceb-6a7a-433e-9daf-dc0263914842/full/!1920,1920/0/default.jpg. The metadata, including the license information, can be found here: https://www.gbif.org/occurrence/1039196479

Event Timeline

4nn1l2 triaged this task as Medium priority. (Feb 3 2022, 7:51 AM)
4nn1l2 subscribed.

I may be able to do this, but I need more examples, as it seems to me that some subdomain of amazonaws.com (such as prd-cds2-image-store-ypm.s3.amazonaws.com) should be allowlisted, rather than images.collections.yale.edu.

Can both be allowlisted? I am asking because your observation is accurate: URLs on images.collections.yale.edu are indeed forwarded to prd-cds2-image-store-ypm.s3.amazonaws.com. The issue I am facing is that the source data uses images.collections.yale.edu, while requests are forwarded to an Amazon subdomain. So far, it seems impossible to me to capture the redirect header information in JavaScript. If allowlisting both is not possible, allowlisting the Amazon subdomain would already be helpful. I can write a separate script that resolves all the primary URLs while I try to find a way to fetch redirect headers in the upload workflow.
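For illustration only, the separate resolver script mentioned above might look roughly like the following Python sketch. It assumes the forwarding is an ordinary HTTP redirect (a 3xx status with a `Location` header) and that the server answers `HEAD` requests; neither is confirmed in this task. The names `redirect_target`, `head_once`, and `resolve` are hypothetical and not part of any existing tool.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(base_url, status, location):
    """Return the absolute redirect target, or None if this is not a redirect."""
    if status not in REDIRECT_STATUSES or not location:
        return None
    # Location may be relative, so resolve it against the requested URL.
    return urljoin(base_url, location)


def head_once(url, timeout=10.0):
    """Send a single HEAD request without following redirects.

    Returns a (status, Location header or None) tuple.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def resolve(url, max_hops=5, fetch=head_once):
    """Follow redirects one hop at a time and return the final URL.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    for _ in range(max_hops):
        status, location = fetch(url)
        target = redirect_target(url, status, location)
        if target is None:
            return url
        url = target
    raise RuntimeError("too many redirects")
```

Calling `resolve()` on each primary URL before upload would yield the address to submit, provided its host is on the allowlist. Because `fetch` is a parameter, the hop-following logic can be checked against a stub before any real requests are made.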

Other examples:

Change 759557 had a related patch set uploaded (by 4nn1l2; author: 4nn1l2):

[operations/mediawiki-config@master] commonswiki: Add three domains to the wgCopyUploadsDomains allowlist

https://gerrit.wikimedia.org/r/759557

Change 759557 merged by jenkins-bot:

[operations/mediawiki-config@master] commonswiki: Add three domains to the wgCopyUploadsDomains allowlist

https://gerrit.wikimedia.org/r/759557

Change 759569 had a related patch set uploaded (by 4nn1l2; author: 4nn1l2):

[operations/mediawiki-config@master] commonswiki: Remove images.collections.yale.edu from the wgCopyUploadsDomains allowlist

https://gerrit.wikimedia.org/r/759569

Change 759569 merged by jenkins-bot:

[operations/mediawiki-config@master] commonswiki: Remove images.collections.yale.edu from the wgCopyUploadsDomains allowlist

https://gerrit.wikimedia.org/r/759569

Mentioned in SAL (#wikimedia-operations) [2022-02-03T19:26:23Z] <taavi@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:759557|commonswiki: Add three domains to the wgCopyUploadsDomains allowlist (T299835 T300848)]] (duration: 00m 54s)

> Can both be allowlisted?

As you can see at https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/759557/, we allowlisted both of them, but 'images.collections.yale.edu' did not work and we had to remove it in another patch: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/759569/.