
Allow GWT uploads from
Closed, Invalid
Public


Please add the following domain(s) to the wgCopyUploadsDomains whitelist:

This is to support uploads from the Rijksmuseum, for example:
is the image that Rijksmuseum returns via their API for artefact at:

Version: wmf-deployment
Severity: enhancement



Event Timeline

bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz64907.
bzimport added a subscriber: Unknown Object (MLST).
Fae created this task. May 5 2014, 5:58 PM
Fae added a comment. May 5 2014, 6:03 PM

Note that from a small test set, I see lh3, lh4, lh5, lh6 all being subdomains in use by the RM for images hosted at However I feel that a more complex regex restriction is likely to be unnecessary.

tomasz added a comment. May 5 2014, 6:33 PM

Apparently is a domain "Google uses to host data for YouTube", according to the internets; I'm unsure whether whitelisting the whole domain is a good idea.

Fae added a comment. May 5 2014, 7:03 PM

Perhaps we can whitelist "lh\d*" as a more limited regex?

(In reply to Fæ from comment #3)

Perhaps we can whitelist "lh\d*" as a more limited regex?

I don't think that would make a significant difference (vs. a wildcard).

I don't know Commons policy well, but I guess Flickr is OK because there are bots that check what the licensing is at Flickr and record the value in an edit to the file description page. (And even then we still have to worry sometimes about flickrwashing.)

(In reply to Tomasz W. Kozlowski from comment #2)

Apparently is a domain "Google uses to host data for YouTube",

Seems to be more widespread, e.g. including Picasa pictures.

This would allow essentially the same range of content/uploaders as Google Drive unless we had a bot somehow checking for license metadata associated with a given URL (like we do with flickr)?

Fae added a comment. May 5 2014, 11:32 PM

Apart from a more complex regex, like the "lh\d" or maybe "lh[1-9]" domain limitation, I am unsure what else to recommend.
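To make the difference between the three candidate patterns concrete, here is a minimal Python sketch. It only tests the left-most subdomain label against each pattern. How wgCopyUploadsDomains actually matches its entries is not shown in this task, so treat this as an illustration of the regex semantics, not of MediaWiki's behaviour. The helper name `accepted_by` and the sample labels are hypothetical.

```python
import re
from typing import List

# The three candidate subdomain patterns mentioned in this thread.
PATTERNS = {
    r"lh\d*": re.compile(r"lh\d*"),      # "lh" followed by zero or more digits
    r"lh\d": re.compile(r"lh\d"),        # "lh" followed by exactly one digit
    r"lh[1-9]": re.compile(r"lh[1-9]"),  # "lh" followed by one digit, 1 to 9
}


def accepted_by(label: str) -> List[str]:
    """Return the patterns that match the whole subdomain label (hypothetical helper)."""
    return [name for name, rx in PATTERNS.items() if rx.fullmatch(label)]


# Sample labels are illustrative; only lh3 to lh6 were observed in the test set.
for label in ["lh", "lh0", "lh3", "lh42", "www"]:
    print(label, accepted_by(label))
```

Under these assumptions, "lh\d*" also accepts a bare "lh" and multi-digit labels such as "lh42", whereas "lh\d" and "lh[1-9]" accept exactly one digit. None of the three limits who can place content on the matching hosts, which is the concern raised above.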

I welcome other eyes on the example at This shows an artefact image which is broken into tiles, each tile appears hosted at The API call I get my data from for the same artefact is (blanked out my API key), this gives some interesting values, including a link to the full image:

If there is a way of adding some suitable verification to the image page that we might make a requirement of using this tricky Google domain, I would be happy to look into it.

There is an alternative of using the images available at Europeana, however this limits us to whatever subset Europeana happen to be hosting (it is not simply a mirror), and in truth adds no value as the images for the Rijksmuseum were actually taken from the same source I am attempting to enable for the GWT to read for itself.

Fae added a comment. May 11 2014, 2:08 AM

Some more research has led me to an alternative (which was not in the least bit obvious from their API).

In the previous example of artefact "BK-1968-212", I can upload from and not have to rely on the hosted version at Google.

I presume that the RM are using a Google mirror when serving images to end users to reduce their server traffic. Unfortunately, even their API does not provide the "internal" link as an alternative source; it has to be deduced and does not appear in the public-facing documentation.

I am marking this request as resolved as I can apply this work-around.