
Enabling CORS for raw file URLs
Closed, Duplicate · Public

Description

In the Web2Cit project, we host some JSON schema files in the Gitlab repository.

We would like these files to be accessible from online JSON editors, to make it easier for our users to edit JSON files following these schemas.

However, that doesn't seem possible because the Gitlab server does not send an Access-Control-Allow-Origin: * header in its responses.
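For context: a browser lets a page on one origin (such as an online JSON editor) read a response fetched from another origin only if that response carries a matching Access-Control-Allow-Origin header. The Python sketch below approximates that check for a simple GET request sent without credentials. The function name cors_allows_read is hypothetical, and preflight and credentialed requests are deliberately left out.

```python
from collections.abc import Mapping


def cors_allows_read(response_headers: Mapping[str, str], request_origin: str) -> bool:
    """Approximate the browser's CORS check for a simple, credential-less GET.

    Hypothetical helper for illustration only: real browsers also handle
    preflight requests, credentials and other response headers.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        # No CORS header: the browser refuses to expose the body to the page.
        return False
    # "*" admits any origin for non-credentialed requests; otherwise the
    # value has to match the requesting page's origin exactly.
    return allowed == "*" or allowed == request_origin


# A raw-file response without the header (the current Gitlab behaviour) is blocked:
print(cors_allows_read({"Content-Type": "text/plain"}, "https://json-editor.example"))  # False
# The same response with "Access-Control-Allow-Origin: *" would be readable:
print(cors_allows_read({"Access-Control-Allow-Origin": "*"}, "https://json-editor.example"))  # True
```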

This has been requested upstream, but it hasn't been resolved yet. In the meantime, external services such as https://raw.githack.com/ and https://statically.io/ have been proposed as workarounds. However, these do not work with the Wikimedia instance of Gitlab.

Would it make sense to implement this in Wikimedia's Gitlab? Or is there a workaround other than deploying these files to a separate domain using Gitlab CI?

Event Timeline

Just adding that, as a workaround, I have configured the repository to be mirrored to Github, and I'm using Github raw file URLs for now, which do support CORS. For example: https://raw.githubusercontent.com/web2cit/w2c-core/main/schema/templates.schema.json
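The workaround relies on Github's raw URL layout, https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>, which the example URL follows. A small helper that builds such URLs might look like this; the function name github_raw_url is hypothetical, and the layout is inferred from the example URL rather than from any documented Github API.

```python
from urllib.parse import quote


def github_raw_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL for a file in a mirrored repo.

    Hypothetical helper: the URL layout is taken from the example above.
    Characters in the file path are percent-encoded, "/" separators are kept.
    """
    safe_path = quote(path.lstrip("/"), safe="/")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{safe_path}"


print(github_raw_url("web2cit", "w2c-core", "main", "schema/templates.schema.json"))
# https://raw.githubusercontent.com/web2cit/w2c-core/main/schema/templates.schema.json
```

An online JSON editor can then be pointed at this URL, since the mirror serves raw files with CORS enabled.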

sbassett subscribed.

Tagging Release-Engineering-Team for review. I think there are maybe two security questions here:

  1. Is Wikimedia's Gitlab installation designed to function as a CDN like Github's? Or is that a potential Vuln-DoS? That's a question for Release Engineering, as they are the ostensible maintainers of that system.
  2. Is there any way to mitigate the possibility of nefarious or erroneous json config files being introduced into this repository? That really ends up being a question of what safety profile maintainers of a repository like this are comfortable with. I assume some level of permissions, code-review and automated tests are established for this repository, which should prevent most scenarios concerning malicious versions of a hosted config file being accidentally or intentionally introduced.
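Point 2 mentions automated tests. As one hedged example, a CI job could at least reject schema files that are not well-formed JSON objects with a $schema keyword. The sketch below assumes the schemas live under a schema/ directory and end in .schema.json (as in the Github URL above); check_schema_text and main are hypothetical names. A check like this catches erroneous files, not deliberately malicious ones, which still rely on review and permissions.

```python
import json
import sys
from pathlib import Path


def check_schema_text(name: str, text: str) -> list[str]:
    """Return the problems found in one schema file's contents (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"{name}: not valid JSON ({exc.msg} at line {exc.lineno})"]
    if not isinstance(data, dict):
        return [f"{name}: top-level value must be a JSON object"]
    if "$schema" not in data:
        return [f"{name}: missing $schema keyword"]
    return []


def main(root: str = "schema") -> int:
    """Check every *.schema.json under root; a non-zero result fails the CI job."""
    problems: list[str] = []
    # Assumed layout: schemas live under ./schema and end in .schema.json.
    for path in sorted(Path(root).rglob("*.schema.json")):
        problems.extend(check_schema_text(str(path), path.read_text(encoding="utf-8")))
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main())
```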

brennen subscribed.

> Is Wikimedia's Gitlab installation designed to function as a CDN like Github's? Or is that a potential Vuln-DoS? That's a question for Release Engineering, as they are the ostensible maintainers of that system.

Not consciously, really. That's not to say that it would necessarily be a bad idea, but I don't really think we've factored it into the current buildout, and it's not in the scope of our current focus on just replacing Gerrit & CI.

I'll confess that I'm uncertain whether there are dangerous implications to the Access-Control-Allow-Origin header here.

I wonder if it might be better to just define a standard way of publishing static files to something actually intended as a CDN for this use case. Possibly relevant: T303546: Gitlab CI should be able to publish static html docs.

> I'll confess that I'm uncertain whether there are dangerous implications to the Access-Control-Allow-Origin header here.

If we're talking about read-only content served with Content-Type: text/plain (I believe this is the case for all "raw" files in Gitlab), I don't think there should be any issues with setting Access-Control-Allow-Origin: *. Judging by the upstream link in the task description, that still doesn't appear to be a Gitlab feature? I'd guess it might be possible, though, with some manual nginx configuration.
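In nginx this would presumably be an add_header directive scoped to a location that matches raw file paths. Since the actual nginx setup of Wikimedia's Gitlab isn't described here, the sketch below expresses the same rule as a Python WSGI middleware instead: add Access-Control-Allow-Origin: * only to GET/HEAD responses for raw file paths served as text/plain. It assumes raw URLs contain /-/raw/ (the usual Gitlab raw path layout); RawCorsMiddleware, demo_app and response_headers are hypothetical names for illustration.

```python
from collections.abc import Callable, Iterable

# Assumption: raw files are served from paths such as
#   /<namespace>/<project>/-/raw/<ref>/<file path>
RAW_MARKER = "/-/raw/"


class RawCorsMiddleware:
    """Add "Access-Control-Allow-Origin: *" to read-only, text/plain raw-file responses.

    Illustrative WSGI sketch, not Gitlab's actual code or configuration.
    """

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        read_only = environ.get("REQUEST_METHOD", "GET") in ("GET", "HEAD")
        is_raw = RAW_MARKER in environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            content_type = next(
                (v for k, v in headers if k.lower() == "content-type"), ""
            )
            is_plain_text = content_type.split(";")[0].strip().lower() == "text/plain"
            if read_only and is_raw and is_plain_text:
                # Remove any existing value so the header is sent exactly once.
                headers = [(k, v) for k, v in headers
                           if k.lower() != "access-control-allow-origin"]
                headers.append(("Access-Control-Allow-Origin", "*"))
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)


def demo_app(environ, start_response):
    """Stand-in for the real application: always serves a small text/plain body."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"{}"]


def response_headers(app: Callable, method: str, path: str) -> list[tuple[str, str]]:
    """Run a WSGI app once and collect the headers it sends (demo helper)."""
    captured: list[tuple[str, str]] = []

    def start_response(status, headers, exc_info=None):
        captured.extend(headers)
        return lambda data: None  # minimal stand-in for WSGI's write() callable

    for _ in app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response):
        pass
    return captured


wrapped = RawCorsMiddleware(demo_app)
print(response_headers(wrapped, "GET", "/group/project/-/raw/main/schema/templates.schema.json"))
# [('Content-Type', 'text/plain; charset=utf-8'), ('Access-Control-Allow-Origin', '*')]
```

Limiting the header to read-only, text/plain raw responses mirrors the conditions under which the comment above expects no issues; every other page keeps the default same-origin behaviour.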

> I wonder if it might be better to just define a standard way of publishing static files to something actually intended as a CDN for this use case. Possibly relevant: T303546: Gitlab CI should be able to publish static html docs.

Possibly, though IMO doc.wikimedia.org still makes the most sense for that particular task. It might be worth using something like wmfusercontent.org (or a similar, new domain/server) to host gitlab-related raw content, as we do for Phabricator. That would provide a bit more security/isolation and could be built to function as an intentional CDN.

> It might be worth using something like wmfusercontent.org (or a similar, new domain/server) to host gitlab-related raw content, as we do for Phabricator. That would provide for a bit more security/isolation and could be built to function as an intentional CDN.

Tentatively at least, I like that approach.