Request to return 405 on POST calls to SPARQL endpoint, Wikidata primary sources tool VPS project
Open, Needs Triage, Public

Description

Use case: the client should not be allowed to modify data on the server storage engine (Blazegraph);

Issue: by default, Blazegraph allows POST on the /sparql service to perform ACID operations on the database;

Request: please add the following NGINX directive to the VPS instance virtual server (updated as per T192292#4160588).

# Answer every non-GET request to the Blazegraph endpoints with 405 (Method Not Allowed).
location ~ /(sparql|dataloader|tx|status|namespace) {
    if ($request_method != GET) {
        return 405;
    }
}

Event Timeline

Hjfocs created this task. Apr 16 2018, 2:30 PM
Restricted Application added a subscriber: Aklapper. Apr 16 2018, 2:30 PM
Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Apr 23 2018, 7:30 AM

Important updates:

  • all non-GET requests should be disallowed;
  • the same behavior should also apply to the following endpoints: /dataloader /namespace /tx /status

Task description edited accordingly.

Hjfocs updated the task description. Apr 26 2018, 11:04 AM

@Hjfocs I'm not convinced that adding project-specific security features to the shared HTTPS proxy server is scalable or sustainable. It seems like it would be easier for you to introduce a security proxy within your project.

Thanks for the update, @bd808. The NGINX directive seemed to me the easiest way.
Do you have any suggestions about alternative solutions?
For instance, would it be possible to set up a project-specific NGINX instance? I investigated that, but could not find any documentation.

> For instance, would it be possible to set up a project-specific NGINX instance? I investigated that, but could not find any documentation.

@bd808, I was talking about the available VPS Puppet profiles/roles.

> Thanks for the update, @bd808. The NGINX directive seemed to me the easiest way.

I agree that it is easiest for you, but for us it creates a tight coupling between your Cloud VPS project and our infrastructure. This tight coupling will complicate all future changes that we make to the proxy layer. It's very likely that at some point your assumption of upstream protection from the shared proxy would end up being broken accidentally.

> For instance, would it be possible to set up a project-specific NGINX instance? I investigated that, but could not find any documentation.
>
> @bd808, I was talking about the available VPS Puppet profiles/roles.

There is a very full-featured Puppet module for managing an nginx service, but there is not a convenient role or profile class that generically manages nginx for a project. The fundamental reason is that a custom file or template is needed to configure the nginx server. The base nginx class sets up Puppet rules to ensure that only Puppet-managed vhost configurations are enabled. This is a great practice for our production servers, but it makes things a bit more complicated for Cloud VPS projects.

To use the ::nginx module in a Cloud VPS project you would need to write a custom role class and either get it merged into the main operations/puppet.git repo or set up a project-local Puppetmaster and add the class there. The "easy" thing to do is sudo apt-get install nginx and then edit the configuration files, but Puppet automation can be nicer in the long term.

> To use the ::nginx module in a Cloud VPS project you would need to write a custom role class and either get it merged into the main operations/puppet.git repo or set up a project-local Puppetmaster and add the class there. The "easy" thing to do is sudo apt-get install nginx and then edit the configuration files, but Puppet automation can be nicer in the long term.

@bd808, I'm trying the "easy" thing, but I don't know which IP address the listen directive in the server block should use.
If you curl -v https://pst.wmflabs.org/v2, the IP is 208.80.155.156.
However, listen 208.80.155.156:443; raises error code 99 in the nginx log:

bind() to 208.80.155.156:443 failed (99: Cannot assign requested address)

Would you be so kind as to give me a hint?

And sorry if the question sounds silly, I'm no sysadmin :-)

> And sorry if the question sounds silly, I'm no sysadmin :-)

It's never silly to ask questions. :)

The IP that you see via curl belongs to the virtual machine in the project-proxy Cloud VPS project where the proxy server is currently deployed. This IP is from our public pool, meaning that it is reachable from the general internet. Your virtual machine instance does not have a public IP. It instead has a private (10.x.x.x) address that is allocated dynamically when the instance is first provisioned.

I would suggest using simply listen 80;. Port 443 is for HTTPS traffic. The Cloud VPS proxy will take care of that for you and then reverse proxy via plain HTTP to nginx on your Cloud VPS instance. Using the bare port number is equivalent to listen *:80 which will make the nginx service available on all local IP addresses on the instance. This will not cause any security or performance problems and keeps you from needing to make the configuration exactly match the virtual machine's IP stack configuration.
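
As a rough illustration of the suggestion above, a minimal virtual server for the instance might look like the sketch below. The catch-all server_name and the placeholder response are assumptions made for the example, not part of the actual pst configuration.

```nginx
# Minimal sketch; everything except "listen 80;" is an illustrative assumption.
server {
    # Bare port: equivalent to "listen *:80;", i.e. all local IPv4 addresses.
    # No public IP and no port 443 here: the Cloud VPS proxy terminates HTTPS
    # and forwards plain HTTP to this instance.
    listen 80;

    # Catch-all name (assumption), so requests match whatever Host header
    # the proxy forwards.
    server_name _;

    location / {
        # Placeholder response, only to confirm that the server answers.
        return 200 "nginx is listening on port 80\n";
    }
}
```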

Thanks a lot for the clear explanation!

> I would suggest using simply listen 80;

I tried your suggestion and:

  • binding works fine (after stopping an Apache instance that was listening on the same port. I have never explicitly installed Apache, but it was there);
  • the directive has no effect;
  • /var/log/nginx/access.log is empty.

I used the same directive in a separate test server of mine, and it does work fine.

I could not find the listen 80; directive in any file in /etc/nginx on pst.wikidata-primary-sources-tool.eqiad.wmflabs. Nginx was running, but did not seem to be bound to port 80. I made these changes:

  • Added listen 80; to /etc/nginx/sites-available/pst
  • Symlinked /etc/nginx/sites-enabled/pst to /etc/nginx/sites-available/pst
  • Restarted nginx
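
The symlink step matters because nginx only loads the virtual hosts that its main configuration includes. A sketch of the relevant part of /etc/nginx/nginx.conf, assuming the Debian-style packaging layout (the actual file on the instance is not shown in this thread and may differ):

```nginx
# Excerpt of /etc/nginx/nginx.conf under the assumed Debian-style layout.
http {
    # Only files (usually symlinks) in sites-enabled are loaded;
    # files that exist solely in sites-available are ignored.
    include /etc/nginx/sites-enabled/*;
}
```

After changing either file, nginx -t checks the configuration syntax before a restart.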

I can now curl the nginx install locally:

$ curl -v http://localhost/
* Hostname was NOT found in DNS cache
*   Trying ::1...
* connect to ::1 port 80 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
* Server nginx/1.13.6 is not blacklisted
< Server: nginx/1.13.6
< Date: Thu, 31 May 2018 19:54:42 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Wed, 11 Oct 2017 07:33:46 GMT
< Connection: keep-alive
< ETag: "59ddc95a-264"
< Accept-Ranges: bytes
<
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host localhost left intact

The /etc/nginx/sites-available/pst config will need more settings added to it in order to reverse proxy to your Blazegraph instance. See https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/ for some help.
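
A minimal sketch of what such a configuration could look like, combining listen 80; with the read-only restriction from the task description. The upstream address http://127.0.0.1:9999 is an assumption (9999 is the port Blazegraph's standalone server uses by default); adjust it to wherever Blazegraph actually listens on the instance. The forwarded headers and the catch-all server_name are likewise illustrative choices.

```nginx
# Sketch for /etc/nginx/sites-available/pst; upstream address is an assumption.
server {
    listen 80;
    server_name _;

    # Blazegraph endpoints: allow GET only, answer anything else with 405.
    location ~ /(sparql|dataloader|tx|status|namespace) {
        if ($request_method != GET) {
            return 405;
        }
        # No URI part after the address: nginx does not accept one
        # in a regex location, and the original request URI is passed on.
        proxy_pass http://127.0.0.1:9999;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    # Everything else is proxied without method restrictions.
    location / {
        proxy_pass http://127.0.0.1:9999;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

With a configuration along these lines, a POST to /sparql should be answered with 405 by nginx itself and never reach Blazegraph, while GET requests are proxied through.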