
Need support on hosting an RStudio Shiny Server on a Labs instance behind a proxy
Closed, Resolved (Public)

Description

I need to host an RStudio Shiny Server on a Labs instance (project name: wikidataconcepts; instance name: wikidataconcepts) for an application that is currently under development for WMDE, where I work as a contractor (Data Analyst).

Steps undertaken thus far:

  1. Shiny Server installed and running; the application can be viewed locally;
  2. Followed the instructions for setting up a web proxy and mapping it onto port 3838 (Shiny Server default); @Addshore has verified that the proxy settings are correct;
  3. Followed the instructions on https://support.rstudio.com/hc/en-us/articles/213733868-Running-Shiny-Server-with-a-Proxy and tried to configure the Nginx server accordingly; this failed in spite of following the instructions carefully;
  4. Searched intensively for similar problem reports and solutions; many users are complaining about this; still no success on my Labs instance.

Please advise. Thank you.

Event Timeline

Intensively searched for similar problem reports and solutions; many users are complaining about this; still with no success on my Labs instance.

Complaining where? I presume you don't mean on labs

Followed instructions on https://support.rstudio.com/hc/en-us/articles/213733868-Running-Shiny-Server-with-a-Proxy and tried to set the Nginx server appropriately; failed in spite of following the instructions carefully;

What failed?

@Reedy "Complaining where? I presume you don't mean on labs" - No, I don't mean complaining on Labs; I mean that many RStudio Shiny Server users are reporting a similar problem in general.

"What failed?" - The Shiny application should be accessible from http://wdcm.wmflabs.org/ - there is a webproxy on the wikidataconcepts Labs instance that maps http://10.68.23.174:3838 onto the DNS hostname: wdcm.wmflabs.org. However, the only thing that I get when trying to reach http://wdcm.wmflabs.org/ is: 504 Gateway Time-out, nginx/1.11.3

N.B. I am already running RStudio Server from that instance, through another webproxy, on port 8787, and I am able to access it with no problems at all.

If all you need is a transparent reverse proxy, you should be able to open port 3838 in your project's security groups and then target the wdcm.wmflabs.org proxy at that instance and port via horizon (https://wikitech.wikimedia.org/wiki/Help:Proxy).

If you need rewriting then you will need to use nginx or apache2 locally on the instance to remap things and instead point the upstream proxy managed in horizon at the nginx/apache2 port on your instance and open that port in your security groups.

If the latter is what you are having problems with, one helpful debugging step would be to post your active nginx/apache2 config so that others can take a look at it and try to help you spot any flaws.

@bd808 @Reedy

I already have a security group for Shiny Server, port 3838 opened.

My /etc/nginx/nginx.conf is as follows (*exactly* what is advised at: https://support.rstudio.com/hc/en-us/articles/213733868-Running-Shiny-Server-with-a-Proxy); please be gentle in your suggestions since my web-administration skills are pretty bad (e.g. have no idea what a "reverse proxy" is, don't know what you mean by "If you need rewriting", etc):

Note: The Shiny-Server Settings section is what I have edited there.

user www-data;
worker_processes 4;
pid /run/nginx.pid;

events {
        worker_connections 768;
        # multi_accept on;
}

http {

        ##
        # Basic Settings
        ##

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;
        # server_tokens off;

        # server_names_hash_bucket_size 64;
        # server_name_in_redirect off;

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ##
        # Logging Settings
        ##

        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        ##
        # Gzip Settings
        ##

        gzip on;
        gzip_disable "msie6";

        # gzip_vary on;
        # gzip_proxied any;
        # gzip_comp_level 6;
        # gzip_buffers 16 8k;
        # gzip_http_version 1.1;
        # gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

        ##
        # nginx-naxsi config
        ##
        # Uncomment it if you installed nginx-naxsi
        ##

        #include /etc/nginx/naxsi_core.rules;

        ##
        # nginx-passenger config
        ##
        # Uncomment it if you installed nginx-passenger
        ##

        #passenger_root /usr;
        #passenger_ruby /usr/bin/ruby;

        ##
        # Virtual Host Configs
        ##

        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;

        ##
        # Shiny-Server Settings
        ##
        map $http_upgrade $connection_upgrade {
                default upgrade;
                ''      close;
        }

        server {
                listen 80;
    
                location / {
                        proxy_pass http://localhost:3838;
                        proxy_redirect http://localhost:3838/ $scheme://$host/;
                        proxy_http_version 1.1;
                        proxy_set_header Upgrade $http_upgrade;
                        proxy_set_header Connection $connection_upgrade;
                        proxy_read_timeout 20d;
                }
        }
}


#mail {
#       # See sample authentication script at:
#       # http://wiki.nginx.org/ImapAuthenticateWithApachePhpScript
# 
#       # auth_http localhost/auth.php;
#       # pop3_capabilities "TOP" "USER";
#       # imap_capabilities "IMAP4rev1" "UIDPLUS";
# 
#       server {
#               listen     localhost:110;
#               protocol   pop3;
#               proxy      on;
#       }
# 
#       server {
#               listen     localhost:143;
#               protocol   imap;
#               proxy      on;
#       }
#}


This config is a 1-to-1 reverse proxy. Naively I think you can just point the Horizon managed proxy at port 3838 on your instance.

The Horizon managed proxy is already pointed at port 3838, IP Protocol = TCP, Remote IP Prefix = 0.0.0.0/0, and still, it does not work.

I wouldn't ask for support here if I had not tried out many by-the-documentation solutions. I spent a considerable amount of time learning and experimenting before I decided to create this Phab task (as I always do, in order not to consume anyone's time unnecessarily). But I am afraid that this is where my expertise in web administration as a Data Analyst simply ends. I can't do it. Please advise.

Looking at https://tools.wmflabs.org/openstack-browser/project/wikidataconcepts shows that your proxies are pointing to ci-jessie-wikimedia-486020.contintcloud.eqiad.wmflabs as the backend rather than to wikidataconcepts.wikidataconcepts.eqiad.wmflabs. I think the first thing to do is to go into the Horizon app, delete the current proxy records, and recreate them with the appropriate project instance as the backend.

My completely random guess about how this happened is that you created the proxies pointed at a VM and then later deleted that VM and created a new one possibly reusing the previous hostname. The proxies created via Horizon are bound to IP addresses rather than hostnames for the backend, so if this happened the proxy would point to the next VM that was allocated the old IP.

Ok, but how is it possible that I can reach RStudio Server on port 8787 then (that would be: http://wikidataconcepts.wmflabs.org/, I'm currently working there from my browser)?


It looks like the proxy is actually pointed to 10.68.23.174 which is the actual IP of wikidataconcepts.wikidataconcepts.eqiad.wmflabs. The contintcloud thing is a complete red herring caused by a failure in our OpenStack infrastructure to clean up a DNS record.

Past this point, the issues seem to be with the software you have chosen to install and configure on your VMs. I'd love for the cloud-services-team to be able to provide detailed support to all Cloud Services projects, but we just don't have the resources to do so. Hopefully @Addshore or someone else at WMDE can help you look through things and find the problem. One thing to look for is configuration that makes your Shiny Server running on port 3838 bind to the 127.0.0.1 or ::1 interface rather than 0.0.0.0:3838 or :::3838. That would make the service accessible via local cURL calls but unable to be reached by the external reverse proxy.
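One way to act on the bind-address hint above is to list the listening sockets on the instance (for example with `ss -ltn`) and look at the local address shown for port 3838. The following is a minimal Python sketch of that classification, not something posted in the original thread; the function name `classify_listen_address` and the sample address strings are illustrative assumptions.

```python
import ipaddress


def classify_listen_address(local_address: str) -> str:
    """Classify a 'host:port' listening address as printed by ss or netstat.

    Returns 'loopback' (only clients on the same machine can connect),
    'wildcard' (listening on all interfaces), or 'specific' (bound to one
    particular interface address).
    """
    # Split on the last colon so IPv6 forms such as ':::3838' still work.
    host, _, _port = local_address.rpartition(":")
    host = host.strip("[]")  # ss prints IPv6 hosts in brackets, e.g. [::1]
    if host in ("*", ""):
        return "wildcard"
    addr = ipaddress.ip_address(host)
    if addr.is_loopback:  # 127.0.0.0/8 or ::1
        return "loopback"
    if addr.is_unspecified:  # 0.0.0.0 or ::
        return "wildcard"
    return "specific"
```

A result of `loopback` for the Shiny Server port would match the scenario described above: local cURL calls succeed, but the external reverse proxy cannot reach the service.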

@bd808 @Reedy Thank you very much for your efforts. As I've told you, it seems that running the RStudio Shiny Server under a web-proxy causes trouble worldwide.

@Addshore @Tobi_WMDE_SW We're on our own. What can we do about this? It is frustrating: I am developing a rather complex Shiny application that no one without @Addshore's or my LDAP access can use...


File an upstream support request to see if they can help you and/or fix their software?

@GoranSMilovanovic Has the Analytics team been poked about this issue? They already have some sort of Shiny / RStudio setup.

@Addshore Not yet; they are my last resort. I tagged this ticket with Analytics too, but they removed the tag, probably thinking that Labs would take care of it...

proxy_read_timeout 20d; <= Does Shiny server use persistent connections? I'm unsure whether it works fine in the current configuration. (Although WebSockets do work on tool labs, which use a similar configuration)

@zhuyifei1999 Could you explain what you mean by "persistent connections"? I am not a web admin. Also, how do you propose the proxy_read_timeout should be changed?

I'll experiment with it (try setting one up) later. By "persistent connections" I meant that (it's just my guess) Shiny expects the connection to be kept open, but nginx will not proxy the data to the client until the connection is closed. See:

According to the last link, nginx will establish a bidirectional WebSocket connection if the WebSocket handshake ritual is performed. My guess is that this may not have occurred in your case, so nginx waits until it times out with 504 Gateway Time-out.
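The handshake condition ties back to the `map $http_upgrade $connection_upgrade` block and the two `proxy_set_header` lines in the nginx.conf posted earlier in this task. As a rough illustration only, the Python sketch below mirrors that logic; the function names are made up for this example, and the header lookup assumes the client sent the header under the exact key `Upgrade`.

```python
def connection_header(http_upgrade: str) -> str:
    """Mirror the nginx map: '' maps to 'close', anything else to 'upgrade'."""
    return "close" if http_upgrade == "" else "upgrade"


def upstream_headers(client_headers: dict) -> dict:
    """Headers the location block would set on the request sent to Shiny Server.

    nginx does not pass a header whose value is an empty string, so the
    Upgrade header is forwarded only when the client actually sent one.
    """
    upgrade = client_headers.get("Upgrade", "")
    headers = {"Connection": connection_header(upgrade)}
    if upgrade:
        headers["Upgrade"] = upgrade
    return headers
```

For a browser that opens a WebSocket, the upstream request carries `Upgrade: websocket` and `Connection: upgrade`; for a plain page load it carries only `Connection: close`.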

@zhuyifei1999 Are you a Shiny Server user too?

Thank you very much for offering support. I have read the material that you have shared, however, I have to admit: most of it goes well beyond my understanding of web-administration. I would be very grateful to learn how to set nginx properly to work with Shiny Server on Labs; if your experiment turns out successful, please let me know ASAP. Thank you.

@zhuyifei1999 Are you a Shiny Server user too?

Nope. I simply like doing challenges ;)

Ok so the steps I've taken:

  1. Created an m1.medium test instance named commonsarchive-test from the ubuntu-14.04-trusty image in project commonsarchive (um, I don't have a personal test project so...), with the pre-defined security group Web that has port 80 open.
  2. Installed the server by blindly following https://www.rstudio.com/products/shiny/download-server/
    • It complained about a missing g++, so apt install build-essential
  3. Started ssh commonsarchive-test.commonsarchive.eqiad.wmflabs -D 12346 and tunneled all browser traffic for *.eqiad.wmflabs through localhost:12346
  4. Visited http://commonsarchive-test.commonsarchive.eqiad.wmflabs:3838/, got the Welcome to Shiny Server! page, so assumed it was working
  5. apt install nginx-full, copied the config from T167702#3341941 to /etc/nginx/nginx.conf, and restarted nginx
  6. Visited http://commonsarchive-test.commonsarchive.eqiad.wmflabs/, got the standard Welcome to nginx! page
    • So nginx used the config in /etc/nginx/sites-enabled/default for the server config; I commented out include /etc/nginx/sites-enabled/*, restarted nginx, refreshed, and got Welcome to Shiny Server!.
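Step 6 shows that an active `include /etc/nginx/sites-enabled/*;` line can let the stock default site answer on port 80 instead of the Shiny proxy block. A quick way to see which include lines are still active is to scan the config text while skipping comments. The sketch below is a deliberately naive Python illustration (the function name `active_includes` is an assumption, and it does not handle `#` characters inside quoted strings or includes spread over several lines).

```python
import re


def active_includes(conf_text: str) -> list:
    """Return the targets of include directives that are not commented out."""
    targets = []
    for line in conf_text.splitlines():
        code = line.split("#", 1)[0]  # drop everything after a comment marker
        match = re.match(r"\s*include\s+([^;]+);", code)
        if match:
            targets.append(match.group(1).strip())
    return targets
```

Run against the contents of /etc/nginx/nginx.conf, a result that still lists `/etc/nginx/sites-enabled/*` suggests the default site may be taking over the `listen 80` server.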

I will now test adding the labs reverse proxy.

So I set up a web proxy at http://commonsarchive-test.wmflabs.org/ and got Welcome to Shiny Server!. I cannot reproduce the 504, nor do I have any idea which step in the process may have gone wrong for you.

@GoranSMilovanovic Can you curl localhost:3838 from within the instance to verify if the port is open from within? In my case it's something like:

zhuyifei1999@commonsarchive-test:~$ curl localhost:3838
<!DOCTYPE html>
<html lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Welcome to Shiny Server!</title>
  <style type="text/css">
    body, html {
      font-family: Helvetica, Arial, sans-serif;
      background-color: #F5F5F5;
      color: #114;
      margin: 0;
      padding: 0;
    }
[...]

If that does not show up the issue has probably nothing to do with the proxy. Otherwise, if the page does show up, I'd say something is wrong with either how shiny is listening for incoming connections (T167702#334278), or something is wrong with the firewall / security group.

@zhuyifei1999 curl localhost:3838 from my Labs instance returns the content of the Welcome to Shiny Server page, and that is exactly what should be found on :3838 upon successful installation of the Shiny Server.

Something interesting about security groups is that they only affect communication between different projects (i.e. communication between Nova_Resource:Wikidataconcepts and Nova_Resource:Project-proxy, which runs the labs-wide web proxy), and communication between an instance and the outside world if you have an external floating IP allocated (which you don't). Communication within the project should not be affected by security groups.

@GoranSMilovanovic Do you have the quota to create another test instance in the same project (even m1.small is okay)? If curl wikidataconcepts.wikidataconcepts.eqiad.wmflabs:3838 runs fine on this test instance, but not instances of other projects (I tried this on a few instances I have access to, none works, all timeout), then I can think of no reason besides security group preventing access.
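The same comparison can be made with a plain TCP connection attempt instead of curl. Below is a minimal Python sketch, assuming it is run once from an instance inside the project and once from an instance in another project; the function name `can_connect` and the 3-second default timeout are illustrative choices, not values from this task.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

For example, `can_connect("wikidataconcepts.wikidataconcepts.eqiad.wmflabs", 3838)` returning True inside the project but False from other projects would point at the security group rather than at Shiny Server or nginx.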

Sorry about my earlier talk about WebSockets and persistent connections, apparently they are irrelevant to this issue, by my testing.

Also, you previously said:

I already have a security group for Shiny Server, port 3838 opened.

Can you make sure this security group is applied to the instance wikidataconcepts.wikidataconcepts.eqiad.wmflabs? https://horizon.wikimedia.org/project/instances => Actions => Edit Security Groups.

@zhuyifei1999 You're awesome, thanks a lot! I have a Shiny Server running and publicly accessible from the wikidataconcepts Labs instance; that was made possible by following precisely the steps that you have exemplified here, and I think the main problem was related to the security group settings. It was obviously due to my inexperience in working with Horizon and the Tools/Labs virtual environment (this is my first Labs instance...). Again, thank you very much!


Indeed, thanks @zhuyifei1999 ! :)

BTW, now that T161354 is done, Ops/Puppet has a Shiny Server module: https://github.com/wikimedia/puppet/tree/production/modules/shiny_server

For example, we are now using it for our puppet-managed dashboards: https://github.com/wikimedia/puppet/blob/production/modules/role/manifests/discovery/dashboards.pp (role) and https://github.com/wikimedia/puppet/tree/production/modules/profile/manifests/discovery_dashboards (profile)

Perhaps at some point we can collaborate on getting an RStudio Server module into Puppet also? :)

@mpopov Perhaps at some point we could have RStudio Server running from the production machines? Just sayin'... :)