
prometheus-beta.wmcloud.org 504 timeout
Closed, Resolved | Public | BUG REPORT

Description

(I'm on mobile, so please excuse the brevity)

I was reading https://wikitech.wikimedia.org/wiki/Prometheus#Access_Prometheus_web_interface and clicked on the link to https://beta-prometheus.wmflabs.org/beta/graph — this timed out with a 504 gateway timeout.

I'm not sure if this has ever/recently worked, so this might not be a big deal.

I've also tested https://prometheus-beta.wmcloud.org directly, and had a quick look at https://openstack-browser.toolforge.org/project/deployment-prep to check that the proxy still exists (it does, though I also note that https://beta-prometheus.wmcloud.org/ exists too).

Event Timeline

Considering nothing appears to be complaining, and that it's fairly likely this hasn't worked for a while, I'm going to set the priority to low (else people will panic!)

I'll take a look at this when I'm back in front of a PC tomorrow if no one else does 😌

Prometheus is running, on port 9903

samtar@deployment-prometheus02:~$ sudo netstat -ltnp | grep "prometheus"
tcp        0      0 127.0.0.1:9903          0.0.0.0:*               LISTEN      375/prometheus      
tcp        0      0 172.16.0.67:9105        0.0.0.0:*               LISTEN      24066/prometheus-rs 
tcp6       0      0 :::9093                 :::*                    LISTEN      527/prometheus-aler 
tcp6       0      0 :::9094                 :::*                    LISTEN      527/prometheus-aler 
tcp6       0      0 :::9100                 :::*                    LISTEN      371/prometheus-node
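
Reading the output above: the first line shows Prometheus bound to 127.0.0.1:9903, i.e. the loopback interface only, so nothing outside the host can connect to that port directly. A minimal sketch for classifying the local-address column of such output follows; `listen_scope` is a hypothetical helper name used for illustration, not part of netstat or Prometheus.

```shell
# Hypothetical helper: classify the "Local Address" column of a
# `netstat -ltn` line by which interfaces the socket is bound to.
listen_scope() {
  addr="$1"
  host="${addr%:*}"   # strip the trailing ":port"
  case "$host" in
    127.*|::1)  echo "loopback-only" ;;    # reachable only from the host itself
    0.0.0.0|::) echo "all-interfaces" ;;   # reachable on every interface
    *)          echo "single-address" ;;   # bound to one specific address
  esac
}

listen_scope "127.0.0.1:9903"     # the prometheus line
listen_scope "172.16.0.67:9105"   # the prometheus-rs line
listen_scope ":::9100"            # the prometheus-node line
```

A loopback-only listener is consistent with the service being meant to sit behind a local reverse proxy rather than be exposed on its own port.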

I forwarded that via ssh -L9903:localhost:9903 deployment-prometheus02.deployment-prep.eqiad1.wikimedia.cloud and confirmed the web UI loaded — did we want to change the prometheus-beta.wmcloud.org proxy from pointing at port 80?
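
For anyone repeating this check, the forward above can be sketched as a small helper. This is only an illustration: `tunnel_cmd` is a hypothetical name, and the `-N` flag (open the tunnel without running a remote command) is an addition that was not in the original command.

```shell
# Hypothetical helper: print the ssh command that forwards a local port
# to the same port on the remote host's loopback interface.
tunnel_cmd() {
  port="$1"
  host="$2"
  printf 'ssh -N -L %s:localhost:%s %s\n' "$port" "$port" "$host"
}

tunnel_cmd 9903 deployment-prometheus02.deployment-prep.eqiad1.wikimedia.cloud

# With the tunnel open in another terminal, the web UI can be probed locally
# (assumes curl is installed; prints only the HTTP status code):
#   curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9903/
```

The helper only prints the command rather than running it, so it can be reviewed before opening a connection.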

TheresNoTime claimed this task.

If it's not DNS, it's the firewall /s

(it was the firewall.)

It occurs to me that perhaps this shouldn't be publicly accessible — rather than leave it up and wait for a reply, I've undone my change (and confirmed https://beta-prometheus.wmflabs.org/beta/graph no longer loads). I'd appreciate a clarification on this.

Apologies for the uninvited subscribe @taavi, but you're very knowledgeable on these topics, so any suggestions or thoughts would be greatly appreciated. No rush, of course.

TheresNoTime changed the task status from Open to Stalled. Sep 20 2022, 10:20 AM

After speaking with taavi, this seems to be okay to do, but it should instead be reverse-proxied via Apache. On a quick look, that is already set up but not working (externally) :')

Yet again, this was a painfully simple firewall issue.

Mentioned in SAL (#wikimedia-releng) [2022-09-21T13:55:25Z] <TheresNoTime> modified deployment-prep "prometheus" security group - port 80, T315699