
Provide authenticated access to Thanos native web interface
Closed, ResolvedPublic

Description

Thanos query instances are proxied by Apache on the machine they run on. At the moment, the way to access the native web interface (e.g. for easier data/query exploration) is through an SSH tunnel (e.g. ssh thanos-fe1001.eqiad.wmnet -L8000:localhost:80, then browse to http://localhost:8000), but it would be more convenient to expose the web interface behind the frontend layer and SSO instead.

  • Set up Apache SSO auth for NDA access
  • Set up external DNS names
  • Set up HTTP routing
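With the SSH tunnel from the description open, the same data can also be explored programmatically. Below is a minimal sketch, assuming that the Apache proxy on the thanos-fe host forwards the Prometheus-compatible HTTP API path (/api/v1/query) to Thanos Query, and that the tunnel's local end is http://localhost:8000 as in the example above. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Local end of the tunnel, e.g. after: ssh thanos-fe1001.eqiad.wmnet -L8000:localhost:80
TUNNEL_BASE = "http://localhost:8000"


def query_url(promql: str, base: str = TUNNEL_BASE) -> str:
    """Build an instant-query URL for the Prometheus-compatible HTTP API."""
    # urlencode percent-encodes PromQL characters such as (, ), [ and ].
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(promql: str, base: str = TUNNEL_BASE) -> dict:
    """Run an instant query through the tunnel and return the decoded JSON body."""
    with urlopen(query_url(promql, base), timeout=10) as resp:
        return json.load(resp)
```

For example, instant_query("up") returns the JSON response for the "up" series, provided the tunnel is up and the assumption about the proxied path holds.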

Event Timeline

re: nginx+ldap, it looks like the LDAP auth module isn't included, though we could use PAM auth for nginx with libpam-ldap as the LDAP client

I don't think we should mess with the system's PAM config for this -- that's going to be a dangerous change, especially in the long run.

Change 377332 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] prometheus::web to apache

https://gerrit.wikimedia.org/r/377332

Change 377332 merged by Andrew Bogott:
[operations/puppet@production] prometheus::web to apache

https://gerrit.wikimedia.org/r/377332

I'm tempted to add this directly to Apereo CAS (time permitting); however, I'm curious what you had in mind for the service domain names, considering we need one for each of codfw and eqiad.

Something like:

https://prometheous.codfw.wikimedia.org/
https://prometheous.eqiad.wikimedia.org/

or did you have something else in mind?

I think something like that would work great!

A braindump / couple of notes:

  • We'll need records/entries for all sites not only codfw/eqiad
  • Reverse-proxying from the public hostname to prometheus.svc.$site.wmnet should be enough, but HTTPS will be needed on the Apache side
  • We'll likely need a redirect, say from / to /ops (sort of the default anyway, as it exists in all sites), or a landing page listing all available options/instances
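The routing described in the notes above can be sketched as a small Python function. This is a hypothetical illustration, not the deployed Apache configuration: the site list and the public hostname pattern prometheus.<site>.wikimedia.org are assumptions. Only the prometheus.svc.$site.wmnet backend and the / to /ops redirect come from the notes, and the https:// backend scheme reflects one reading of "HTTPS will be needed on the Apache side".

```python
from typing import Tuple

# Hypothetical site list; the notes only say "all sites", not which ones.
SITES = ("eqiad", "codfw", "esams", "ulsfo", "eqsin")


def route(host: str, path: str) -> Tuple[int, str]:
    """Return (status, target) for a request to a per-site public hostname.

    Assumes public hostnames of the form prometheus.<site>.wikimedia.org.
    """
    parts = host.split(".")
    if len(parts) != 4 or parts[0] != "prometheus" or parts[2:] != ["wikimedia", "org"]:
        return 404, ""
    site = parts[1]
    if site not in SITES:
        return 404, ""
    if path == "/":
        # Redirect the bare root to the default "ops" instance.
        return 302, "/ops"
    # Reverse-proxy everything else to the per-site internal service,
    # assuming the backend Apache is reached over HTTPS.
    return 200, f"https://prometheus.svc.{site}.wmnet{path}"
```

A landing page listing all instances would replace the 302 branch; the rest of the mapping stays the same.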
fgiunchedi renamed this task from Provide authenticated access to Prometheus native web interface to Provide authenticated access to Thanos native web interface. Jul 6 2020, 11:25 AM
fgiunchedi updated the task description.

Taking over this issue to provide access to Thanos instead, which provides a unified query interface.

Change 615212 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] profile::thanos::httpd: move defaults to hiera

https://gerrit.wikimedia.org/r/615212

Change 615213 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] profile::thanos::frontend: Add SSO

https://gerrit.wikimedia.org/r/615213

Change 615215 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] profile::thanos::frontend: enable sso for all thanos frontends

https://gerrit.wikimedia.org/r/615215

Change 615216 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] profile::thanos::frontend: only support SSO on thanos

https://gerrit.wikimedia.org/r/615216

Change 615212 merged by Jbond:
[operations/puppet@production] profile::thanos::httpd: move defaults to hiera

https://gerrit.wikimedia.org/r/615212

Change 615216 abandoned by Jbond:
[operations/puppet@production] profile::thanos::frontend: only support SSO on thanos

Reason:
not needed

https://gerrit.wikimedia.org/r/615216

Change 615215 abandoned by Jbond:
[operations/puppet@production] profile::thanos::frontend: enable sso for all thanos frontends

Reason:
automatically on all nodes now

https://gerrit.wikimedia.org/r/615215

Change 615477 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] thanos::frontend: add ssl terminations for thanos.* SNI's

https://gerrit.wikimedia.org/r/615477

Change 615213 merged by Jbond:
[operations/puppet@production] profile::thanos::frontend: Add SSO

https://gerrit.wikimedia.org/r/615213

Change 615504 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] thanos: add new lvs thanos service

https://gerrit.wikimedia.org/r/615504

Change 615500 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] thanos: add LVS/discovery records for thanos.discovery.wmnet

https://gerrit.wikimedia.org/r/615500

Change 615504 abandoned by Jbond:
[operations/puppet@production] thanos: add new lvs thanos service

Reason:
not required

https://gerrit.wikimedia.org/r/615504

Change 615500 abandoned by Jbond:
[operations/dns@master] thanos: add LVS/discovery records for thanos.discovery.wmnet

Reason:
not required

https://gerrit.wikimedia.org/r/615500

Change 615477 merged by Jbond:
[operations/puppet@production] thanos::frontend: add ssl terminations for thanos.* SNI's

https://gerrit.wikimedia.org/r/615477

Change 615720 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] lvs - thanos-query: update to use port 443 instead of port 80

https://gerrit.wikimedia.org/r/615720

Change 615733 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] profile: move thanos-query clients to https

https://gerrit.wikimedia.org/r/615733

Change 615720 merged by Jbond:
[operations/puppet@production] lvs - thanos-query: update to use port 443 instead of port 80

https://gerrit.wikimedia.org/r/615720

Change 615733 merged by Filippo Giunchedi:
[operations/puppet@production] profile: move thanos-query clients to https

https://gerrit.wikimedia.org/r/615733

Change 617105 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] thanos: add thanos.wikimedia.org to the cache layer

https://gerrit.wikimedia.org/r/617105

Change 617107 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] thanos: add thanos cname pointing to cache

https://gerrit.wikimedia.org/r/617107

Change 617105 merged by Jbond:
[operations/puppet@production] thanos: add thanos.wikimedia.org to the cache layer

https://gerrit.wikimedia.org/r/617105

Change 617107 merged by Jbond:
[operations/dns@master] thanos: add thanos cname pointing to cache

https://gerrit.wikimedia.org/r/617107

Change 617420 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add caching rule for thanos-query.discovery.wmnet

https://gerrit.wikimedia.org/r/617420

Change 617456 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] thanos-sso: create a discovery name to be used by the authenticated FE

https://gerrit.wikimedia.org/r/617456

Change 617457 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] ATS - thanos: update thanos.wikimedia.org to use CNAME based name

https://gerrit.wikimedia.org/r/617457

Change 617456 merged by Jbond:
[operations/dns@master] thanos-sso: create a discovery name to be used by the authenticated FE

https://gerrit.wikimedia.org/r/617456

Change 617457 merged by Jbond:
[operations/puppet@production] ATS - thanos: update thanos.wikimedia.org to use CNAME based name

https://gerrit.wikimedia.org/r/617457

jbond closed this task as Resolved. Jul 30 2020, 2:19 PM

This is now configured with CAS-SSO authentication at https://thanos.wikimedia.org

thanos.wikimedia.org has been configured on the frontend caches to proxy to thanos-sso.discovery.wmnet, which in turn is a CNAME pointing to an individual thanos-fe server.

An initial attempt was made to use thanos-query.discovery.wmnet as the backend address, which uses a weighted round-robin policy to distribute queries among all thanos-fe servers. However, this was unsuccessful due to the lack of a distributed session store in mod_auth_cas.

We did explore using a hashing algorithm to distribute queries to backends based on client IP address, which should have overcome the session-management issues. However, the main user of thanos-query is grafana.wikimedia.org, which is better served by a round-robin scheduling policy. A single server is also expected to easily handle the traffic load of thanos.wikimedia.org, so we decided to KISS and use the tried and tested method of a CNAME-based discovery.wmnet address.
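The trade-off described above can be illustrated with a small simulation. It is a hypothetical model, not mod_auth_cas itself: each backend keeps its own in-memory session store (standing in for the missing distributed store), a login creates a session only on the backend that handled it, and later requests succeed only if they reach that same backend. Backend names are illustrative, and the round-robin picker uses equal weights as a simplification of the weighted policy.

```python
import hashlib
import itertools
from typing import Callable, Dict, Set

BACKENDS = ["thanos-fe1001", "thanos-fe1002", "thanos-fe2001"]  # hypothetical names


def round_robin() -> Callable[[str], str]:
    """Plain round robin (equal weights); ignores the client entirely."""
    cycle = itertools.cycle(BACKENDS)
    return lambda client_ip: next(cycle)


def ip_hash() -> Callable[[str], str]:
    """Pick a backend from a stable hash of the client IP (session affinity)."""
    def pick(client_ip: str) -> str:
        digest = hashlib.sha256(client_ip.encode()).digest()
        return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]
    return pick


def single_backend() -> Callable[[str], str]:
    """CNAME-style: every client resolves to the same single server."""
    return lambda client_ip: BACKENDS[0]


def authenticated_requests(pick: Callable[[str], str], client_ip: str, n: int) -> int:
    """Log in once, then make n requests; count how many find the session."""
    sessions: Dict[str, Set[str]] = {b: set() for b in BACKENDS}  # per-backend store
    sessions[pick(client_ip)].add(client_ip)  # the login lands on one backend only
    return sum(1 for _ in range(n) if client_ip in sessions[pick(client_ip)])
```

With three backends, authenticated_requests(round_robin(), "10.0.0.1", 6) finds the session on only 2 of 6 requests, while ip_hash() and single_backend() find it on all 6. Hashing would also have changed scheduling for Grafana on the shared thanos-query name, which is why a separate single-target CNAME is the simpler fit.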

Change 617478 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] thanos-sso: add descriptive comment and task reference

https://gerrit.wikimedia.org/r/617478

Change 617478 merged by Jbond:
[operations/dns@master] thanos-sso: add descriptive comment and task reference

https://gerrit.wikimedia.org/r/617478

Change 617420 abandoned by Ema:
[operations/puppet@production] ATS: add caching rule for thanos-query.discovery.wmnet

Reason:
Not necessary anymore: https://phabricator.wikimedia.org/T259692

https://gerrit.wikimedia.org/r/617420