LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN
Closed, Resolved · Public · 5 Story Points

Description

Services (Jupyterhub, Hue) in the Analytics VLAN are not able to reach ldap-ro.eqiad.wikimedia.org. Services that use e.g. ldap-labs.eqiad.wikimedia.org (yarn.wikimedia.org, turnilo.wikimedia.org) seem fine.

[@notebook1003:/home/otto] $ telnet ldap-ro.eqiad.wikimedia.org 389
Trying 208.80.154.252...


[@notebook1003:/home/otto] $ telnet ldap-labs.eqiad.wikimedia.org 389
Trying 208.80.154.79...
Connected to seaborgium.wikimedia.org.
Escape character is '^]'.

ldap-ro.eqiad.wikimedia.org seems fine from e.g. mwmaint1002:

[@mwmaint1002:/home/otto] $ telnet ldap-ro.eqiad.wikimedia.org 389
Trying 208.80.154.252...
Connected to ldap-ro.eqiad.wikimedia.org.
Escape character is '^]'.

This just started happening today. Has something changed with ldap-ro.eqiad.wikimedia.org or Analytics VLAN ACLs?
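
The telnet checks above can also be scripted so they are easy to repeat from several hosts. This is a minimal sketch using only the Python standard library; the helper name `can_connect` and the 3-second default timeout are illustrative assumptions, not part of any existing tooling:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Example usage (hostnames and port taken from the task description):
#   can_connect("ldap-ro.eqiad.wikimedia.org", 389)
#   can_connect("ldap-labs.eqiad.wikimedia.org", 389)
```

A connection that hangs at "Trying ...", as seen from notebook1003, would surface here as a timeout and therefore as `False`.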

Event Timeline

Ottomata created this task. · Jul 9 2019, 8:19 PM
Restricted Application added a subscriber: Aklapper. · Jul 9 2019, 8:19 PM
Ottomata triaged this task as High priority. · Jul 9 2019, 8:25 PM
Ottomata updated the task description.
Ottomata added subscribers: elukey, Dzahn.

From puppet I can see that the change for ldap-ro was reverted:

elukey@notebook1003:~$ sudo grep ldap /var/log/puppet.log
Jul  9 17:46:07 notebook1003 puppet-agent[30332]: (/Stage[main]/Jupyterhub/File[/etc/jupyterhub/jupyterhub_config.py]/content)  AUTHENTICATOR        = 'ldap'
Jul  9 17:46:07 notebook1003 puppet-agent[30332]: (/Stage[main]/Jupyterhub/File[/etc/jupyterhub/jupyterhub_config.py]/content) -LDAP_SERVER          = 'ldap-labs.eqiad.wikimedia.org'
Jul  9 17:46:07 notebook1003 puppet-agent[30332]: (/Stage[main]/Jupyterhub/File[/etc/jupyterhub/jupyterhub_config.py]/content) +LDAP_SERVER          = 'ldap-ro.eqiad.wikimedia.org'
Jul  9 23:45:55 notebook1003 puppet-agent[22070]: (/Stage[main]/Jupyterhub/File[/etc/jupyterhub/jupyterhub_config.py]/content)  AUTHENTICATOR        = 'ldap'
Jul  9 23:45:55 notebook1003 puppet-agent[22070]: (/Stage[main]/Jupyterhub/File[/etc/jupyterhub/jupyterhub_config.py]/content) -LDAP_SERVER          = 'ldap-ro.eqiad.wikimedia.org'
Jul  9 23:45:55 notebook1003 puppet-agent[22070]: (/Stage[main]/Jupyterhub/File[/etc/jupyterhub/jupyterhub_config.py]/content) +LDAP_SERVER          = 'ldap-labs.eqiad.wikimedia.org'
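
If the same check is needed on several hosts, the LDAP_SERVER changes can be pulled out of the log programmatically. This is a rough sketch that assumes the syslog-style line format shown above; the function name `ldap_server_changes` and the regular expression are illustrative, and reading /var/log/puppet.log will likely need the same elevated privileges as the grep:

```python
import re
from typing import Iterable, List, Tuple

# Matches puppet-agent diff lines that add (+) or remove (-) an LDAP_SERVER value.
_CHANGE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ puppet-agent\[\d+\]: "
    r"\(.*/content\) (?P<sign>[+-])LDAP_SERVER\s*=\s*'(?P<server>[^']+)'"
)


def ldap_server_changes(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, '+' or '-', server) for every LDAP_SERVER diff line."""
    changes = []
    for line in lines:
        match = _CHANGE.match(line)
        if match:
            changes.append(
                (match.group("ts"), match.group("sign"), match.group("server"))
            )
    return changes


# Example usage (path taken from the grep above):
#   with open("/var/log/puppet.log") as log:
#       for ts, sign, server in ldap_server_changes(log):
#           print(ts, sign, server)
```

Unchanged context lines, such as the AUTHENTICATOR one, are skipped because they carry no `+` or `-` marker.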

There are two issues here:

  1. We'll need to fix the ACLs so that the Analytics VLAN can access the ldap-ro replicas. There's a wider plan to switch all LDAP clients to the read-only replicas instead of serpens/seaborgium.
  2. The jupyterhub config currently uses the labsldapconfig Hiera setting to select the LDAP servers, so it was migrated as well when 67e1e84735ba was merged. This is wrong: jupyterhub is a service running in production and should not reuse the setting used for Cloud VPS instances.
elukey added a subscriber: ayounsi. · Jul 10 2019, 7:35 AM

About 1.

elukey@re0.cr1-eqiad# show | compare
[edit firewall family inet filter analytics-in4 term ldap from destination-address]
         208.80.154.79/32 { ... }
+        /* ldap-ro */
+        208.80.154.252/32;

Changed on both cr1/cr2 eqiad (cc: @ayounsi)

elukey@notebook1003:~$ telnet ldap-ro.eqiad.wikimedia.org 389
Trying 208.80.154.252...
Connected to ldap-ro.eqiad.wikimedia.org.
Escape character is '^]'.
^]

Change 521832 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Use the ldap-ro endpoint for Hue and Jupyter

https://gerrit.wikimedia.org/r/521832

I am a little bit lost with LDAP config, since we use:

  1. ldap-labs.eqiad.wikimedia.org in Jupyterhub's config without any proxy setting (and without any pass afaics)
  2. ldap-labs.eqiad.wikimedia.org in httpd's config, with a proxy+pass setting (it does not seem to work without the pass, from my tests on an-tool1005)

If the pass is still needed, it would be great to have a single point of config in hiera for ldap-ro, where people can grab the info and put it in their configs (to DRY things up a little).

Change 522073 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Introduce openldap_config in hiera

https://gerrit.wikimedia.org/r/522073

Change 522073 merged by Elukey:
[operations/puppet@production] Introduce a ldap config in hiera

https://gerrit.wikimedia.org/r/522073

Change 523111 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hue|swap: use new ldap hiera configuration

https://gerrit.wikimedia.org/r/523111

elukey claimed this task. · Jul 15 2019, 9:46 AM
elukey edited projects, added Analytics-Kanban; removed Patch-For-Review, netops, LDAP, Operations.
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 523111 merged by Elukey:
[operations/puppet@production] profile::hue|swap: use new ldap hiera configuration

https://gerrit.wikimedia.org/r/523111

ayounsi removed a subscriber: ayounsi. · Jul 15 2019, 3:42 PM

Change 521832 abandoned by Elukey:
profile::swap: use the ldap-ro endpoint

https://gerrit.wikimedia.org/r/521832

The services in analytics that use LDAP are:

  • Hue
  • Jupyter notebooks
  • Yarn (via httpd)
  • Superset (via httpd)
  • Turnilo (via httpd)

The last three will be taken care of in T227650, and the first two have been fixed by https://gerrit.wikimedia.org/r/523111.

elukey set the point value for this task to 5. · Jul 16 2019, 6:59 AM
elukey moved this task from In Progress to Done on the Analytics-Kanban board.
Milimetric closed this task as Resolved. · Jul 18 2019, 4:49 PM
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.