Put our ldap servers behind LVS
Closed, Resolved (Public)

Assigned To

Authored By

	Andrew
	Mar 12 2019, 5:36 PM

Description

T217280 has uncovered a fair number of sub-issues. One of the most pressing ones is that sometimes when an ldap server restarts, the grid engine node using that server freaks out and gets depooled.

As far as I can tell, the traditional way to provide redundancy for ldap is on the client side -- ldap.conf contains URLs for multiple ldap servers and the client is meant to deal with failover. Experience (in the grid engine and elsewhere) shows that this doesn't actually work very well... the client only fails over after timeouts, errors, and other messes.

So, let's take this out of the clients' hands and put all ldap access behind a single service name and service IP. Then if we need to keep restarting ldap servers due to the memory leak, that instability will be less obvious to clients.
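To illustrate why client-side failover is slow, here is a minimal sketch of the usual "try each server in order" logic. It is not how nslcd or any particular LDAP client is implemented; the function names, the 5-second default timeout, and the example hostnames are all assumptions made for illustration. The point is that a dead first server costs every new connection a full timeout before the second server is even tried.

```python
import socket
from typing import Callable, List, Sequence, Tuple
from urllib.parse import urlparse


def tcp_connect(url: str, timeout: float) -> socket.socket:
    """Open a plain TCP connection to the host named in an ldap:// or ldaps:// URL."""
    parsed = urlparse(url)
    default_port = 636 if parsed.scheme == "ldaps" else 389
    return socket.create_connection(
        (parsed.hostname, parsed.port or default_port), timeout=timeout
    )


def first_reachable(
    urls: Sequence[str],
    connect: Callable[[str, float], object] = tcp_connect,
    timeout: float = 5.0,  # hypothetical value, not taken from any real config
) -> Tuple[str, object, List[Tuple[str, OSError]]]:
    """Try each server in order and return the first one that accepts a connection.

    Returns (url, connection, failures). Every unreachable server listed
    before the working one adds up to `timeout` seconds of delay.
    """
    failures: List[Tuple[str, OSError]] = []
    for url in urls:
        try:
            conn = connect(url, timeout)
        except OSError as exc:  # includes TimeoutError and ConnectionRefusedError
            failures.append((url, exc))
            continue
        return url, conn, failures
    raise ConnectionError(f"all LDAP servers failed: {failures}")
```

With a single service name in front of the servers, the client list shrinks to one entry and the decision about which backend is healthy moves to the load balancer.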

Details

Subject	Repo	Branch	Lines +/-
lvs: fix ldap-ro and ldap-ro-ssl depool thresholds	operations/puppet	production	+2 -2
Add lvs to the read-only ldap replicas	operations/puppet	production	+61 -0
Add lvs to the read-only ldap replicas	operations/puppet	production	+61 -0
Service name and IPs for ldap-behind-lvs	operations/dns	master	+8 -0


Related Objects

Status	Assigned	Task
Resolved	yuvipanda	T130446 Unable to SSH onto tools-login.wmflabs.org
Resolved	akosiaris	T130593 investigate slapd memory leak
Resolved	aborrero	T217280 LDAP server running out of memory frequently and disrupting Cloud VPS clients
Resolved	Andrew	T46720 Only list LDAP servers location in the same datacenter in the nslcd configuration
Resolved	Andrew	T218133 Put our ldap servers behind LVS
Resolved	Andrew	T46722 Add two read-only LDAP servers in eqiad
Resolved	akosiaris	T218224 Ganeti request: two new VMs in eqiad for ldap
Resolved	Andrew	T218398 Update openldap profile to use LE

Event Timeline

Andrew created this task. Mar 12 2019, 5:36 PM

Restricted Application added a subscriber: Aklapper. Mar 12 2019, 5:36 PM

Andrew updated the task description. Mar 12 2019, 5:36 PM

Andrew added a subtask: T46722: Add two read-only LDAP servers in eqiad.

Andrew added a parent task: T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients.

Andrew added a subscriber: MoritzMuehlenhoff.

Paladox subscribed. Mar 12 2019, 5:58 PM

Peachey88 added a project: LDAP. Mar 12 2019, 7:05 PM

Change 496007 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/dns@master] Service name and IPs for ldap-behind-lvs

https://gerrit.wikimedia.org/r/496007

gerritbot added a project: Patch-For-Review. Mar 12 2019, 8:49 PM

Change 496065 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Add an lvs service in front of our two ldap servers

https://gerrit.wikimedia.org/r/496065

Current plan is to add two new read-only hosts (on internal IPs) and put LVS in front of them, then use that endpoint exclusively for cloud VMs access.

Change 496007 merged by Andrew Bogott:
[operations/dns@master] Service name and IPs for ldap-behind-lvs

https://gerrit.wikimedia.org/r/496007

Change 496065 merged by Andrew Bogott:
[operations/puppet@production] Add lvs to the read-only ldap replicas

https://gerrit.wikimedia.org/r/496065

Change 496858 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Add lvs to the read-only ldap replicas

https://gerrit.wikimedia.org/r/496858

Change 496858 merged by Andrew Bogott:
[operations/puppet@production] Add lvs to the read-only ldap replicas

https://gerrit.wikimedia.org/r/496858

Andrew closed subtask T46722: Add two read-only LDAP servers in eqiad as Resolved. Mar 22 2019, 3:47 AM

There are now two read-only replicas in eqiad behind the endpoint ldap-ro.eqiad.wikimedia.org

Andrew reopened subtask T46722: Add two read-only LDAP servers in eqiad as Open. Mar 22 2019, 3:55 AM

Change 498343 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] lvs: fix ldap-ro and ldap-ro-ssl depool thresholds

https://gerrit.wikimedia.org/r/498343

Change 498343 merged by Ema:
[operations/puppet@production] lvs: fix ldap-ro and ldap-ro-ssl depool thresholds

https://gerrit.wikimedia.org/r/498343

Mentioned in SAL (#wikimedia-operations) [2019-03-22T11:18:51Z] <ema> lvs1005: bounce pybal to clear backends health icinga warning T218133

Mentioned in SAL (#wikimedia-operations) [2019-03-22T11:22:07Z] <ema> lvs1002: bounce pybal to clear backends health icinga warning T218133

GTirloni added a parent task: T46720: Only list LDAP servers location in the same datacenter in the nslcd configuration. Mar 24 2019, 10:48 AM

Andrew closed subtask T46722: Add two read-only LDAP servers in eqiad as Resolved. Mar 25 2019, 2:37 PM

This is running, and working OK. Our anti-memory-leak cron is still firing pretty often; maybe on the replicas it can depool before killing to prevent clients from getting unexpected disconnects...
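The "depool before killing" idea could be sketched roughly as below. This is only an outline of the ordering, not the actual cron job or any real pybal/conftool interface: the `depool`, `restart`, `is_healthy`, and `repool` callables, the drain period, and the number of health checks are all hypothetical placeholders to be wired to whatever tooling is really in use.

```python
import time
from typing import Callable


def restart_with_depool(
    depool: Callable[[], None],
    restart: Callable[[], None],
    is_healthy: Callable[[], bool],
    repool: Callable[[], None],
    drain_seconds: float = 10.0,  # hypothetical drain period
    max_checks: int = 10,         # hypothetical number of health probes
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart a replica without sending clients to it while it is down.

    Order: depool, wait for connections to drain, restart, then repool
    only once the health check passes. Returns False (and leaves the
    host depooled) if it never becomes healthy.
    """
    depool()
    sleep(drain_seconds)
    restart()
    for _ in range(max_checks):
        if is_healthy():
            repool()
            return True
        sleep(1.0)
    return False
```

Leaving the host depooled on failure is a deliberate choice in this sketch: with two replicas behind the service IP, a replica that fails to come back should not receive traffic until someone looks at it.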

Maintenance_bot removed a project: Patch-For-Review. May 22 2019, 1:32 PM

I can't remember why I didn't close this before.
