
LB for cloudelastic
Closed, Resolved (Public)

Description

cloudelastic refers to cloudelastic100[1234].wikimedia.org.

There are two separate streams of requests that cloudelastic cares about:

prod mediawiki -> cloudelastic:9[246]43. HTTPS, with TLS terminated by nginx. Receives production document updates.

internet -> cloudelastic:8[246]43. TLS terminated by nginx. Only accepts GET requests. Receives search queries. Limited by ferm to WMF cloud IP ranges, not for privacy but for service stability.

Ideally both of these should have a load balancer (likely LVS) in front of them so nodes can be cleanly added/removed/etc. without any clients knowing or caring.
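For reference, the bracket shorthand above can be expanded into the concrete host/port pairs a load balancer would need to cover. This is only a minimal sketch: the `expand` helper is hypothetical (not part of any Wikimedia tooling) and handles just single-character bracket classes such as `[246]` or `[1-4]`.

```python
import re
from itertools import product


def _class_members(cls: str) -> list:
    """Return the characters in a bracket class, e.g. '246' or '1-4'."""
    members = []
    i = 0
    while i < len(cls):
        # Treat 'a-b' as an inclusive character range.
        if i + 2 < len(cls) and cls[i + 1] == "-":
            members.extend(chr(c) for c in range(ord(cls[i]), ord(cls[i + 2]) + 1))
            i += 3
        else:
            members.append(cls[i])
            i += 1
    return members


def expand(pattern: str) -> list:
    """Expand bracket classes in a pattern into every matching string."""
    # re.split with a capturing group alternates literal text and class bodies.
    parts = re.split(r"\[([^\]]+)\]", pattern)
    choices = [_class_members(p) if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in product(*choices)]


hosts = expand("cloudelastic100[1234].wikimedia.org")
write_ports = expand("9[246]43")  # prod mediawiki -> document updates
read_ports = expand("8[246]43")   # internet -> GET-only search queries

# Every (host, port) pair that would sit behind the load balancer.
endpoints = [(h, int(p)) for h in hosts for p in write_ports + read_ports]

if __name__ == "__main__":
    for host, port in endpoints:
        print(f"{host}:{port}")
```

With four hosts and six ports each, this enumerates the backends that are currently addressed directly by clients.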

Event Timeline

Change 512924 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/dns@master] Add cloudelastic LVS to DNS

https://gerrit.wikimedia.org/r/512924

Change 512925 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/puppet@production] LVS for cloudelastic

https://gerrit.wikimedia.org/r/512925

Will bring this up in Scrum of Scrums :)

@Andrew I talked to @BBlack today, and he said you two had discussed taking a different direction in terms of networking to expose bare metal services to cloud. The situation today is:

  • cloudelastic100[1-4] all have public IP addresses. The access provided by the service is acceptable for full public release, but for operational reasons we are limiting it via ferm to only cloud IP ranges.
  • Production mediawiki job runners need to write to this service. Because we don't have an LB to handle pool/depool operations, we are currently only sending writes for group0. Once an LB is in place we will be sending writes from all wikis.
  • The servers each run 3 services on 6 ports. 3 ports accept read and write requests and are only accessible from prod hosts. The other 3 ports only accept read requests and are publicly accessible.
  • The plan initially was to expose these 6 ports via LVS on cloudelastic.wikimedia.org, in the attached patch.
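The access rules in the list above can be summarised as a small decision function. This is a rough model for discussion only, not the actual ferm or nginx configuration: the port numbers come from the 9[246]43 and 8[246]43 patterns in the task description, while the `source` labels ("prod", "cloud", "other") and the `is_allowed` function are hypothetical names. The thread does not say whether prod hosts may use the read-only ports, so the sketch assumes only cloud ranges can.

```python
# Ports taken from the task description: 9[246]43 and 8[246]43.
WRITE_PORTS = {9243, 9443, 9643}  # read and write, prod hosts only
READ_PORTS = {8243, 8443, 8643}   # read only, limited by ferm to cloud ranges


def is_allowed(port: int, method: str, source: str) -> bool:
    """Model whether a request would be accepted.

    `source` is a hypothetical label for where the request comes from:
    "prod", "cloud", or "other".
    """
    if port in WRITE_PORTS:
        # Read/write ports are reachable only from production hosts.
        return source == "prod"
    if port in READ_PORTS:
        # Read-only ports accept GET only. Restricting them to cloud
        # sources reflects the current ferm limit; treating prod as
        # denied here is an assumption of this sketch.
        return method == "GET" and source == "cloud"
    # Any other port is not part of the service.
    return False
```

For example, `is_allowed(9243, "POST", "prod")` is accepted, while `is_allowed(8443, "POST", "cloud")` is rejected because the public ports only take GET requests.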

Would this fit into the plans Brandon mentioned for some sort of haproxy setup with cloud-specific domain names and address spaces? In particular, our need to push writes from production job runners may make this different from other services you've been considering (or maybe not; mysql is similar). If so, what kind of timeline is this on?

@EBernhardson Jason is in the process of building out an HA solution for our internal services, so I'm cc'ing him on this task. I think he'll have a lot more to contribute than I have.

@EBernhardson Unfortunately we don't have a good solution for this today or in the near future. We've discussed future load-balancing-as-a-service options, but these require a lot of effort on backend upgrades, automation, and configuration.

The HA architecture that we discussed with @BBlack was focused on the backend OpenStack services. We've decided not to go that route and have instead opted for a less complex solution that is dedicated to the OpenStack controllers.

Change 512924 merged by BBlack:
[operations/dns@master] Add cloudelastic LVS to DNS

https://gerrit.wikimedia.org/r/512924

Mentioned in SAL (#wikimedia-operations) [2019-08-01T17:42:45Z] <bblack> disable puppet on lvs1014 + lvs1016 for cloudelastic LVS merge - T224324

Change 512925 merged by BBlack:
[operations/puppet@production] LVS for cloudelastic

https://gerrit.wikimedia.org/r/512925

Mentioned in SAL (#wikimedia-operations) [2019-08-01T18:30:01Z] <bblack> lvs1016: puppet re-enabled, pybal restarted, cloudelastic deploy - T224324

Mentioned in SAL (#wikimedia-operations) [2019-08-01T19:34:00Z] <bblack> lvs1014 - puppetize and restart pybal for cloudelastic LVS - T224324

Mentioned in SAL (#wikimedia-operations) [2019-08-01T19:57:32Z] <bblack> lvs1016 - restart pybal for slight LVS config change for cloudelastic - T224324

Change 528215 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] cloudelastic: Fix LVS IPv6 address

https://gerrit.wikimedia.org/r/528215

Change 528216 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] cloudelastic: Fix LVS IPv6 address

https://gerrit.wikimedia.org/r/528216

Change 528216 merged by BBlack:
[operations/dns@master] cloudelastic: Fix LVS IPv6 address

https://gerrit.wikimedia.org/r/528216

Change 528215 merged by BBlack:
[operations/puppet@production] cloudelastic: Fix LVS IPv6 address

https://gerrit.wikimedia.org/r/528215

debt claimed this task.

Change 724520 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):
[operations/dns@master] wcqs: add discovery record

https://gerrit.wikimedia.org/r/724520