Ganeti request: two new VMs in eqiad for ldap
Closed, Resolved, Public

Description

This ticket is for Ganeti requisition and VM creation. Each host should have 8 CPUs and /at least/ 4GB of RAM, but 8GB would be better.

They're going behind LVS, so should have internal IPs.

Event Timeline

Andrew created this task.

Alex, I'm hoping that you have time to do some/all of this. If not, then please answer these questions and refer back to me so I can muddle through:

  1. Do we have capacity to make these? And, if so, is it OK if I use 8GB of RAM per VM?
  2. What should I name them, and what IP range should they land in?

Thanks!

Alex, I'm hoping that you have time to do some/all of this. If not, then please answer these questions and refer back to me so I can muddle through:

Depends on the urgency. If it's not urgent, I can indeed do it; otherwise I am not the best bet for this.

  1. Do we have capacity to make these? And, if so, is it OK if I use 8GB of RAM per VM?

Yes. But why go for 8GB of RAM? Per https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&panelId=4&fullscreen&orgId=1&var-server=seaborgium&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc&from=now-30d&to=now the existing hosts barely reach 3GB of RAM, and even that is because of a memory leak; we don't know whether that leak will be present in the new instances as well.

  2. What should I name them, and what IP range should they land in?
  • ldap-labs-replica01.eqiad.wikimedia.org
  • ldap-labs-replica02.eqiad.wikimedia.org

I guess a public IP range, if they are to be reachable by VMs. So public1-a-eqiad and public1-c-eqiad, that is, 208.80.154.0/26 and 208.80.154.64/26 respectively.
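For reference, a minimal sketch of what those two /26 ranges cover, using Python's standard `ipaddress` module. The subnet names and prefixes are taken from the comment above; the `describe` helper is only an illustrative name, not part of any Wikimedia tooling.

```python
import ipaddress

# Subnets as named in the comment above.
subnets = {
    "public1-a-eqiad": ipaddress.ip_network("208.80.154.0/26"),
    "public1-c-eqiad": ipaddress.ip_network("208.80.154.64/26"),
}


def describe(net):
    """Return (first address, last address, total address count) for a network."""
    return str(net[0]), str(net[-1]), net.num_addresses


for name, net in subnets.items():
    first, last, count = describe(net)
    print(f"{name}: {first} - {last} ({count} addresses)")
```

Each /26 holds 64 addresses, and the two ranges are adjacent but do not overlap.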

Thanks!

  2. What should I name them, and what IP range should they land in?
  • ldap-labs-replica01.eqiad.wikimedia.org
  • ldap-labs-replica02.eqiad.wikimedia.org

May I suggest we avoid introducing new FQDNs with the labs keyword, if possible? Perhaps cloud can fit there, or wmcs, or whatever.

LDAP isn't really a cloud-specific service, so I propose ldap-replica01 and ldap-replica02. My instinct is to put them on private IPs (because LVS will provide a public endpoint anyway), but I'm open to suggestions.

Change 496222 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/dns@master] Allocate IPs for ldap replicas

https://gerrit.wikimedia.org/r/496222

Change 496222 merged by Andrew Bogott:
[operations/dns@master] Allocate IPs for ldap replicas

https://gerrit.wikimedia.org/r/496222

  1. Do we have capacity to make these? And, if so, is it OK if I use 8GB of RAM per VM?

Yes. But why go for 8GB of RAM? Per https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&panelId=4&fullscreen&orgId=1&var-server=seaborgium&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc&from=now-30d&to=now the existing hosts barely reach 3GB of RAM, and even that is because of a memory leak; we don't know whether that leak will be present in the new instances as well.

More memory could mean the interval between restarts is increased. The servers aren't using more than 3GB of RAM because of the artificial limit the restarts are imposing. Not saying it's an ideal situation, just being realistic about us not getting a fix very soon for that. If we can spare an extra 4GB, throwing hardware at the problem could help a little bit.
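As a rough illustration of that reasoning, here is a small sketch of how the restart interval would scale with the memory ceiling. It assumes the leak grows linearly and that a restart happens once usage reaches a fixed ceiling; both are simplifying assumptions, and the baseline, leak rate, and ceilings below are made-up numbers, not measurements from the existing hosts.

```python
def days_until_restart(limit_gb, baseline_gb, leak_gb_per_day):
    """Days for a linearly leaking process to grow from baseline to the limit."""
    if leak_gb_per_day <= 0:
        raise ValueError("leak rate must be positive")
    return max(limit_gb - baseline_gb, 0) / leak_gb_per_day


# Hypothetical values, for illustration only.
baseline_gb = 1.0
leak_gb_per_day = 0.25

for limit_gb in (3.0, 7.0):
    days = days_until_restart(limit_gb, baseline_gb, leak_gb_per_day)
    print(f"ceiling {limit_gb} GB -> restart roughly every {days} days")
```

Under these assumptions, the interval grows in proportion to the headroom above the baseline, so an extra 4GB of ceiling buys proportionally more time between restarts.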

Oh, I was confused -- these do need to be on public IPs.

These are now created and puppetized:

ldap-eqiad-replica01.wikimedia.org
ldap-eqiad-replica02.wikimedia.org

  1. Do we have capacity to make these? And, if so, is it OK if I use 8GB of RAM per VM?

Yes. But why go for 8GB of RAM? Per https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&panelId=4&fullscreen&orgId=1&var-server=seaborgium&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc&from=now-30d&to=now the existing hosts barely reach 3GB of RAM, and even that is because of a memory leak; we don't know whether that leak will be present in the new instances as well.

More memory could mean the interval between restarts is increased. The servers aren't using more than 3GB of RAM because of the artificial limit the restarts are imposing. Not saying it's an ideal situation, just being realistic about us not getting a fix very soon for that. If we can spare an extra 4GB, throwing hardware at the problem could help a little bit.

Sure, more memory may mean we cause fewer disconnections for clients, but we don't even know whether the bug is going to be there in the new instances.