
Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role
Closed, ResolvedPublic

Description

Cloud VPS Project Tested: n/a
Site/Location: EQIAD
Number of systems: 1
Service: role::analytics_test_cluster::coordinator::replica
Networking Requirements: internal IP, analytics vlan
Processor Requirements: 4 vCPUs
Memory: 32 GB (hive, presto, alluxio, oozie, mysql etc)
Disks: 200 GB
Other Requirements:

Event Timeline

I realize that this is a bit of a big VM at 32 GB, but I'm not sure that the required services will be able to run with much less.

The machine for which this is acting as a standby replica is a 128 GB physical machine (an-test-coord1001), and it currently has 33 GB of that allocated.
I'm hoping that a 32 GB VM will give me enough RAM to run the required services and test the failover/failback capabilities.

I propose to create this in the row_B group of Ganeti. Five of the six nodes in that group have enough free RAM to host this machine, so as long as neither the primary nor the secondary instance is allocated to ganeti1013, it should be OK.

ganeti1013.eqiad.wmnet   2.1T   1.6T  62.5G 35.2G  24.6G     6     2 row_B
ganeti1014.eqiad.wmnet   2.1T   1.8T  62.5G 21.1G  40.5G     5     2 row_B
ganeti1015.eqiad.wmnet   2.1T   1.7T  62.5G 23.0G  38.7G     5     2 row_B
ganeti1016.eqiad.wmnet   2.1T   1.9T  62.5G 13.0G  48.8G     2     6 row_B
ganeti1017.eqiad.wmnet   2.1T   1.8T  62.5G 10.4G  51.3G     3     4 row_B
ganeti1018.eqiad.wmnet   2.1T   1.7T  62.5G 10.2G  51.5G     3     5 row_B
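As an illustration only (this is a hypothetical helper, not part of any WMF tooling), the capacity check above boils down to filtering the free-memory column (MFree) of the `gnt-node list` style output for nodes that can absorb the VM's 32 GB of RAM:

```python
# Free memory (GB) per row_B node, taken from the listing above.
NODES = {
    "ganeti1013": 24.6,
    "ganeti1014": 40.5,
    "ganeti1015": 38.7,
    "ganeti1016": 48.8,
    "ganeti1017": 51.3,
    "ganeti1018": 51.5,
}

def candidates(nodes: dict, vm_ram_gb: float) -> list:
    """Return the nodes whose free memory can fit the VM's RAM."""
    return sorted(n for n, free in nodes.items() if free >= vm_ram_gb)

# Five of the six nodes qualify for a 32 GB VM; ganeti1013 does not.
```

This matches the allocation the cookbook later made: ganeti1016 (primary) and ganeti1017 (secondary) both have ample headroom.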

Proceeding with this now.

btullis@cumin1001:~$ sudo cookbook sre.ganeti.makevm eqiad_B an-test-coord1002 --vcpus 4 --memory 32 --disk 200 --network analytics
Ready to create Ganeti VM an-test-coord1002.eqiad.wmnet in the ganeti01.svc.eqiad.wmnet cluster on row B with 4 vCPUs, 32GB of RAM, 200GB of disk in the analytics network.

ganeti1016 was allocated as the primary and ganeti1017 as the secondary, so this should be OK for RAM allocation.

MAC address for an-test-coord1002.eqiad.wmnet is: aa:00:00:0a:9f:0f

BTullis removed a project: SRE.
BTullis moved this task from Incoming (new tickets) to Ops Week on the Data-Engineering board.

Hey @BTullis, I ran across this ticket while on clinic duty. Apologies if you already did, but procedure calls for an SRE review before self-serving (I know you are one, but I understand that as "gathering consensus from at least one more root"). Normally for small, trivial instances I would say to give instant approval, but because of what you commented at T289664#7307694, I think more scrutiny would be preferable: not to prevent the creation, but to check whether extra hardware resources would be needed on the Ganeti cluster to prevent the failover issues you mention.

Also, because DRBD is used for filesystem replication, heavy-I/O systems like the ones you mention ("hive, presto, alluxio, oozie, mysql") may not be a good fit for Ganeti, and in some cases I know first-hand they can lead to data corruption in case of a VM crash. Maybe bare metal would be preferred in this specific case, even if it is a test system?

With this I don't want to block your work; my objective here is to convince you that it would be desirable to add more voices to the ticket (working together :-)).

Hi @jcrespo - Sincere apologies. I hadn't meant to bypass the scrutiny, I simply hadn't spotted that in the procedure.
In this instance, I have actually created the virtual machine already, but I haven't booted or installed it yet. Should I remove it with gnt-instance remove an-test-coord1002.eqiad.wmnet and then tidy up?

I agree that bare metal would be the best option for this server, because the machine for which it is intended to operate as a standby is bare metal itself. That peer machine is an-test-coord1001.

However, it seems to me that the disk I/O is actually not all that high (one spike every 10 minutes), so I thought it would be OK. It's mainly the memory capacity that concerned me, given that an-test-coord1001 is a 128 GB server.

I'm not too worried about a delay to this work, so if the best answer is to delete the machine and put in a request for hardware instead, I'm more than happy to do that. Once again, apologies for inadvertently over-stepping the mark.

I said I didn't intend to block anything, and I mean it. Please continue with your work, especially if it involves Puppet; it will not be lost. But I would like to invite the opinion of a few extra SREs to see the best way to move forward, e.g. maybe procure extra analytics hardware instead (given that you also think it is the best way for the future), or increase Ganeti's capacity.

For example, taking advantage of the fact that @Dzahn is about to be asked about another VM creation review by another workmate, maybe he can weigh in on the issues you and I have brought up, in an unbiased way.

I could also help you procure new hardware instead, if that is the preferred alternative; let's wait and see what other people say :-).

Hey, @RobH, by any chance, is it possible that there could be available already some misc spec hw on eqiad with the minimum specs on description? Doesn't have to be anything super fancy, as it will be used for testing. Not asking a formal procurement yet, just to see if it is something out of the question, for the ongoing discussion whether ganeti or bare metal would be preferred.

A listing of eqiad servers with inventory status shows only one:

Overall we've reduced keeping spare systems, as they rarely suited the exact needs at any given time and have mainly been replaced by ganeti and standard hardware builds ordered for specific use-cases. So eqiad has its inventory/spare pool reduced to just one host at this time.

wmf5178

  • R440
  • (1) Intel Xeon Silver 4110 2.1G, 8C/16T, 9.6GT/s, 11M Cache, Turbo, HT (85W) DDR4-2400
  • (2) 32GB RDIMM 2666MT/s Dual Rank
  • 1Gb NIC
  • (2) 480GB SSD SATA Read Intensive 6Gbps 512e 2.5in Hot Plug S4510 Drive, 1 DWPD

If this had to go to bare metal, it seems this single-CPU spare system would meet the requirements.

Please note that an 'inventory/spare' host doesn't mean no management approval. Allocating that host requires a procurement request be filed and the allocation approved by DC Operations management. On that request you can just link to this task, and link to the above inventory system as a suggested solution.

My understanding is that it is certainly easier to get a spare system's use approved than to buy a brand-new system within the current fiscal year's budget, though! (You would be requesting the use of something that was already paid for, versus allocating budget in the current year.)

Please note an 'inventory/spare' host doesn't mean no mgmt approval

That was clear, hence the informal query only. You can unsubscribe now until/unless we file a procurement task, to avoid subsequent spam from the ongoing conversation.

Please let me know your thoughts on the viability of that potential alternative, @Dzahn @BTullis.

Thanks @jcrespo - that does seem to me like a very viable option for this use case; it is much more similar to the primary system (which is also an R440) than the VM that I had somewhat naively specified.

Interestingly, the two systems which fulfil these roles in the production hadoop cluster (as opposed to the test cluster) aren't identical to each other, but one of them is almost identical to wmf5178.

  • WMF7621 == an-coord1001 - identical CPU - 64 GB RAM - 240 GB disks
  • WMF7299 == an-coord1002 - earlier gen CPU with more cores - 64 GB RAM - 300 GB disks

So if anything, it's an-test-coord1001 that is over-specified for its current role, with 128 GB of RAM and 48 CPU threads.
Despite this discrepancy, I'm certain that the suggested server wmf5178 would be well suited to the role of an-test-coord1002.

I'm happy to file a procurement request for the suggested server, and/or take any other advice from anyone else.

As an aside, I have created a few other similar tickets recently, based on discussions with my team about making the test hadoop cluster more representative of the production cluster:

None of them is exactly urgent and I'll certainly seek more input from the SRE team next time about capacity and suitability, before making any changes.

Just to be clear, I am not saying "analytics cannot use ganeti". The thing is that when even you yourself were saying "the fit wasn't great", we are happy to have a conversation and find alternatives, like here. My personal view is that Ganeti in production, unlike cloud, is more of a niche product, mostly for very small projects. Maybe the solution is a dedicated VM cluster for analytics. Maybe k8s. Maybe something else? But I think expressing needs and gathering consensus can lead to a better infra overall :-)

Please formally request the procurement with a ticket following the DC ops procedure, and please continue involving other SREs to decide whether Ganeti or something else would be suitable for your needs.

Another alternative that we may want to explore (one that wasn't available until recently) is https://wikitech.wikimedia.org/wiki/Puppet/Pontoon, even if replicating the Kerberos setup etc. in there may be a little painful.

The Hadoop test cluster project was started to support Kerberos (and later on Bigtop), since testing in labs/cloud at the time for something that complex (Hadoop + Kerberos + satellite systems) was very challenging. Pontoon seems way more friendly and easier to use, so it may be good to do a spike to see if it can be suitable to host the Analytics test cluster. To unblock things, I think that the spare host that Rob listed earlier would work nicely.

BTullis mentioned this in Unknown Object (Task). Aug 26 2021, 3:29 PM

I have formally requested a server for this requirement: {T289784}
As suggested, I have linked to this conversation in that ticket and mentioned the suggested server that is in the current inventory stock.

I still haven't booted the ganeti VM and I'm happy to wait for the outcome of that request before proceeding either to boot it or decommission it.
The related Puppet CR is yet to be merged, and the only change that I believe would have to be made if swapping to a physical server is the DHCP record:
https://gerrit.wikimedia.org/r/c/operations/puppet/+/714753
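For illustration only: swapping to a physical server means the DHCP record must carry the real NIC's MAC address rather than the VM's generated one (aa:00:00:0a:9f:0f, from the cookbook output above). A hypothetical ISC dhcpd host stanza for the VM (the actual layout in operations/puppet may differ) would look like:

```
host an-test-coord1002 {
    hardware ethernet aa:00:00:0a:9f:0f;
    fixed-address an-test-coord1002.eqiad.wmnet;
}
```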

Just to be clear, I am not saying "analytics cannot use ganeti". The thing is that when even you yourself were saying "the fit wasn't great", we are happy to have a conversation and find alternatives, like here.

Thanks, I appreciate that willingness to have the conversation about the best solution. That is why I'm happy to defer proceeding to boot and install the system until we reach a consensus about whether to roll forwards or backwards.

I suppose that I was considering a Ganeti VM for this, mismatched in size as it was to its peer, as a relatively short-term solution: something I could get set up fairly quickly myself, in order to test a new procedure before promoting that change to the production Hadoop cluster.

Longer term, I suppose I had it in my mind that the VM would likely be replaced and rebuilt as a physical machine at some point in the near future.
However, the longer-term ideas above (i.e. a dedicated VM cluster, or k8s, or Pontoon as a replacement for the test cluster, or something else...) are all great options too.

So I'm happy to stick with any of:

  • rolling out this 32 GB VM and then decommissioning it when we get a physical machine
  • waiting to see if we can get a physical machine from inventory, as discussed
  • waiting to see if we are advised to buy a machine instead
  • looking at another solution altogether...

I don't have much to add on the review process, besides: it used to be all Alex in the beginning, and then we changed the process and I started reviewing/creating some. But that doesn't mean I should be the new SPOF. It's self-service, but some kind of peer review doesn't hurt.

I can say though that 32GB is large and at the upper limit for what is possible or common for a ganeti VM.

I would also tend towards "this should be a physical machine".

Thank you Dzahn, I didn't intend to make you the unofficial reviewer; I just saw you reviewing another one and I picked you at random from the SRE pool. It doesn't help that you are many times extremely responsive and helpful. Thank you for your thoughts; a second opinion makes me sure I wasn't biased or missing something.

The request (T289784) for a physical machine to serve as an-test-coord1002 has been approved, so I can now delete the ganeti VM (which was never booted) and clean up, ready for the install of the physical server.
I have a couple of questions.

  • Is it OK for me to delete and re-use the hostname an-test-coord1002, given that it was never fully commissioned? I wasn't quite sure after reading this, so I thought it better to ask.
  • Should I run the decommission cookbook against that hostname, even though it never got to the stage of running Puppet?
  • Or is there a better option, such as:
    • removing the ganeti instance, then cleaning up Netbox and DNS
    • booting the host to finish the installation, then running the decommissioning cookbook

Once again, apologies for all the trouble.

cleaning up netbox and DNS

There is a script for that which is run by the decom cookbook; check if you can run it on its own. Sadly I don't know all the details; netops, dcops, or tooling may know.

The request (T289784) for a physical machine to serve as an-test-coord1002 has been approved, so I can now delete the ganeti VM (which was never booted) and clean up, ready for the install of the physical server.
I have a couple of questions.

  • Is it OK for me to delete and re-use the hostname an-test-coord1002, given that it was never fully commissioned? I wasn't quite sure after reading this, so I thought it better to ask.

Yep it is fine, as long as everything is cleaned up correctly first etc..

  • Should I run the decommission cookbook against that hostname, even though it never got to the stage of running Puppet?

I would personally try this road; in theory you should get a lot of warnings about things that cannot be deleted, etc. (since they are not there, like the Puppet client cert), but the cookbook should end up with the VM deleted and the DNS/Netbox status cleaned up.

If the decom cookbook stops for some reason it shouldn't be a problem; it can be run multiple times and it shouldn't fail (if we find some corner cases we can add code to the cookbook :)

I would personally try this road

+1

cookbooks.sre.hosts.decommission executed by btullis@cumin1001 for hosts: an-test-coord1002.eqiad.wmnet

  • an-test-coord1002.eqiad.wmnet (WARN)
    • Host not found on Icinga, unable to downtime it
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox

Cookbook completed successfully, without any notable errors. Thanks all.