Hi!
The Analytics team is looking for a new (better supported) GPU for stat1005. I have been discussing this with Rob and Faidon over the past few days, and I am creating this task to collect all the info and decide how to proceed.
Before starting: stat1005 already has a GPU, but it was bought (IIUC) as part of the host, as a pre-packaged solution from Dell. I have no idea whether the card is integrated or attached to a PCI slot. We have been working in T148843 to see if the GPU could work with the recent AMD drivers, but so far no luck, and the GPU is not well supported anymore.
Ideally this should be a first test use case, so that we can start thinking a bit more about adding a GPU to all the computing nodes of Analytics (stat machines first, then possibly Hadoop worker nodes in the bright future).
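On the "integrated or PCI slot" question, a minimal sketch of one way to start checking, assuming `lspci` is available on the host. The helper name `find_gpu_devices` is hypothetical and the script only parses text in the usual `lspci` line layout; it does not by itself say whether the device sits in a slot.

```python
import re

# PCI class names that lspci prints for graphics devices.
GPU_CLASSES = ("VGA compatible controller", "3D controller", "Display controller")


def find_gpu_devices(lspci_output):
    """Return (pci_address, description) pairs for graphics devices.

    Hypothetical helper: expects plain `lspci` output, one device per line,
    in the form "<address> <class>: <vendor and device>".
    """
    devices = []
    for line in lspci_output.splitlines():
        match = re.match(r"^(\S+)\s+([^:]+):\s+(.*)$", line)
        if match and match.group(2) in GPU_CLASSES:
            devices.append((match.group(1), match.group(3)))
    return devices


if __name__ == "__main__":
    import subprocess

    # Run lspci on the host and print only the graphics devices.
    output = subprocess.run(
        ["lspci"], capture_output=True, text=True, check=True
    ).stdout
    for address, description in find_gpu_devices(output):
        print(address, description)
```

The PCI address printed for the GPU could then be compared by hand with the slot information reported by `dmidecode -t slot` (run as root), which should show whether a physical slot is in use at that bus address.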
In T148843 @EBernhardson added some comments about GPUs that could be a good fit for us:
We'd need to find something that is compatible with this hardware list: https://rocm.github.io/hardware.html. Ideally the GFX 9 series would be the right target:
“Vega 10” chips, including the following GPUs:
- AMD Radeon RX Vega 56
- AMD Radeon RX Vega 64
- AMD Radeon Vega Frontier Edition
- AMD Radeon Pro WX 8200
- AMD Radeon Pro WX 9100
- AMD Radeon Pro V340
- AMD Radeon Pro V340 MxGPU
- AMD Radeon Instinct MI25
Note that ROCm does not support the Radeon Pro SSG.
Now the hard part. @RobH mentioned to me on IRC that we'd need to find a good compromise between the space taken up inside the chassis and extra power consumption (ideally a small card with no extra power consumption). Dell offers some GPU models that are compatible with their PowerEdge servers, but the list seems to mention only models from the GFX7 series, which is enabled but not supported by the ROCm drivers (the same series as the GPU card on stat1005, which indeed does not seem to work with the latest drivers):
https://dell.com/learn/us/en/04/campaigns/poweredge-gpu#campaignTabs-1
Let's try to find a good compromise :)