The maps-test cluster is running on old hardware that needs to be replaced (it has reached its end of life). We have 2 options:
- replace the hardware, re-image the servers, keep things as they are with regard to map data and configuration
- destroy the maps-test cluster, create a brand new test cluster on our cloud infrastructure
In more detail:
replace hardware in current cluster
- This is fairly easy and has a hardware cost attached to it, but not much cost in terms of the limited engineering resources of the Maps team.
- Having a maps test cluster on real hardware in the production zone is a unicorn, as almost all other applications run their tests on cloud (wmflabs) infrastructure.
Note: the hardware cost for updating the test cluster has already been budgeted for.
create a new test cluster on cloud (wmflabs)
- Almost all of our applications have test environments on cloud, which allows for more experimentation, is isolated from production, and uses fewer physical resources.
- The current maximum disk size we can get on cloud does not allow us to run the full OSM dataset, and cloud provides lower performance than dedicated hardware.
additional information:
- Lower performance is not really an issue for us, as the test cluster will see much less traffic than production (obviously).
- We will not be able to run performance tests representative of production, but this isn't an issue for any other application.
- We deploy small, incremental enough changes that we should be able to spot issues fast enough in production.
- Not having a full dataset means that we need to work with different OSM dumps, which might expose slightly different behaviours.
- If we move the maps test cluster to cloud, we won't be able to test map styles as effectively as before, since different regions of the globe expose different mapping characteristics.
- Moving to cloud probably requires some changes (application code, puppet, ...), but should not be too hard. My (@Gehel) time is limited, but maybe @Pnorman can help.
- Moving to cloud could run into additional unforeseen issues that we don't have the engineering resources to fix.