The [[ https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters#Goal | defined scope of the 'wikikube' cluster ]] explicitly excludes monitoring tools like Grafana, Kibana, etc.
This means that we need somewhere else to run the centralized parts of Jaeger (writing into storage, and the query + UI elements for retrieving trace data).
We discussed and rejected the idea of running these in a custom Docker / docker-compose setup on either a VM or a bare-metal host. We concluded that it would be about as much work -- and much more reusable -- to turn up a small, new k8s cluster, which we are calling 'aux'. (We also considered "ancillary", which was thought more expressive but much longer to type.)
The scope for this cluster, at least to start with, would be confined to just observability tools and other SRE-supported critical infrastructure services (for example, Netbox would be considered in-scope). We can broaden this later, but the intent is to avoid the cluster becoming a 'junk drawer'.
With `aux` as the cluster name prefix, we also decided upon `aux-k8s-etcd`, `aux-k8s-ctrl`, `aux-k8s-worker` as the machine name prefixes, similar to what Data Engineering did with `dse-k8s-`.
The initial plan is to run all of this on Ganeti, in just one of the core clusters, and to start with just a couple of worker nodes.
So, on eqiad Ganeti, we need to turn up:
[ ] 3x `aux-k8s-etcd` nodes, 1G RAM, 1 vCPU each
[ ] 2x `aux-k8s-ctrl` nodes, 1G RAM, 1 vCPU each
[ ] 2x `aux-k8s-worker` nodes, 16G RAM, 8 vCPU each
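
For sizing the Ganeti request, the checklist above can be tallied with a small sketch like the following. The node counts and sizes are taken from the list; the `NodeGroup` type, the helper names, and the `<prefix>1001`-style hostname numbering are assumptions made for illustration, not something decided in this task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """One line of the checklist: a machine name prefix plus per-VM sizing."""
    prefix: str
    count: int
    ram_gb: int
    vcpus: int


# Values copied from the checklist above.
PLAN = [
    NodeGroup("aux-k8s-etcd", count=3, ram_gb=1, vcpus=1),
    NodeGroup("aux-k8s-ctrl", count=2, ram_gb=1, vcpus=1),
    NodeGroup("aux-k8s-worker", count=2, ram_gb=16, vcpus=8),
]


def totals(plan):
    """Return (number of VMs, total RAM in GB, total vCPUs) for the plan."""
    vms = sum(g.count for g in plan)
    ram_gb = sum(g.count * g.ram_gb for g in plan)
    vcpus = sum(g.count * g.vcpus for g in plan)
    return vms, ram_gb, vcpus


def hostnames(group, start=1001):
    """Generate example hostnames for a group.

    ASSUMPTION: numbering starts at 1001 for eqiad; the real hostnames
    are not specified in this task.
    """
    return [f"{group.prefix}{start + i}" for i in range(group.count)]


if __name__ == "__main__":
    vms, ram_gb, vcpus = totals(PLAN)
    print(f"{vms} VMs, {ram_gb}G RAM, {vcpus} vCPUs in total")
    for group in PLAN:
        print(", ".join(hostnames(group)))
```

In total this comes to 7 VMs, 37G of RAM and 21 vCPUs on eqiad Ganeti, almost all of it in the two worker nodes.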