Our plan of record is to turn up all MediaWiki services that compose Pretrain in a single k8s namespace in the production Wikikube clusters.
MediaWiki services satisfying distinct use cases (e.g., web traffic vs. jobrunner) will correspond to distinct k8s Deployments within that namespace. A natural way to model that, given the shape of our existing MW-on-k8s services, is via helmfile releases.
Shape - While we are likely to expand this in the near term (e.g., to add canaries), the minimum set of Deployments (releases) necessary to enable early testing will be "main" (which will initially serve all generic traffic workloads, including API traffic), "jobrunner" (which will include the RunSingleJob.php entrypoint, similar to mw-jobrunner), and "debug" (which will run debug images and apply additional debug-specific overrides, similar to mw-debug). The naming is open to improvement.
Ingress - Similar to mw-wikifunctions (configuration), we will use k8s Ingress to direct test.wikipedia.org traffic to our respective Deployments (releases). The difference is that, in addition to matching a specific Host, we will also need to route based on URL path (e.g., for jobrunner) or the value of the X-Wikimedia-Debug header (i.e., for debug).
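To make the intended routing decision concrete, here is a minimal Python sketch of the matching logic only. It is not an Ingress configuration. The jobrunner path (`/rpc/RunSingleJob.php`), the rule that any non-empty X-Wikimedia-Debug value selects "debug", and the precedence of the header over the path are all assumptions made for illustration. The `Request` type and `select_release` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

PRETRAIN_HOST = "test.wikipedia.org"
# Assumption: the jobrunner entrypoint is exposed at this path.
JOBRUNNER_PATH = "/rpc/RunSingleJob.php"
DEBUG_HEADER = "x-wikimedia-debug"


@dataclass
class Request:
    """Hypothetical, minimal view of an incoming HTTP request."""
    host: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


def select_release(req: Request) -> Optional[str]:
    """Return the release that should serve req, or None if the Host does not match."""
    if req.host.lower() != PRETRAIN_HOST:
        return None  # not Pretrain traffic; handled elsewhere

    # HTTP header names are case-insensitive, so normalise before lookup.
    headers = {name.lower(): value for name, value in req.headers.items()}

    # Assumption: any non-empty debug header wins over path-based routing.
    if headers.get(DEBUG_HEADER):
        return "debug"
    if req.path == JOBRUNNER_PATH:
        return "jobrunner"
    return "main"


if __name__ == "__main__":
    print(select_release(Request("test.wikipedia.org", "/wiki/Main_Page")))
    print(select_release(Request("test.wikipedia.org", JOBRUNNER_PATH)))
```

Whether the debug header should take precedence over the jobrunner path is a design choice that the actual Ingress rules would need to settle. The sketch picks one ordering only so that the function is deterministic.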
Images - The initial turn up will use standard MediaWiki train images, rather than the daily wmf/next Pretrain image. Integration with code deployments via Scap should be relatively straightforward (see mw_releases in hiera, which is where we configure which images are deployed to which (namespace, release) tuples and at what stages of deployment).
Note that, aside from requests sent toward Ingress for the purposes of manual testing, these services will not be actively used until we begin to enable the routing changes described in T427666. Stated differently, this task covers the "Ingress and upstream services" portion of the effort, while T427666 covers actually getting testwiki traffic to Ingress.
Additional discussion
What about isolation? We will be introducing some form of isolation (e.g., isolated workers) to Pretrain services before they switch to daily wmf/next images. While we can certainly experiment with options to achieve this, we will defer this work initially to simplify the turn up (i.e., while still running exclusively train images).
What about "batch" workloads (i.e., mw-script, mw-cron, mw-videoscaler)? While we will need to support these kinds of workloads in Pretrain, our plan of record for the MVP is to support wmf/next image selection only for the mw-script use case that directly targets testwiki (i.e., no dblist support). All other use cases will continue to use train images initially.
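The MVP image-selection rule for batch workloads can be summarised as a small decision function. This is a sketch under stated assumptions, not the actual mechanism: the function name, its parameters, and the string labels for the two image flavours are hypothetical, and how a script invocation exposes its target wiki or dblist is not specified here.

```python
from typing import Optional

TRAIN_IMAGE = "train"        # label for standard MediaWiki train images
PRETRAIN_IMAGE = "wmf/next"  # label for the daily Pretrain image


def select_image_flavour(
    workload: str,
    wiki: Optional[str] = None,
    dblist: Optional[str] = None,
) -> str:
    """Pick an image flavour for a batch workload under the MVP rule.

    Only an mw-script run that names testwiki directly (no dblist)
    gets the wmf/next image; everything else stays on train images.
    """
    targets_testwiki_directly = wiki == "testwiki" and dblist is None
    if workload == "mw-script" and targets_testwiki_directly:
        return PRETRAIN_IMAGE
    return TRAIN_IMAGE


if __name__ == "__main__":
    print(select_image_flavour("mw-script", wiki="testwiki"))
    print(select_image_flavour("mw-cron", wiki="testwiki"))
```

A run that reaches testwiki only through a dblist falls back to train images here, which mirrors the "no dblist support" constraint of the MVP.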