
Support machine-readable output for mwscript-k8s
Closed, Resolved (Public)

Description

In T341553#10200997, T341553#10203241, and T341553#10208261, we discussed that cirrus-reindex-orchestrator, which currently shells out to mwscript, should instead shell out to mwscript-k8s, and that it needs a machine-readable output mode to do so.

That might look like:

$ mwscript-k8s --output=json -- Version.php enwiki
{
  ...,
  "config": "/etc/kubernetes/mw-script-codfw.config",
  "job": "mw-script.codfw.abcde123",
  ...
}

With --output=json, the JSON on standard output contains all the information needed to interact with the Kubernetes API (either through the Python library or by invoking kubectl). Client software like the Cirrus reindexer can launch a job with mwscript-k8s, then follow up to get script output, exit code, or anything else.
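For the kubectl route, a client might turn the JSON fields into a command line along these lines. This is a minimal sketch, not part of mwscript-k8s: `kubectl_logs_argv` is a hypothetical helper, and the `namespace` and `mediawiki_container` keys are assumed from the Python example later in this task rather than from the abbreviated output shown above. The namespace and container values in the sample dict are placeholders.

```python
def kubectl_logs_argv(mwscript: dict) -> list[str]:
    """Build a `kubectl logs` command line from mwscript-k8s JSON fields.

    Hypothetical helper: every value is read from the JSON as-is, and
    nothing is derived from the job name.
    """
    return [
        "kubectl",
        "--kubeconfig", mwscript["config"],
        "--namespace", mwscript["namespace"],
        "logs", f"job/{mwscript['job']}",
        "--container", mwscript["mediawiki_container"],
    ]


# Placeholder values for illustration only; the namespace and container
# names are made up, not the real ones.
example = {
    "config": "/etc/kubernetes/mw-script-codfw.config",
    "namespace": "example-namespace",
    "job": "mw-script.codfw.abcde123",
    "mediawiki_container": "example-container",
}
print(" ".join(kubectl_logs_argv(example)))
```

A missing key raises KeyError here, which seems preferable to guessing a value.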

We'll ensure that all the needed information is present in the output, and doesn't need to be inferred or constructed -- e.g. one could, but should not, construct the value of "config" by parsing the value of "job" in the example above. This way, Search team isn't affected when implementation details of the maintenance scripts' Helm config, like the format of the job name, change under the hood.

The --output=json mode is incompatible with --attach and --follow, because output from the maintenance script would interfere with reading the JSON from stdout. Client software would also want to reconnect if the underlying kubectl logs is interrupted (by a network hiccup, for example), so it will manage fetching its own logs as appropriate.

Almost all of mwscript-k8s's own output (including progress updates like "Waiting for the container to start...") is currently on stderr, so it doesn't get in the way of the JSON on stdout and doesn't need to be affected by --output=json. There are a couple of exceptions, which I'll switch from stdout to stderr as part of this task. This doesn't hold when --verbose is passed: in that case we pipe through stdout from helmfile apply, so --verbose is incompatible with --output=json too.
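From the client's side, the stdout/stderr split means the two streams can be captured separately and only stdout parsed. The sketch below uses a stand-in child process rather than mwscript-k8s itself, so that it runs anywhere; the JSON it emits is made up for illustration.

```python
import json
import subprocess
import sys

# Stand-in for mwscript-k8s: writes a progress message to stderr and a JSON
# document to stdout. The JSON content is invented for this sketch.
FAKE_CHILD = (
    "import json, sys\n"
    "print('Waiting for the container to start...', file=sys.stderr)\n"
    "print(json.dumps({'error': None, 'mwscript': {'job': 'example-job'}}))\n"
)

proc = subprocess.run(
    [sys.executable, "-c", FAKE_CHILD],
    stdout=subprocess.PIPE,  # machine-readable JSON only
    stderr=subprocess.PIPE,  # progress messages, kept apart from the JSON
    text=True,
    check=True,
)

# Parsing succeeds because nothing but JSON arrived on stdout.
result = json.loads(proc.stdout)
print("parsed job:", result["mwscript"]["job"])
print("progress text seen on stderr:", proc.stderr.strip())
```

A real caller would probably pass stderr through to its own logs instead of discarding it, since that is where the human-readable progress goes.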

We'll retronym the existing behavior as --output=none, because nothing is printed on stdout, and keep it as the default. That leaves us open for other flavors of output in the future.
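To make the flag rules concrete, here is one way the option handling could be sketched with argparse. This is an illustration under assumptions, not the actual mwscript-k8s code: the parser name, the `script_args` positional, and treating --attach, --follow, and --verbose as plain boolean flags are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the flag names come from this task.
    parser = argparse.ArgumentParser(prog="mwscript-k8s-sketch")
    # "none" stays the default, leaving room for other output flavors later.
    parser.add_argument("--output", choices=["none", "json"], default="none")
    parser.add_argument("--attach", action="store_true")
    parser.add_argument("--follow", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("script_args", nargs="*")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.output == "json":
        # Each of these would put non-JSON text on stdout.
        conflicts = [f for f in ("attach", "follow", "verbose") if getattr(args, f)]
        if conflicts:
            names = ", ".join(f"--{c}" for c in conflicts)
            parser.error(f"--output=json is incompatible with {names}")
    return args


args = parse_args(["--output=json", "--", "Version.php", "enwiki"])
print(args.output, args.script_args)
```

Rejecting the combination at parse time, rather than silently ignoring one flag, keeps a client from waiting on JSON that will never be clean.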

Thanks to @Scott_French for talking through all this with me this afternoon.

Event Timeline

Change #1081265 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/puppet@production] deployment_server: mwscript-k8s logging cleanups

https://gerrit.wikimedia.org/r/1081265

Change #1081985 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/puppet@production] deployment_server: Refactor mwscript-k8s preparatory to adding --output

https://gerrit.wikimedia.org/r/1081985

Change #1081986 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/puppet@production] deployment_server: Add JSON output mode to mwscript-k8s

https://gerrit.wikimedia.org/r/1081986

Change #1081265 merged by RLazarus:

[operations/puppet@production] deployment_server: mwscript-k8s logging cleanups

https://gerrit.wikimedia.org/r/1081265

Change #1081985 merged by RLazarus:

[operations/puppet@production] deployment_server: Refactor mwscript-k8s preparatory to adding --output

https://gerrit.wikimedia.org/r/1081985

Change #1081986 merged by RLazarus:

[operations/puppet@production] deployment_server: Add JSON output mode to mwscript-k8s

https://gerrit.wikimedia.org/r/1081986

This is ready to use, and documented (including the JSON output format) at https://wikitech.wikimedia.org/wiki/Maintenance_scripts#Shelling_out_to_mwscript-k8s. @EBernhardson please give this a try and let me know how it works for you! Happy to iterate as needed.

An example of waiting for a job to finish and then checking its status code could be something like:

import json
import subprocess
import sys

from kubernetes import client, config, watch


# Run mwscript-k8s and collect the job information.
p = subprocess.run(['/usr/local/bin/mwscript-k8s', '--comment=T377292', '--output=json', '--',
                    'Version.php', '--wiki=testwiki'],
                   capture_output=True)
result = json.loads(p.stdout)
if result['error']:
    print(f"mwscript-k8s failed: {result['error']}")
    sys.exit(1)
mwscript = result['mwscript']

# Init Kubernetes API client.
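# new_client_from_config() loads the kubeconfig and returns an ApiClient bound to it
# (load_kube_config() returns None, so wrapping it in ApiClient() only works via global state).
api_client = config.new_client_from_config(config_file=mwscript['config'])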
batch_client = client.BatchV1Api(api_client)
core_client = client.CoreV1Api(api_client)

# Wait for the job to finish (successfully or unsuccessfully).
w = watch.Watch()
for event in w.stream(batch_client.list_namespaced_job, mwscript['namespace'],
                      field_selector=f"metadata.name={mwscript['job']}"):
    job = event['object']
    if job.status.succeeded is not None or job.status.failed is not None:
        w.stop()

# Look up the pod.
pods_list = core_client.list_namespaced_pod(
    mwscript['namespace'], label_selector=f"job-name={mwscript['job']}")
[pod] = pods_list.items

# Select the MediaWiki container.
[container_status] = [c for c in pod.status.container_statuses
                      if c.name == mwscript['mediawiki_container']]
print(f'Maintenance script exited with status {container_status.state.terminated.exit_code}')

# Get the script output as a string.
logs = core_client.read_namespaced_pod_log(
    pod.metadata.name, mwscript['namespace'], container=mwscript['mediawiki_container'])
print(f'Maintenance script output was:\n{logs}')

(This is a sketch, and it's a little cavalier around Watch timeouts, list unpacking, and so on, for the purpose of showing the overall structure clearly -- but it does work. You might choose to be more cautious in production code.)
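For the "more cautious" version, two small helpers might look like the following. This is a sketch under assumptions: `job_outcome` and `only` are hypothetical names, and the demo uses SimpleNamespace stand-ins shaped like the client's Job objects so that it runs without a cluster. It reads the Job's status conditions ("Complete" or "Failed" with status "True") rather than the pod counters, since `status.failed` counts failed pods and could be set while a Job that allows retries is still running.

```python
from types import SimpleNamespace
from typing import Optional


def job_outcome(job) -> Optional[str]:
    """Return "succeeded", "failed", or None if the Job has not finished."""
    # conditions is None until the Job controller has something to report.
    for cond in job.status.conditions or []:
        if cond.status != "True":
            continue
        if cond.type == "Complete":
            return "succeeded"
        if cond.type == "Failed":
            return "failed"
    return None


def only(items, what: str):
    """Return the single element of items, with a clearer error than unpacking."""
    if len(items) != 1:
        raise RuntimeError(f"expected exactly one {what}, found {len(items)}")
    return items[0]


# Stand-in objects for illustration; real ones come from the Kubernetes client.
running = SimpleNamespace(status=SimpleNamespace(conditions=None))
done = SimpleNamespace(status=SimpleNamespace(
    conditions=[SimpleNamespace(type="Complete", status="True")]))
print(job_outcome(running), job_outcome(done))
print(only(["pod-a"], "pod"))
```

In the watch loop, `job_outcome(event['object'])` would replace the succeeded/failed check, and `only(pods_list.items, "pod")` would replace the list unpacking. Passing `timeout_seconds` to the watched list call is one way to bound the wait, with the caller restarting the watch if it expires before the Job finishes.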

Optimistically resolving, but happy to reopen.