This may or may not be related to the NFS outage; it started crashing in earnest after I began rebooting the NFS k8s worker nodes.
@Slst2020 is bailing me out now, but I was very confused by the docs at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Jobs_framework, which don't really explain where or how the API is run. I can't tell whether those docs are just really out of date or whether I'm badly misreading them. For instance, 'jobs-framework-api (code) --- uses flask-restful and runs inside the k8s cluster as a webservice' sent me to the 'jobs' tool, but that seems to be unrelated?