'wdumps' tool does not contain the expected www/python/src/app.py entrypoint script
Closed, Resolved, Public

Description

Found while working on T246559: Investigate CrashLoopBackoff Pods on 2020 Kubernetes cluster.

This tool has an 'app' package rather than a script. The uwsgi configuration generator built into our webservice command explicitly looks for self.tool.get_homedir_subpath("www/python/src/app.py"). This behavior has existed as long as the webservice command. This makes me think that this tool has never actually worked on Toolforge.
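The mismatch described above can be illustrated with a minimal sketch. This is not the webservice source: `has_entrypoint` and its `homedir` argument are hypothetical names, and only the relative path `www/python/src/app.py` comes from the description. The point is that a check for a script file is not satisfied by an `app` package directory.

```python
from pathlib import Path

# Relative path quoted in the task description.
ENTRYPOINT = "www/python/src/app.py"


def has_entrypoint(homedir):
    """Return True if the expected entry point exists as a regular file.

    Hypothetical helper for illustration only; it is not part of the
    webservice command.
    """
    return (Path(homedir) / ENTRYPOINT).is_file()
```

Under this sketch, a home directory containing only `www/python/src/app/__init__.py` would fail the check, because `app` is a directory and `app.py` does not exist.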

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2020-03-01T07:51:04Z] <wm-bot> <root> Stopped tool because it is missing the required www/python/src/app.py entry point. This is causing uwsgi to crash. (T246559, T246562)

The tool never used the Webservice command launcher. It starts its own uwsgi process using k8s. I will look into what's wrong. It definitely used to work.

This was one of the tools that I manually migrated from the legacy Kubernetes cluster to the 2020 Kubernetes cluster. I do not remember exactly, but I likely saw a collection of objects on the legacy cluster that looked like a webservice-managed deployment but had no $HOME/service.manifest state to go along with it. Because of various bugs in the webservice command over time, this was not an unusual state to find. I migrated such tools by manually creating a $HOME/service.manifest with the proper state and then running webservice migrate.

You are certainly welcome to use kubectl directly to deploy and manage your tool, but be aware that this leaves you with a higher chance of broken or missing features as we continue to add things to the webservice-managed configuration, such as the currently undocumented toolforge.org ingress objects.

Thanks, I cleaned up the k8s configs and it works fine again now. I might consider switching to the webservice tooling later; however, I also like using raw k8s since it is easier to understand what is going on and to modify things as necessary.

A good example of what you are currently missing is the toolforge.org ingress object that webservice would have made for you. You have an ingress handling tools.wmflabs.org/wdumps, but no ingress for wdumps.toolforge.org. Your legacy URL ingress is also not including the same rewrite rules as webservice would create.

Treating Toolforge as a managed Kubernetes cluster rather than as a PaaS which happens to use Kubernetes is your prerogative. We have deliberately made direct Kubernetes access available in order to support the long tail of "power tool" use cases. Not using our default abstractions, however, is a risk for your tool's stability and features. The Toolforge admin team cannot track and address each and every unique deployment. When we make sweeping changes (and there are more coming), tools which are not using webservice will very probably break unless their maintainers are actively following messages to cloud-announce@lists.wikimedia.org and proactively doing any announced early adopter or manual migration steps.

Deviating from the platform-provided abstractions without a clearly advertised operations manual for your tool is also a risk for the larger Wikimedia community. Tools which are useful to the Wikimedia community very often outlive the interest of their initial developers. This is normal and expected. But when such tools are idiosyncratic and undocumented, it becomes very difficult to find new volunteer maintainers. Please do consider creating documentation at https://wikitech.wikimedia.org/wiki/Tool:Wdumps on how to at least restart and do basic troubleshooting for your tool.

As far as I can tell by casual observation, the only special thing about your tool is that it has not used the www/python/src/app.py convention for the Flask application entry point. I actually like the organization of the code and the use of a package, but I'm not sure this departure from the convention is worth the extra maintenance burden of requiring a custom Deployment. This is me thinking from the point of view of a Toolforge maintainer who has spent the last 3+ years trying to get folks to document their work so that we don't have more disruptive incidents like the ones used as illustrations of what can go wrong in https://wikitech.wikimedia.org/wiki/User:BryanDavis/Developing_community_norms_for_critical_bots_and_tools.
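One way to keep a package layout while still satisfying the app.py convention is a thin entry point script. The following is a rough sketch under stated assumptions, not the tool's actual code: it assumes the generated uwsgi configuration loads a WSGI callable named `app` from www/python/src/app.py, and `create_app` is a hypothetical stand-in for a factory that would normally be imported from the tool's own package. A plain WSGI callable is used here instead of Flask so that the sketch has no third-party dependencies.

```python
# Hypothetical contents of www/python/src/app.py.
# Assumption: the server imports a WSGI callable named `app` from this file.


def create_app():
    """Stand-in for a factory that would normally live in the tool's package."""

    def application(environ, start_response):
        body = b"wdumps placeholder\n"
        start_response(
            "200 OK",
            [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    return application


# Module-level name that the server would look up.
app = create_app()
```

In a real tool the script would likely shrink to an import plus the `app = ...` line. Note that a package directory named `app` next to a script named `app.py` share the same import name, so the package might need a different name; whether that matters depends on how uwsgi is told to load the file, which is not verified here.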

Your legacy URL ingress is also not including the same rewrite rules as webservice would create.

Are these rewrite rules documented somewhere? This was one of the troubles I had getting started with using webservice directly: I have no idea how ingress works and what is rewritten (is the /$TOOL name removed from the URL before the request is passed to my app, or is it not?). The only reference I could find to this was https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Other_/_generic_web_servers, but that is at the end of a very long article and also mixes old Grid Engine and Kubernetes parts.

I also found almost no references to $TOOL.toolforge.org in the wikitech docs.

I'm going to try and migrate to the webservice tooling in the future. But right now, the documentation doesn't really explain what webservice is doing beyond a collection of examples. The only way to figure out what is going on is to look at the source of webservice and dump the k8s configs after it has done its job.
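Until the question of whether the /$TOOL prefix is removed before a request reaches the app is answered, an application can be written to tolerate both cases. The sketch below makes no claim about what the ingress actually does; `strip_tool_prefix` is a hypothetical helper that normalizes a request path either way.

```python
def strip_tool_prefix(path, tool):
    """Return `path` with a leading /<tool> segment removed, if present.

    Paths that do not start with that exact segment are returned unchanged,
    so the result is the same whether or not a proxy already stripped it.
    """
    prefix = "/" + tool
    if path == prefix:
        return "/"
    if path.startswith(prefix + "/"):
        return path[len(prefix):]
    return path
```

For example, both `/wdumps/dump/1` and `/dump/1` normalize to `/dump/1`. The trade-off is that the app can no longer serve a route of its own that begins with `/wdumps/`.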

Your legacy URL ingress is also not including the same rewrite rules as webservice would create.

Are these rewrite rules documented somewhere?

Not on wikitech, no. The current implementation of the 2020 Kubernetes cluster ingress came from T242719 ("https://tools.wmflabs.org/{toolname} no longer redirects to https://tools.wmflabs.org/{toolname}/ on new k8s cluster"), where the missing redirect was reported as a regression from the past behavior of the 404 handler on the legacy Kubernetes and Grid Engine ingress layer.
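The behavior named in the T242719 title can be stated as a small function. This is only an illustration of the redirect described there, not the ingress rule itself; `redirect_target` is a hypothetical name.

```python
def redirect_target(path, tool):
    """Return the path to redirect to, or None if no redirect applies.

    Only the bare /<tool> path (no trailing slash) is redirected.
    """
    if path == "/" + tool:
        return path + "/"
    return None
```

A tool that manages its own ingress would need an equivalent rule, or would need to handle the bare path inside the application.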