As currently envisioned, Toolhub's core will be a custom web application written in Python using the Django framework. It will need a database (probably MySQL/MariaDB, based on current Wikimedia preferences) and a robust full-text search engine (Elasticsearch). A job queue system of some sort would be useful for offloading tasks from the main request/response cycle (maybe Redis + Celery, maybe something lighter weight). Web crawling to collect toolinfo.json records published outside Toolhub will need some kind of scheduled job service (possibly just cron-triggered invocations of web endpoints exposed by the core application). Local development and testing environments should also be considered from the start of the project.
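To make the crawling idea concrete, here is a minimal sketch of what a scheduled job (or a cron-triggered web endpoint) might run. Everything in it is an assumption for illustration: the names `fetch_url` and `crawl` are hypothetical, the example URLs are placeholders, and it assumes a toolinfo.json file holds either a single JSON record or a list of records. It uses only the Python standard library and is not a proposed implementation.

```python
import json
from typing import Callable, Iterable
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Download a URL and return its body as text (default fetcher)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def crawl(urls: Iterable[str], fetch: Callable[[str], str] = fetch_url) -> dict:
    """Fetch each URL and collect parsed records plus per-URL errors."""
    records, errors = [], {}
    for url in urls:
        try:
            data = json.loads(fetch(url))
        except (OSError, ValueError) as exc:
            # Network failures raise OSError subclasses; bad JSON or bad
            # text encoding raises ValueError subclasses. One broken URL
            # should not abort the whole crawl.
            errors[url] = str(exc)
            continue
        # Assumption: a file holds either one record or a list of records.
        records.extend(data if isinstance(data, list) else [data])
    return {"records": records, "errors": errors}
```

Passing the fetcher in as a parameter keeps the crawl logic testable without network access, and the same function could be called from a cron job, a Celery task, or a view, whichever scheduling approach is eventually chosen.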
Based on @bd808's current assumptions, a full-stack deployment of Toolhub is likely to need:
- Python >=3.7
- Numerous third-party Python libraries (Django, etc.)
- MySQL/MariaDB database
- Memcached
- Redis
- Elasticsearch
- Task runner (Celery or similar)
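As a rough illustration of how these services could be wired into a containerized Django application, the sketch below builds Django-style settings from environment variables. All environment variable names (`DB_HOST`, `REDIS_URL`, `ES_HOSTS`, and so on), the defaults, and the `build_settings` helper are hypothetical. `ELASTICSEARCH_HOSTS` is not a built-in Django setting, and `CELERY_BROKER_URL` assumes Celery is configured to read Django settings with the `CELERY` namespace. The `PyMemcacheCache` backend assumes Django 3.2 or newer.

```python
import os
from typing import Mapping


def build_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return Django-style settings derived from environment variables."""
    return {
        "DATABASES": {
            "default": {
                "ENGINE": "django.db.backends.mysql",  # also used for MariaDB
                "NAME": env.get("DB_NAME", "toolhub"),
                "USER": env.get("DB_USER", "toolhub"),
                "PASSWORD": env.get("DB_PASSWORD", ""),
                "HOST": env.get("DB_HOST", "localhost"),
                "PORT": env.get("DB_PORT", "3306"),
            }
        },
        "CACHES": {
            "default": {
                "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
                "LOCATION": env.get("MEMCACHED_LOCATION", "localhost:11211"),
            }
        },
        # Redis as the task queue broker (assumes Celery, CELERY namespace).
        "CELERY_BROKER_URL": env.get("REDIS_URL", "redis://localhost:6379/0"),
        # Hypothetical setting name: a comma-separated list of search hosts.
        "ELASTICSEARCH_HOSTS": env.get("ES_HOSTS", "localhost:9200").split(","),
    }
```

In a `settings.py` module this could be applied with `globals().update(build_settings())`. Reading connection details from the environment would let the same container image run under Kubernetes in production and under a local setup for development, with only the injected variables differing.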
Using Docker containers to deliver code to production would be nice. This would allow deploying the application on a Kubernetes cluster for production, and on Docker Compose or a local k8s cluster (minikube, k3s, etc.) for development and testing. It should also let us avoid the deployment preparation challenges of using scap3 to deploy Python applications.
Designing the application with an expectation of running from a container under Kubernetes management opens up one more question: what Kubernetes cluster will be used for the production deployment? @bd808 believes there are 3 viable options today:
- Work with the ServiceOps team in SRE to bring Python support to the current production Kubernetes cluster. This will benefit other projects (ORES, Striker) in addition to Toolhub. It will not, however, reduce the complexity of finding database, Memcached, Redis, Elasticsearch, and other services to connect to in production.
- Toolforge! With the 2020 Kubernetes cluster in Toolforge it has become possible to grant expanded quotas to a single tool. This, plus the existence of suitable database (toolsdb) and Elasticsearch services in Toolforge, makes it possible to imagine getting up and running in Toolforge relatively simply. Another benefit would be a lightweight process for granting non-WMF/WMDE staff access to the tool to deploy and troubleshoot. Downsides: the toolhub.toolforge.org hostname may make it harder to communicate that this service is scoped beyond Toolforge-hosted tools; deployment automation in Toolforge is currently a per-tool challenge; log aggregation and monitoring solutions would need to be found.
- Cloud VPS. All of the needed infrastructure could be provisioned in a dedicated Cloud VPS project. This seems like the worst choice because everything would be locally maintained: that is great for flexibility on "day 1" but a maintenance burden for "day 2+".