Riccardo and I have been working on a design to replace servermon/packages that also keeps track of the packages installed in the images used by containers:
Description
Design and implement a tool that keeps track of all installed and upgradable packages across the fleet, both for normal OSes and for container images, replacing and improving the similar functionality currently available in Servermon, which is now deprecated. The additional complexity with container images is that an image can be based on top of another image, forming a tree, and we should track only the packages installed directly in any given image.
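A minimal sketch of what "installed directly" could mean. It assumes we can obtain the full (name, version) package list of both an image and its parent, for example by querying dpkg inside each. The direct packages are then a set difference: a package that the child upgraded counts as direct, while untouched inherited packages do not. The function name and the sample data are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A package is identified by (name, version); hypothetical representation.
Package = Tuple[str, str]


def direct_packages(image_packages: Iterable[Package],
                    parent_packages: Iterable[Package]) -> Set[Package]:
    """Packages installed directly in an image.

    image_packages is the full list seen inside the image (including
    everything inherited from its ancestors); parent_packages is the full
    list seen inside its parent. A version that differs from the parent's
    was (re)installed in this image, so it counts as direct.
    """
    return set(image_packages) - set(parent_packages)


# Usage: a child image adds python3 and upgrades bash on top of its base.
base = {("libc6", "2.24-11"), ("bash", "4.4-5")}
child = {("libc6", "2.24-11"), ("bash", "4.4-5+deb9u1"), ("python3", "3.5.3-1")}
print(sorted(direct_packages(child, base)))
# [('bash', '4.4-5+deb9u1'), ('python3', '3.5.3-1')]
```

For a root image the parent list is simply empty, so all of its packages are direct.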
Proposed name: debmonitor
General features
- Add/remove hosts/images
  - The list of hosts is provided by the "source of truth" of the infrastructure, and when a host is added/removed this should be reflected in this tool with an API call. We take the list of hosts from an external authoritative source so that we can ensure no host stops reporting its packages, and raise an alert if that happens.
  - The same goes for the list of container images and their versions/tags: they should be added/removed automatically when an image is generated/removed from the catalog.
- Update the package list for a host/image
  - Hosts:
    - On the hosts there will be an apt hook that sends the changed package:version entries (installed/removed) and the upgradable packages to a specific endpoint.
    - A crontab will run periodically to gather the whole list of package:version entries and send only the differences since the last run, checked against a local cache. The cron-triggered ingestion has two purposes:
      - Provide the initial data for freshly installed/reimaged hosts
      - Catch package changes that were invisible to the apt hook (installing software via dpkg -i, or running apt with an option that skips apt hooks)
    - Both the apt-hook-based and the cron-based ingestion operate on the same cache of installed packages. On a cleanly updated system the cron job should usually not ingest any packages apart from the initial run.
    - We'll need some locking to prevent parallel runs of the ingestion of the "full package set".
  - Images:
    - When a new image:version/tag is generated, a hook should send the list of packages installed directly in this image, not those installed in any parent image.
    - Periodically, an instance of each active image should be instantiated and an apt-get update performed; the apt hook then sends the list of upgradable packages to the endpoint, updating the upgradable packages of that specific image.
- Web/API interface to consume the data
  - Search and list source packages, binary packages, hosts, images
  - List upgradable packages per host, per image and globally (with the count of hosts/images affected), with version_from and version_to, showing whether each is a security upgrade
  - (long term) Allow marking an upgrade path as 'approved', and provide an API that returns the list of approved upgrades, which will be performed automatically by debdeploy (or an equivalent tool) across the fleet in a controlled and progressive way.
  - The tool should link to changelogs, as servermon currently does.
- An optional feature would be to also store the running kernel, which would allow querying for systems that need a reboot (the installed kernel package is not necessarily the running one). This can be added in a later revision (low priority).
- The tool will be specific to the Debian package format; extending it to RPM doesn't seem useful. We use only OSes based on apt and deb packages, so the overengineering needed to make it more general and support multiple logics is not only not worth it, but would also not be tested and maintained properly. There are also some subtle differences in version semantics.
- A CLI tool would be useful mid-term, but can be added in a follow-up step.
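A minimal sketch of the cron-side ingestion described above. It assumes a JSON file as the local cache and an flock(2)-based lock file; both paths and all function names are hypothetical. The sketch diffs the current package set against the cache and builds the installed/uninstalled parts of the update payload, where a changed version appears in both lists. Collecting the upgradable list and making the HTTP call to the endpoint are left out. The input is assumed to be one "name version source" line per package, as produced by dpkg-query -W -f='${binary:Package} ${Version} ${source:Package}\n' on a dpkg that supports the ${source:Package} virtual field. Note that dpkg-query also lists packages that are removed but not purged, which a real client would need to filter out.

```python
import fcntl
import json
import os
import tempfile
from typing import Dict, List, Tuple

# name -> (version, source); a hypothetical in-memory representation.
PackageMap = Dict[str, Tuple[str, str]]


def parse_dpkg_query(output: str) -> PackageMap:
    # Expects one "name version source" line per package, e.g. from
    # dpkg-query -W -f='${binary:Package} ${Version} ${source:Package}\n'
    packages: PackageMap = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip empty or malformed lines
        name, version, source = fields
        packages[name] = (version, source)
    return packages


def diff_packages(old: PackageMap, new: PackageMap) -> Dict[str, List[dict]]:
    """Return only what changed; a version change appears in both lists."""
    def entries(src: PackageMap, other: PackageMap) -> List[dict]:
        return [
            {"name": name, "version": version, "source": source}
            for name, (version, source) in sorted(src.items())
            if name not in other or other[name][0] != version
        ]

    return {"installed": entries(new, old), "uninstalled": entries(old, new)}


def sync_with_cache(current: PackageMap, cache_path: str,
                    lock_path: str) -> Dict[str, List[dict]]:
    """Diff `current` against the local cache while holding an exclusive lock."""
    with open(lock_path, "w") as lock_file:
        # LOCK_NB: fail immediately (BlockingIOError) if another run holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            old: PackageMap = {}
            if os.path.exists(cache_path):
                with open(cache_path) as f:
                    old = {name: tuple(vs) for name, vs in json.load(f).items()}
            payload = diff_packages(old, current)
            # A real client would POST `payload` to /hosts/{host}/update here and
            # update the cache only after the endpoint accepted it.
            with open(cache_path, "w") as f:
                json.dump(current, f)
            return payload
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Usage, with a temporary directory standing in for the real cache location.
workdir = tempfile.mkdtemp()
cache = os.path.join(workdir, "packages.json")
lock = os.path.join(workdir, "lock")
first_run = sync_with_cache(
    parse_dpkg_query("bash 4.4-5 bash\nlibc6 2.24-11 glibc\n"), cache, lock)
second_run = sync_with_cache(
    parse_dpkg_query("bash 4.4-5+deb9u1 bash\n"), cache, lock)
```

On the first run everything is reported as installed, which provides the initial data for a fresh host. On later runs only the differences are reported, so on a cleanly updated system the payload is usually empty.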
Web URIs / API
This is a generic draft of the specific Web/API interfaces needed for the usual workflow. Any or all of the models' CRUD APIs (GET/POST/PUT/DELETE for each endpoint) could be exposed either by default or on a case-by-case basis, as we see fit.
The likely framework choice is Django, since it is already used for servermon and other services. There is also a django-closuretree package that could be used to manage the image tree more easily.
This tool could run on Ganeti instances, potentially multi-DC (failover setup).
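As far as we know, django-closuretree works by maintaining a closure table for the tree. The following is a minimal plain-SQL sketch of the idea, shaped like the image_version_tree table in the DB structure (same column names, including children). It uses an in-memory sqlite3 database only to keep the example self-contained; the real DB would be MySQL behind the ORM. Adding an image version inserts one row per ancestor, so the parents and children trees of the image endpoints each become a single query. The function names and ids are hypothetical.

```python
import sqlite3
from typing import List, Optional

# Columns follow image_version_tree; ids stand for image_versions.id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_version_tree ("
    " parent INTEGER NOT NULL, children INTEGER NOT NULL, depth INTEGER NOT NULL,"
    " UNIQUE (parent, children))"
)


def add_image_version(node: int, parent: Optional[int] = None) -> None:
    """Insert the closure rows for a new image version below `parent`."""
    # Every node is its own ancestor at depth 0.
    conn.execute("INSERT INTO image_version_tree VALUES (?, ?, 0)", (node, node))
    if parent is not None:
        # Each ancestor of the parent (the parent included) becomes an
        # ancestor of the new node, one level further away.
        conn.execute(
            "INSERT INTO image_version_tree (parent, children, depth) "
            "SELECT parent, ?, depth + 1 FROM image_version_tree WHERE children = ?",
            (node, parent),
        )


def ancestors(node: int) -> List[int]:
    """All parents of `node`, nearest first."""
    rows = conn.execute(
        "SELECT parent FROM image_version_tree "
        "WHERE children = ? AND depth > 0 ORDER BY depth",
        (node,),
    )
    return [row[0] for row in rows]


def descendants(node: int) -> List[int]:
    """All children of `node`, nearest first."""
    rows = conn.execute(
        "SELECT children FROM image_version_tree "
        "WHERE parent = ? AND depth > 0 ORDER BY depth, children",
        (node,),
    )
    return [row[0] for row in rows]


# Usage: 1 = base image, 2 = python image on top of 1, 3 = app image on top of 2.
add_image_version(1)
add_image_version(2, parent=1)
add_image_version(3, parent=2)
print(ancestors(3))    # [2, 1]
print(descendants(1))  # [2, 3]
```

The trade-off is more rows (one per ancestor/descendant pair) in exchange for avoiding recursive queries, which suits image trees that are shallow and rarely restructured.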
GET /hosts: list of hosts with the number of packages installed
POST /hosts: add a new host
DELETE /hosts: delete a host
GET /hosts/{host}: list all package:version entries installed on the host (and whether they are upgradable and, if so, whether it is a security upgrade?)
GET /hosts/{host}/updates: list of upgradable packages on this host with version_from->version_to, and whether each is a security upgrade
POST /hosts/{host}/update: accept a JSON that describes the changes for this host, installed/removed/upgradable packages. See below for the format.
GET /images: list all images
POST /images: add a new image
GET /images/{image}: list all tag/versions of the image with the number of packages installed
DELETE /images/{image}: delete this image and all versions
GET /images/{image}/updates: list of upgradable packages version_from->version_to in this image (latest image version)
GET /images/{image}/{tag/version}: list all packages installed directly in this image version
POST /images/{image}/{tag/version}: add a new image version
DELETE /images/{image}/{tag/version}: delete this image version
POST /images/{image}/{tag/version}/update: accept a JSON that describes the changes for this image version: installed/upgradable packages. See below for the format.
The list of installed packages can be sent only once and no removed packages will be accepted.
GET /images/{image}/{tag/version}/parents: tree of parents of this image
GET /images/{image}/{tag/version}/children: tree of the children of this image
GET /packages: list all packages
GET /packages/{package}: package info including all versions and number of hosts and image versions for each package version
GET /packages/{package}/hosts: list of hosts for each package version
GET /packages/{package}/images: list of image versions for each package version
GET /packages/{package}/{version}/hosts: list all hosts with package:version installed
GET /packages/{package}/{version}/images: list all image:version/tag with package:version installed
GET /source-packages: list all source packages
GET /source-packages/{package}: package info including all available versions and the number of binary packages generated from it
GET /source-packages/{package}/{version}: version info including the list of all binary packages generated from it
DB Structure
The DB will actually be defined by the models of the framework/ORM used, but in general it should look like the one shown below. All tables that have an id will have a primary key on the id. For some tables (like bridge tables) the id is actually optional, and whether to have it will depend on the requirements of the framework/ORM used.
The DB could be hosted in one of the misc MySQL shards.
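As a sanity check that the pseudo-schema below can serve the "upgradable packages globally, with the count of hosts affected" view, here is a sketch of that query against a trimmed-down subset of the tables (only the columns it needs). sqlite3 is used only to keep the example self-contained. The sample rows are made up, and so is the 'security' value for upgrade_type, whose representation is still undecided. package_version_from is joined with a LEFT JOIN because the schema marks it as optional.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE packages (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE package_versions (
    id INTEGER PRIMARY KEY, package_id INTEGER REFERENCES packages(id),
    version TEXT, os TEXT, upgrade_type TEXT,
    UNIQUE (package_id, version, os));
CREATE TABLE hosts (id INTEGER PRIMARY KEY, hostname TEXT UNIQUE);
CREATE TABLE hosts_upgrades (
    host_id INTEGER REFERENCES hosts(id),
    package_version_from INTEGER REFERENCES package_versions(id),
    package_version_to INTEGER REFERENCES package_versions(id),
    UNIQUE (host_id, package_version_to));
""")

# One row per upgrade path, with the number of hosts that can take it.
GLOBAL_UPGRADES = """
SELECT p.name, vf.version AS version_from, vt.version AS version_to,
       vt.upgrade_type, COUNT(DISTINCT hu.host_id) AS hosts
FROM hosts_upgrades AS hu
JOIN package_versions AS vt ON vt.id = hu.package_version_to
JOIN packages AS p ON p.id = vt.package_id
LEFT JOIN package_versions AS vf ON vf.id = hu.package_version_from
GROUP BY p.name, vf.version, vt.version, vt.upgrade_type
ORDER BY hosts DESC, p.name
"""

# Made-up sample data: two hosts share the same pending upgrade.
conn.executescript("""
INSERT INTO packages VALUES (1, 'openssl');
INSERT INTO package_versions VALUES
    (1, 1, '1.1.0f-3+deb9u1', 'stretch', NULL),
    (2, 1, '1.1.0f-3+deb9u2', 'stretch', 'security');
INSERT INTO hosts VALUES (1, 'host1.example.org'), (2, 'host2.example.org');
INSERT INTO hosts_upgrades VALUES (1, 1, 2), (2, 1, 2);
""")
for row in conn.execute(GLOBAL_UPGRADES):
    print(row)
# ('openssl', '1.1.0f-3+deb9u1', '1.1.0f-3+deb9u2', 'security', 2)
```

The same shape of query over image_version_upgrades would give the per-image counts.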
Here is the pseudo-schema; additional common fields like created/modified are not listed:
- source_packages
  - id
  - name
  - unique index on name
- source_package_versions
  - id
  - version
  - os
  - source_package_id references source_packages.id
  - unique index on source_package_id+version+os
- packages (binary packages)
  - id
  - name
  - unique index on name
- package_versions (binary packages)
  - id
  - version
  - os
  - upgrade_type (optional; its implementation has not been decided yet: string, ENUM, or separate table)
  - description (optional, to store information related to this specific version, potentially a CVE or a URL)
  - package_id references packages.id
  - source_package_version_id references source_package_versions.id
  - unique index on package_id+version+os
- hosts
  - id
  - hostname (FQDN)
  - running_kernel references package_versions.id
  - last_seen (datetime, updated every time the host reports its status)
  - unique index on hostname
- images
  - id
  - name
  - unique index on name
- image_versions
  - id
  - tag/version
  - last_seen (datetime, updated every time the image version reports its status)
  - image_id references images.id
  - unique index on image_id+tag/version
- image_version_tree (a closure table to represent the tree with arbitrary levels)
  - id (optional)
  - parent references image_versions.id
  - children references image_versions.id
  - depth
  - unique index (parent, children)
- hosts_packages
  - id (optional)
  - host_id references hosts.id
  - package_version_id references package_versions.id
  - unique index (host_id, package_version_id)
- hosts_upgrades
  - id (optional)
  - host_id references hosts.id
  - package_version_from references package_versions.id (optional, it can be retrieved from hosts_packages)
  - package_version_to references package_versions.id
  - unique index (host_id, package_version_to)
- image_version_packages
  - id (optional)
  - image_version_id references image_versions.id
  - package_version_id references package_versions.id
  - unique index (image_version_id, package_version_id)
- image_version_upgrades
  - id (optional)
  - image_id references images.id
  - package_version_from references package_versions.id (optional, it can be retrieved from image_version_packages)
  - package_version_to references package_versions.id
  - unique index (image_id, package_version_to)
Update endpoint format
The apt hook and the crontab should use the same format to report the package changes for a host/image. Here is a proposed draft of the JSON format:
{
  "installed": [{"name": "foo", "version": "1", "source": "foobar"}, ...],
  "uninstalled": [{"name": "bar", "version": "1", "source": "foobar"}, ...],
  "upgradable": [{"name": "baz", "version_from": "1.0.0", "version_to": "1.0.1", "source": "foobar"}, ...]
}
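A minimal server-side sketch of how the update endpoints could validate and apply a payload in this format. The in-memory state stands in for the hosts_packages/hosts_upgrades tables, and all class and function names are hypothetical. It enforces the image rules stated for POST /images/{image}/{tag/version}/update: the installed list can be sent only once, and removed packages are rejected. It also assumes that "upgradable" carries the full current list for the host/image rather than a delta, which the draft does not specify.

```python
from typing import Dict

# Fields each entry must carry, taken from the draft format above.
REQUIRED_FIELDS = {
    "installed": {"name", "version", "source"},
    "uninstalled": {"name", "version", "source"},
    "upgradable": {"name", "version_from", "version_to", "source"},
}


class InvalidUpdate(ValueError):
    """Raised when a payload is malformed or violates the image rules."""


class PackageState:
    """Hypothetical in-memory stand-in for the per-host/image tables."""

    def __init__(self, is_image: bool = False) -> None:
        self.is_image = is_image
        self.installed: Dict[str, str] = {}   # name -> version
        self.upgradable: Dict[str, str] = {}  # name -> version_to
        self.installed_received = False


def validate(state: PackageState, payload: dict) -> None:
    unknown = set(payload) - set(REQUIRED_FIELDS)
    if unknown:
        raise InvalidUpdate(f"unknown keys: {sorted(unknown)}")
    for key, fields in REQUIRED_FIELDS.items():
        for entry in payload.get(key, []):
            missing = fields - set(entry)
            if missing:
                raise InvalidUpdate(f"{key} entry is missing {sorted(missing)}")
    if state.is_image:
        if payload.get("uninstalled"):
            raise InvalidUpdate("image versions do not accept removed packages")
        if payload.get("installed") and state.installed_received:
            raise InvalidUpdate("an image version's installed list can be sent only once")


def apply_update(state: PackageState, payload: dict) -> None:
    validate(state, payload)
    for entry in payload.get("uninstalled", []):
        # Drop the package only if the removed version is the one we know about.
        if state.installed.get(entry["name"]) == entry["version"]:
            del state.installed[entry["name"]]
    for entry in payload.get("installed", []):
        state.installed[entry["name"]] = entry["version"]
    if payload.get("installed"):
        state.installed_received = True
    if "upgradable" in payload:
        # Assumption: "upgradable" is the full current list, not a delta.
        state.upgradable = {e["name"]: e["version_to"] for e in payload["upgradable"]}


# Usage: on a host, an upgrade arrives as uninstall(old) + install(new).
host = PackageState()
apply_update(host, {"installed": [{"name": "foo", "version": "1", "source": "foobar"}]})
apply_update(host, {
    "uninstalled": [{"name": "foo", "version": "1", "source": "foobar"}],
    "installed": [{"name": "foo", "version": "2", "source": "foobar"}],
    "upgradable": [{"name": "baz", "version_from": "1.0.0",
                    "version_to": "1.0.1", "source": "foobar"}],
})
```

Because a removal is applied only on an exact version match, an upgrade payload gives the same result whichever list is processed first.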