
New tool to track package updates/status for hosts and images (debmonitor)
Open, Medium, Public

Description

Riccardo and I have been working on a design to replace servermon/packages that also keeps track of the packages used in container images:

Description

Design and implement a tool that keeps track of all installed and upgradable packages across the fleet, both for normal OSes and container images, replacing and improving on the similar functionality currently available in the now-deprecated Servermon. The additional complexity with container images is that an image can be based on top of another image, forming a tree, and we should track only the packages installed directly in any given image.

Proposed name: debmonitor

General features

  • Add/remove hosts/images
    • The list of hosts is provided by the "source of truth" of the infrastructure, and when a host is added/removed this should be reflected in this tool with an API call. The reason to take the list of hosts from an external authoritative source is to be able to ensure that no host stops reporting its packages, and to alarm if that happens.
    • The same goes for the list of container images and their versions/tags: they should be added/removed automatically when an image is generated/removed from the catalog
  • Update package list for a host/image
    • Hosts:
      1. On the hosts there will be an apt hook that will send the changed package:version (installed/removed) and the upgradable packages to a specific endpoint
      2. A crontab will run periodically to grab the whole list of package:version and, checking against a local cache, send only the differences since the last time it ran. The ingestion triggered by cron has two purposes:
        • Provide the initial data for freshly installed/reimaged hosts
        • Catch package changes that were invisible to the apt hook (installing software via dpkg -i or running apt with an option to not execute apt hooks)
         Both the ingestion based on the apt hook and the cron-based ingestion operate on the same cache of installed packages. On a cleanly updated system, the cron job should usually not ingest any packages apart from the initial run. We'll need some locking to prevent parallel runs of the ingestion of the "full package set".
    • Images:
      1. When generating a new image:version/tag a hook should send the list of packages installed directly in this image, not those installed in any parent image
      2. Periodically an instance of each active image should be instantiated, an apt-get update should be performed, and the apt hook should send the list of upgradable packages to the endpoint to update the upgradable packages for this specific image.
  • Web/API interface to consume the data
    • Search and list source packages, binary packages, hosts, images
    • List upgradable packages per host, per image, globally (with the count of hosts/images affected) with version_from and version_to, showing whether it is a security upgrade
    • (long term) allow marking an upgrade path as 'approved' and have an API that returns the list of approved upgrades that will be automatically performed by debdeploy (or an equivalent tool) across the fleet in a controlled and progressive way.
    • The tool should link to changelogs as currently done by servermon
  • An optional feature would be to also store the running kernel, which would allow queries for systems needing a reboot (the installed kernel package is not necessarily the running one). (Can be added in a later revision, low severity)
  • The tool will be specific to the Debian package format; it doesn't seem useful to extend it to RPM. We use only OSes based on apt and deb packages, so the overengineering needed to make it more general and support multiple logics is not only not worth it, but would also not be tested and maintained properly. There are also some subtle differences in version semantics.
  • A CLI tool would be useful mid-term, but can be added in a followup step
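
The cron-based ingestion for hosts described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual client: the function names, the lock and cache paths, and the choice to represent the cache as a JSON file holding a name-to-version mapping are all hypothetical, and an upgraded package is reported here simply as the new version being installed.

```python
import fcntl
import json
import os
import tempfile


def diff_packages(cached, current):
    """Compare two {name: version} mappings (hypothetical cache format).

    Returns (installed, uninstalled): packages that are new or whose version
    changed, and packages that are no longer present.
    """
    installed = {n: v for n, v in current.items() if cached.get(n) != v}
    uninstalled = {n: v for n, v in cached.items() if n not in current}
    return installed, uninstalled


def run_locked(lock_path, cache_path, current):
    """Diff `current` against the local cache while holding an exclusive lock.

    Returns None when another ingestion run already holds the lock.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock to prevent parallel "full set" runs.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return None
        cached = {}
        if os.path.exists(cache_path):
            with open(cache_path) as cache_file:
                cached = json.load(cache_file)
        changes = diff_packages(cached, current)
        # A real client would update the cache only after a successful send.
        with open(cache_path, "w") as cache_file:
            json.dump(current, cache_file)
        return changes
```

On a freshly installed host the cache is missing, so the first run reports every package as installed; later runs on a cleanly updated system return two empty mappings. The lock is released when the lock file is closed.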

Web URIs / API

This is a generic draft of the specific Web/API interfaces needed for the usual workflow. Any or all of the models' CRUD APIs (GET/POST/PUT/DELETE for each endpoint) could be exposed either by default or on a case-by-case basis, as we see fit.
The likely tool choice is Django, since it is already used for servermon and other services. There is also a django-closuretree package that could be used to manage the images tree more easily.

This tool could run on Ganeti instances, potentially multi-DC (failover setup)

GET    /hosts: list of hosts with the number of packages installed
POST   /hosts: add a new host
DELETE /hosts: delete a host
GET    /hosts/{host}: list all package:version installed on host (and whether they are upgradable and whether it is a security upgrade?)
GET    /hosts/{host}/updates: list of upgradable packages on this host with version_from->version_to and whether it is a security upgrade
POST   /hosts/{host}/update: accept a JSON that describes the changes for this host, installed/removed/upgradable packages. See below for the format.

GET    /images: list all images
POST   /images: add a new image
GET    /images/{image}: list all tag/versions of the image with the number of packages installed
DELETE /images/{image}: delete this image and all versions
GET    /images/{image}/updates: list of upgradable packages version_from->version_to in this image (latest image version)
GET    /images/{image}/{tag/version}: list all installed packages directly in this image version
POST   /images/{image}/{tag/version}: add a new image version
DELETE /images/{image}/{tag/version}: delete this image version
POST   /images/{image}/{tag/version}/update: accept a JSON that describes the changes for this image version, installed/upgradable packages. See below for the format.
                                           The list of installed packages can be sent only once and no removed packages will be accepted.
GET    /images/{image}/{tag/version}/parents: tree of parents of this image
GET    /images/{image}/{tag/version}/childrens: tree of the children of this image

GET    /packages: list all packages
GET    /packages/{package}: package info including all versions and number of hosts and image versions for each package version
GET    /packages/{package}/hosts: list of hosts for each package version
GET    /packages/{package}/images: list of image versions for each package version
GET    /packages/{package}/{version}/hosts: list all hosts with package:version installed
GET    /packages/{package}/{version}/images: list all image:version/tag with package:version installed

GET    /source-packages: list all source packages
GET    /source-packages/{package}: package info including all available versions and the number of binary packages generated from it
GET    /source-packages/{package}/{version}: version info including the list of all binary packages generated from it

DB Structure

The DB will actually be defined by the models in the framework/ORM used, but in general it should
be something like the one shown below. All tables that have an id will have a primary key on the id. For some tables (like bridge tables)
the id is actually optional, and whether to have it will depend on the requirements of the framework/ORM used.

The db could be hosted in one of the misc MySQL shards.

Here is the pseudo-schema; additional common fields like created/modified are not listed:

source_packages
    id
    name
    unique index on name

source_package_versions
    id
    version
    os
    source_package_id references source_packages.id
    unique index on source_package_id+version+os

packages (binary packages)
    id
    name
    unique index on name

package_versions (binary packages)
    id
    version
    os
    upgrade_type (optional, its implementation has not yet been decided: string, ENUM, separate table)
    description (optional, to store information related to this specific version, potentially CVE or a URL)
    package_id references packages.id
    source_package_version_id references source_package_versions.id
    unique index on package_id+version+os

hosts
    id
    hostname (FQDN)
    running_kernel references package_versions.id
    last_seen (datetime, updated every time the host reports its status)
    unique index on hostname

images
    id
    name
    unique index on name

image_versions
    id
    tag/version
    last_seen (datetime, updated every time the image version reports its status)
    image_id references images.id
    unique index on image_id+tag/version

image_version_tree (a closure table to represent the tree with arbitrary levels)
    id (optional)
    parent references image_versions.id
    children references image_versions.id
    depth
    unique index (parent, children)

hosts_packages
    id (optional)
    host_id references hosts.id
    package_version_id references package_versions.id
    unique index (host_id, package_version_id)

hosts_upgrades
    id (optional)
    host_id references hosts.id
    package_version_from references package_versions.id (optional, it can be retrieved from hosts_packages)
    package_version_to references package_versions.id
    unique index (host_id, package_version_to)

image_version_packages
    id (optional)
    image_version_id references image_versions.id
    package_version_id references package_versions.id
    unique index (image_version_id, package_version_id)

image_version_upgrades
    id (optional)
    image_id references images.id
    package_version_from references package_versions.id (optional, it can be retrieved from image_version_packages)
    package_version_to references package_versions.id
    unique index (image_id, package_version_to)
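
To illustrate how the image_version_tree closure table above could represent the image tree, here is a minimal sketch using an in-memory SQLite database. It is an assumption-laden example and not the proposed implementation: the helper names are hypothetical, image versions are plain integer ids, each node also stores a self-referencing row at depth 0 (a common closure-table convention that the pseudo-schema does not specify), and a Django package such as django-closuretree would handle this bookkeeping differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_version_tree ("
    " parent INTEGER NOT NULL,"
    " children INTEGER NOT NULL,"
    " depth INTEGER NOT NULL,"
    " UNIQUE (parent, children))"
)


def add_image_version(conn, node, parent=None):
    """Register an image version id, optionally built on top of `parent`."""
    # Every node references itself at depth 0 (assumed convention).
    conn.execute(
        "INSERT INTO image_version_tree (parent, children, depth) VALUES (?, ?, 0)",
        (node, node),
    )
    if parent is not None:
        # Copy every ancestor row of the parent, one level deeper, for the new node.
        conn.execute(
            "INSERT INTO image_version_tree (parent, children, depth)"
            " SELECT parent, ?, depth + 1 FROM image_version_tree"
            " WHERE children = ?",
            (node, parent),
        )


def parents(conn, node):
    """Ancestors of `node`, nearest first."""
    rows = conn.execute(
        "SELECT parent FROM image_version_tree"
        " WHERE children = ? AND depth > 0 ORDER BY depth",
        (node,),
    )
    return [row[0] for row in rows]


def descendants(conn, node):
    """All image versions built directly or indirectly on top of `node`."""
    rows = conn.execute(
        "SELECT children FROM image_version_tree"
        " WHERE parent = ? AND depth > 0 ORDER BY depth",
        (node,),
    )
    return [row[0] for row in rows]
```

With this layout, both the parents and the children of an image version can be read with a single query and no recursion, at the cost of one row per ancestor when a new image version is added.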

Update endpoint format

The apt hook and crontab should use the same format to report the package changes for a host/image. Here is a proposed draft of the JSON format:

{
    "installed": [{"name": "foo", "version": "1", "source": "foobar"}, ...],
    "uninstalled": [{"name": "bar", "version": "1", "source": "foobar"}, ...],
    "upgradable": [{"name": "baz", "version_from": "1.0.0", "version_to": "1.0.1", "source": "foobar"}, ...]
}
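
As a rough illustration of how a client might assemble a payload in the draft format above, here is a small sketch. The function name and the validation behaviour are hypothetical; only the three top-level keys and the per-item field names come from the draft.

```python
import json

# Fields each item must carry, taken from the draft format above.
REQUIRED_KEYS = {
    "installed": {"name", "version", "source"},
    "uninstalled": {"name", "version", "source"},
    "upgradable": {"name", "version_from", "version_to", "source"},
}


def build_update(installed=(), uninstalled=(), upgradable=()):
    """Serialize package changes as JSON; raise ValueError on missing fields."""
    payload = {
        "installed": list(installed),
        "uninstalled": list(uninstalled),
        "upgradable": list(upgradable),
    }
    for section, keys in REQUIRED_KEYS.items():
        for item in payload[section]:
            missing = keys - set(item)
            if missing:
                raise ValueError(f"{section}: missing {sorted(missing)}")
    return json.dumps(payload, sort_keys=True)
```

For example, `build_update(installed=[{"name": "foo", "version": "1", "source": "foobar"}])` returns a JSON document with empty "uninstalled" and "upgradable" lists, which the apt hook and the crontab could both send to the update endpoint.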

Details

Related Gerrit Patches:
operations/software/debmonitor : master Logging: avoid duplicate logging
operations/software/debmonitor : master Set CSP header for all views
operations/software/debmonitor : master Fix table ordering on click
operations/software/debmonitor : master Packages details page: don't wrap badges
operations/software/debmonitor : master Frontend: specify items shown per page
operations/software/debmonitor : master Host detail: fix package sorting order
operations/software/debmonitor : master Client CLI: bump version to 1.2.0
operations/software/debmonitor : master Client CLI: add CA bundle for server validation
operations/software/debmonitor : master Bumped django-auth-ldap to v1.6.1
operations/software/debmonitor : master Run CLI tests also with Python 2.7
operations/software/debmonitor : master Create a custom mysql backend and use it
operations/software/debmonitor : master Documentation: remove example setting
operations/software/debmonitor : master Client self-update capability
operations/software/debmonitor : master Exclude all tests from distributed packages
operations/software/debmonitor : master Add server side validation of client certificates
operations/software/debmonitor : master Add login and LDAP support
operations/software/debmonitor : master Add basic test coverage
operations/software/debmonitor : master Add CLI script to be installed in the target hosts
operations/software/debmonitor : master First working version
operations/software/debmonitor : master Created Django apps
operations/software/debmonitor : master Created Django project
integration/config : master Add tox job for operations/software/debmonitor

Event Timeline

Restricted Application added a subscriber: Aklapper. Jun 9 2017, 12:42 PM
Volans added a subscriber: Volans. Jun 9 2017, 1:16 PM
Volans updated the task description. Jun 9 2017, 1:21 PM
Volans added subscribers: Joe, faidon. Jun 9 2017, 1:34 PM
akosiaris added a subscriber: akosiaris. Edited Jun 12 2017, 2:46 PM

Note there's T167269 that describes an approach that at least partly (if not fully) overlaps.

@akosiaris yes, we were aware of it and I spoke with @Joe last week about the requirements for the Docker part; sorry for not having mentioned/referenced it here too. The idea at this point is to have a single tool that can work for both physical hosts and Docker images, so it should overlap fully with the requirements of T167269.

Do you see any missing feature / do you have any feedback on it?

We should also investigate other available tools in the container space, for example one recently released is https://github.com/puppetlabs/lumogon or from CoreOS https://github.com/coreos/clair (thanks @Joe for this one). Disclaimer: I've not yet done an extensive search for other available tools ;)

From what I've seen so far:

  • Lumogon: without using the Puppetlabs endpoint, the client alone outputs a JSON representation of the image that contains installed packages (apk/dpkg/yum), modified files and some basic facts. It would be interesting to see if it could be easily extended to also list packages from other packaging systems like npm, pip, etc. In that case an option could be to run it on image creation and add an endpoint to debmonitor to parse this information and store it. For the available updates we could use the same apt hook described above.
  • Clair: tries to solve the security upgrade side of the problem, downloading CVEs from different sources and indexing the images' installed packages into a database, which makes it possible to check whether security upgrades are available. It has its own database (Postgres) and seems to handle only security updates, not normal updates. It seems to me that it might be harder to integrate with debmonitor as it is designed right now.

As always any feedback is welcome

fgiunchedi triaged this task as Medium priority. Jul 21 2017, 10:18 AM
Volans moved this task from Backlog to In Progress on the SRE-tools board. Oct 4 2017, 9:58 AM

Change 394618 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Created Django project

https://gerrit.wikimedia.org/r/394618

Change 394619 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Created Django apps

https://gerrit.wikimedia.org/r/394619

Change 394620 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] First working version

https://gerrit.wikimedia.org/r/394620

Change 394621 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Add basic test coverage

https://gerrit.wikimedia.org/r/394621

Change 394990 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Add CLI script to be installed in the target hosts

https://gerrit.wikimedia.org/r/394990

Change 395479 had a related patch set uploaded (by Volans; owner: Volans):
[integration/config@master] Add tox job for operations/software/debmonitor

https://gerrit.wikimedia.org/r/395479

Change 395479 merged by jenkins-bot:
[integration/config@master] Add tox job for operations/software/debmonitor

https://gerrit.wikimedia.org/r/395479

Change 394618 merged by Volans:
[operations/software/debmonitor@master] Created Django project

https://gerrit.wikimedia.org/r/394618

Volans moved this task from In Progress to Backlog on the SRE-tools board. Feb 21 2018, 10:03 AM
Volans moved this task from Backlog to In Progress on the SRE-tools board. Apr 4 2018, 10:34 AM
Volans claimed this task. Apr 9 2018, 4:50 PM

Change 425417 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Add login and LDAP support

https://gerrit.wikimedia.org/r/425417

Change 428302 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Add server side validation of client certificates

https://gerrit.wikimedia.org/r/428302

Change 394619 merged by Volans:
[operations/software/debmonitor@master] Created Django apps

https://gerrit.wikimedia.org/r/394619

Change 394620 merged by Volans:
[operations/software/debmonitor@master] First working version

https://gerrit.wikimedia.org/r/394620

Change 394990 merged by Volans:
[operations/software/debmonitor@master] Add CLI script to be installed in the target hosts

https://gerrit.wikimedia.org/r/394990

Change 394621 merged by jenkins-bot:
[operations/software/debmonitor@master] Add basic test coverage

https://gerrit.wikimedia.org/r/394621

Change 425417 merged by jenkins-bot:
[operations/software/debmonitor@master] Add login and LDAP support

https://gerrit.wikimedia.org/r/425417

Change 428302 merged by jenkins-bot:
[operations/software/debmonitor@master] Add server side validation of client certificates

https://gerrit.wikimedia.org/r/428302

Change 432394 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Client self-update capability

https://gerrit.wikimedia.org/r/432394

Change 432585 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Exclude all tests from distributed packages

https://gerrit.wikimedia.org/r/432585

Change 432585 merged by jenkins-bot:
[operations/software/debmonitor@master] Exclude all tests from distributed packages

https://gerrit.wikimedia.org/r/432585

Change 432394 merged by jenkins-bot:
[operations/software/debmonitor@master] Client self-update capability

https://gerrit.wikimedia.org/r/432394

Change 436592 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Create a custom mysql backend and use it

https://gerrit.wikimedia.org/r/436592

Change 436737 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Documentation: remove example setting

https://gerrit.wikimedia.org/r/436737

Change 436592 merged by jenkins-bot:
[operations/software/debmonitor@master] Create a custom mysql backend and use it

https://gerrit.wikimedia.org/r/436592

Change 436737 merged by jenkins-bot:
[operations/software/debmonitor@master] Documentation: remove example setting

https://gerrit.wikimedia.org/r/436737

Change 437955 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Run CLI tests also with Python 2.7

https://gerrit.wikimedia.org/r/437955

Change 437956 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Bumped django-auth-ldap to v1.6.1

https://gerrit.wikimedia.org/r/437956

Change 437958 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Allow to specify a CA bundle for server validation

https://gerrit.wikimedia.org/r/437958

Volans moved this task from In Progress to In Code Review on the SRE-tools board. Jun 8 2018, 8:08 PM
hashar removed a subscriber: hashar. Jun 9 2018, 5:52 AM

Change 439560 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Client CLI: bump version to 1.2.0

https://gerrit.wikimedia.org/r/439560

Change 437955 merged by Volans:
[operations/software/debmonitor@master] Run CLI tests also with Python 2.7

https://gerrit.wikimedia.org/r/437955

Change 437956 merged by Volans:
[operations/software/debmonitor@master] Bumped django-auth-ldap to v1.6.1

https://gerrit.wikimedia.org/r/437956

Change 437958 merged by Volans:
[operations/software/debmonitor@master] Client CLI: add CA bundle for server validation

https://gerrit.wikimedia.org/r/437958

Change 439560 merged by Volans:
[operations/software/debmonitor@master] Client CLI: bump version to 1.2.0

https://gerrit.wikimedia.org/r/439560

Change 440312 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Host detail: fix package sorting order

https://gerrit.wikimedia.org/r/440312

Change 440326 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Frontend: specify items shown per page

https://gerrit.wikimedia.org/r/440326

Change 440312 merged by Volans:
[operations/software/debmonitor@master] Host detail: fix package sorting order

https://gerrit.wikimedia.org/r/440312

Change 440326 merged by Volans:
[operations/software/debmonitor@master] Frontend: specify items shown per page

https://gerrit.wikimedia.org/r/440326

Change 440657 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Packages details page: don't wrap badges

https://gerrit.wikimedia.org/r/440657

Change 440657 merged by Volans:
[operations/software/debmonitor@master] Packages details page: don't wrap badges

https://gerrit.wikimedia.org/r/440657

Change 442042 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Fix table ordering on click

https://gerrit.wikimedia.org/r/442042

Change 442042 merged by Volans:
[operations/software/debmonitor@master] Fix table ordering on click

https://gerrit.wikimedia.org/r/442042

Change 442110 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Set CSP header for all views

https://gerrit.wikimedia.org/r/442110

Change 442214 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/debmonitor@master] Logging: avoid duplicate logging

https://gerrit.wikimedia.org/r/442214

Change 442110 merged by Volans:
[operations/software/debmonitor@master] Set CSP header for all views

https://gerrit.wikimedia.org/r/442110

Change 442214 merged by Volans:
[operations/software/debmonitor@master] Logging: avoid duplicate logging

https://gerrit.wikimedia.org/r/442214

Volans moved this task from In Code Review to Backlog on the SRE-tools board. Jun 28 2018, 4:12 PM

The service and client are in production and working fine. Leaving the task open for the Docker images part.

Bstorm added a subscriber: Bstorm. Oct 3 2018, 8:35 PM