
Optimization of conda-analytics deb package
Closed, Declined, Public

Description

The conda-analytics deb pkg is currently ~1GB.
Unpacked it occupies ~4GB.
Without the pkgs dir, it occupies ~2GB.

The pkgs dir doubles the footprint but allows statboxes and launchers to clone the conda environment without touching the internet.

Those sizes are a problem because:

  • they occupy disk space on the cluster
  • they take bandwidth at each new version install from apt.wm.org.

Currently we have only 1 deb pkg, with the pkgs dir included. A postinst script can be activated by a debconf variable to remove the pkgs dir, typically on workers.

We could create 2 deb pkgs:

  • 1 with the pkgs dir
  • 1 without

If we keep 1 deb pkg, the debconf variable mechanism could be improved: its true/false input should be parsed properly, and changing the variable to false should trigger a re-install (to get the pkgs dir back).
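The two debconf improvements above (parsing true/false strictly, and re-installing when the answer changes to false) could be sketched as follows. This is a minimal Python sketch under stated assumptions, not the package's actual postinst (which is presumably a shell maintainer script): the `Action` enum and the `parse_debconf_bool`/`plan_action` names are hypothetical, and the answer is assumed to arrive as the string debconf stores for a boolean question ("true" or "false").

```python
from enum import Enum


class Action(Enum):
    # Hypothetical outcomes of the install-time decision.
    NOTHING = "nothing"
    REMOVE_PKGS = "remove_pkgs"  # delete the pkgs dir to save space (e.g. on workers)
    REINSTALL = "reinstall"      # pkgs dir is gone but is now wanted again


def parse_debconf_bool(raw: str) -> bool:
    """Parse a debconf boolean answer ("true"/"false"), tolerating case and whitespace.

    Anything else is rejected instead of silently being treated as true or false.
    """
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"unexpected debconf boolean value: {raw!r}")


def plan_action(remove_pkgs_answer: str, pkgs_dir_exists: bool) -> Action:
    """Decide what to do by comparing the wanted state with what is on disk.

    Checking whether the pkgs dir exists (rather than remembering the previous
    answer) means a true -> false change is detected on the next run.
    """
    remove_pkgs = parse_debconf_bool(remove_pkgs_answer)
    if remove_pkgs and pkgs_dir_exists:
        return Action.REMOVE_PKGS
    if not remove_pkgs and not pkgs_dir_exists:
        # The pkgs dir was removed by an earlier install; only re-installing
        # the package can bring it back.
        return Action.REINSTALL
    return Action.NOTHING


if __name__ == "__main__":
    print(plan_action("true", pkgs_dir_exists=True))     # Action.REMOVE_PKGS
    print(plan_action("False ", pkgs_dir_exists=False))  # Action.REINSTALL
    print(plan_action("false", pkgs_dir_exists=True))    # Action.NOTHING
```

Comparing the answer against whether the pkgs dir actually exists, instead of storing the previous answer, keeps the logic stateless. A maintainer script cannot run apt itself while dpkg holds its lock, so the REINSTALL case would have to be acted on from outside the package, for example by Puppet.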

Event Timeline

Thanks for the summary @Antoine_Quhen. Just one question.

> Those sizes are a problem because:
>
>   • they occupy disk space on the cluster
>   • they take bandwidth at each new version install from apt.wm.org.

Are we sure that these two factors together constitute enough of a problem to warrant the work and overhead of creating/maintaining a second package?

I'd be inclined towards not doing it, although I appreciate that it's nice to be lean where we can. I'm just not sure that, for 2GB of disk space per cluster member, it's worth having to maintain two packages.

I'm happy to be convinced otherwise though.

I now agree with you: it's not worth the time spent maintaining 2 packages.

I'm not even sure it's worth putting time into optimizing the debconf variable mechanism (the one that removes the pkgs dir at install time). We could keep it as is for now, and add a comment here about changing the variable:
https://github.com/wikimedia/puppet/blob/50c61b64f6323f8fd80733291b4b5a1225fb4d4c/modules/conda_analytics/manifests/init.pp#L40

I'm fine either way. If we want to keep the installed size on workers smaller, I think I prefer two packages. If we don't care, then let's just remove the debconf variable.