
Cleanup debconf handling in mailman puppet setup
Open, Medium, Public

Description

Earlier today I tried to install the mailman security update, which triggered a debconf dialogue warning about an existing installation. To avoid potentially overwriting an unpuppetised local configuration, I selected the offered "abort" option to investigate further, which quit the preinst. That left no apparent changes to the installation.

Later on there was an Icinga alert for qrunner not running, and I restarted mailman. When qrunner disappeared again later, I noticed that the puppet run had triggered a reload of the mailman service, which led to qrunner not running until I restarted mailman manually:

Notice: /Stage[main]/Ssh::Client/File[/etc/ssh/ssh_known_hosts]/content: content changed '{md5}318b42d6506e93f9cb45adbd3d2e0e5b' to '{md5}1c4de80271ec0c284007948a7d37daab'
Notice: /Stage[main]/Mailman::Listserve/Exec[dpkg-reconfigure mailman]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Mailman::Listserve/File[/etc/mailman/mm_cfg.py]/mode: mode changed '0644' to '0444'
Info: /Stage[main]/Mailman::Listserve/File[/etc/mailman/mm_cfg.py]: Scheduling refresh of Service[mailman]
Notice: /Stage[main]/Mailman::Listserve/Service[mailman]: Triggered 'refresh' from 1 events
Notice: Finished catalog run in 191.11 seconds
Info: Sleeping for 37 seconds (splay is enabled)
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for fermium.wikimedia.org
Info: Applying configuration version '1473251029'
Notice: /Stage[main]/Mailman::Listserve/Debconf::Set[mailman/site_languages]/Exec[debconf-communicate set mailman/site_languages]/returns: executed successfully
Info: Debconf::Set[mailman/site_languages]: Scheduling refresh of Exec[dpkg-reconfigure mailman]
Notice: /Stage[main]/Ssh::Client/File[/etc/ssh/ssh_known_hosts]/content:

This is likely somehow triggered by the debconf setting mailman/site_languages, so I've stopped puppet on fermium until this is debugged further.

Event Timeline

It's a problem in the puppet module: it hardcodes a number of debconf choices, but e.g. Asturian is not available when running "dpkg-reconfigure mailman" manually. This possibly depends on the installed locales; it needs more investigation.

debconf::set { 'mailman/site_languages':
    value  => 'ar, ast, ca, cs, da, de, en, es, et, eu, fi, fr, gl, he, hr, hu, ia, it, ja, ko, lt, nl, no, pl, pt, pt_BR, ro, ru, sk, sl, sr, sv, tr, uk, vi, zh_CN,\ zh_TW',
    notify => Exec['dpkg-reconfigure mailman'],
}
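
One way to spot hardcoded choices that the package does not offer is to compare the configured value against the list of available choices. The following is only a sketch: the helper names parse_languages and unavailable_languages are hypothetical, and the sample values are illustrative rather than taken from fermium. How to obtain the real list of choices for the installed locales is not covered here.

```python
def parse_languages(value):
    """Split a debconf-style comma-separated value into language codes."""
    return [item.strip() for item in value.split(",") if item.strip()]


def unavailable_languages(configured, available):
    """Return configured codes missing from the available choices, in configured order."""
    offered = set(parse_languages(available))
    return [code for code in parse_languages(configured) if code not in offered]


if __name__ == "__main__":
    # Illustrative values only (assumption): the real choices come from the package.
    configured = "ar, ast, ca, de"
    available = "ar, ca, de, en"
    print(unavailable_languages(configured, available))
```

An empty result would mean that every hardcoded language is offered by the package.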

Change 310746 had a related patch set uploaded (by Muehlenhoff):
Update list of mailman site languages

https://gerrit.wikimedia.org/r/310746

Change 310746 merged by Dzahn:
Update list of mailman site languages

https://gerrit.wikimedia.org/r/310746

I merged that and re-enabled puppet on fermium.

Notice: /Stage[main]/Mailman::Listserve/Debconf::Set[mailman/default_server_language]/Exec[debconf-communicate set mailman/default_server_language]/returns: executed successfully
Info: Debconf::Set[mailman/default_server_language]: Scheduling refresh of Exec[dpkg-reconfigure mailman]

it took a while at this step.. then continued

Info: /Stage[main]/Mailman::Listserve/File[/etc/mailman/mm_cfg.py]: Scheduling refresh of Service[mailman]
Notice: /Stage[main]/Mailman::Listserve/Service[mailman]: Triggered 'refresh' from 1 events

qrunner is running

list 20892 0.1 0.1 78608 16276 ? S 18:15 0:00 /usr/bin/python /var/lib/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
list 20893 0.0 0.2 79548 17256 ? S 18:15 0:00 /usr/bin/python /var/lib/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
... etc..

ehhh... uhmm.. a little while later Icinga tells me:

PROCS CRITICAL: 0 processes with UID = 38 (list), regex args '/mailman/bin/qrunner'

PROCS CRITICAL: 0 processes with UID = 38 (list), regex args '/mailman/bin/mailmanctl'

:/

Mentioned in SAL (#wikimedia-operations) [2016-09-16T19:30:59Z] <mutante> fermium starting mailman qrunner (T144933)

(disabled puppet again)

So no one looked into this in the last few days?

I am going to need to have puppet running for the puppetdb migration, so looking into this now.

The issue here is that debconf::set is very, very primitive.

The languages weren't listed in the same order in debconf and in puppet:

root@fermium:~# echo get mailman/site_languages | debconf-communicate 
0 sk, gl, fa, ast, ar, ca, cs, da, de, en, es, et, eu, fi, fr, he, hr, hu, ia, it, ja, ko, lt, nl, no, pl, pt, pt_BR, ro, ru, sl, sr, sv, tr, uk, vi, zh_CN, zh_TW

As a quick patch, I'm just reproducing this order in puppet, but debconf::set probably needs to be rewritten as a proper type/resource.
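
To illustrate why the ordering matters, here is a minimal sketch. It assumes that debconf::set compares the stored value to the desired one as a plain string (its implementation is not shown in this task), and that the leading "0" in the debconf-communicate reply is a status code. The helper names reply_value, normalize and same_languages are hypothetical, and the language lists are shortened for readability.

```python
def reply_value(reply):
    """Drop the leading status code from a debconf-communicate style reply."""
    _status, _sep, value = reply.strip().partition(" ")
    return value


def normalize(value):
    """Turn a comma-separated language list into a sorted list of unique codes."""
    return sorted({item.strip() for item in value.split(",") if item.strip()})


def same_languages(desired, stored):
    """Compare two language lists while ignoring order and spacing."""
    return normalize(desired) == normalize(stored)


if __name__ == "__main__":
    desired = "ar, ast, gl, sk"                # order as written in puppet (shortened)
    stored = reply_value("0 sk, gl, ast, ar")  # order as returned by debconf (shortened)
    print(desired == stored)                   # plain string comparison
    print(same_languages(desired, stored))     # order-insensitive comparison
```

With these inputs the string comparison reports a difference while the order-insensitive one does not. A rewritten type/resource could use a comparison of the second kind to avoid notifying dpkg-reconfigure on every run.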

Change 311950 had a related patch set uploaded (by Giuseppe Lavagetto):
mailman::listserve: reproduce the debconf order from fermium

https://gerrit.wikimedia.org/r/311950

Change 311950 merged by Giuseppe Lavagetto:
mailman::listserve: reproduce the debconf order from fermium

https://gerrit.wikimedia.org/r/311950

Now puppet runs fine on fermium and doesn't stop/start qrunner at each iteration, but I'll leave the ticket open because this is in need of some serious reengineering.

MoritzMuehlenhoff renamed this task from puppet run stopping qrunner on fermium to Cleanup debconf handling in mailman puppet setup. Nov 11 2016, 12:46 PM
MoritzMuehlenhoff removed MoritzMuehlenhoff as the assignee of this task.
Volans triaged this task as Medium priority. Nov 23 2016, 9:01 AM