
clean up mailman data directory (moderated messages > 0.5 million)
Closed, Resolved, Public

Event Timeline

Dzahn claimed this task.
Dzahn raised the priority of this task from to High.
Dzahn updated the task description.
Dzahn added subscribers: ori, MZMcBride, Dzahn and 5 others.

/var/lib/mailman/data on sodium has an extreme number of files

root@sodium:/var/lib/mailman/data# find . | wc -l
585015

584975 of them are held messages (moderated messages that did not get posted or discarded and are waiting for moderation)

we should do something about that: it makes our disable_list.sh script hang, and running a find | grep on the directory takes over 2m30s.

also see: T83967 for a similar task in the past


08:53 <mutante> i'm afraid i can't paste that on phab
08:54 <mutante> it's 17MB plain text

provided the list to John, gzipped

08:56 <JohnFLewis> Following filippo's work; root@sodium:/var/lib/mailman/data# find . -type f -printf '%TY\n' | sort | uniq -c
08:56 <JohnFLewis> Can you do that? I'm interested in the distribution of messages

root@sodium:/var/lib/mailman/data# find . -type f -printf '%TY\n' | sort | uniq -c
   9285 2007
  29034 2008
  43584 2009
  46963 2010
  21803 2011
  23940 2012
  33460 2013
 156506 2014
 220449 2015

talking with Robh, we agreed it's best to delete all messages older than X. we could also find out which lists don't have active admins/mods by checking something like:

08:50 <robh> 100+ moderated, 1 real message a month = defunct list
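
a rough sketch of how that rule of thumb could be applied once the numbers are collected. the input format (one line per list: name, held count, real posts in the last month) is hypothetical, and nothing in this task says how the "real message" count would be gathered; the thresholds are the ones robh named.

```shell
# Flag lists matching the rule of thumb from IRC:
#   100+ held (moderated) messages and at most 1 real post a month.
# Input on stdin (hypothetical format): "<listname> <held_count> <posts_last_month>"
flag_defunct() {
    awk '$2 >= 100 && $3 <= 1 { print $1 }'
}
```

usage would be something like `flag_defunct < /tmp/list_stats`, where /tmp/list_stats is an assumed file in the format above.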

lists with the highest number of held messages:

71729 ./heldmsg-wikiru
66819 ./heldmsg-wikinews
43642 ./heldmsg-maps
40495 ./heldmsg-wikimedia
38696 ./heldmsg-wikifa
37172 ./heldmsg-wiktionary
26928 ./heldmsg-wikimediabe
25267 ./heldmsg-wikimediake
25178 ./heldmsg-wikimedia
21802 ./heldmsg-libraries
21728 ./heldmsg-education
20654 ./heldmsg-wikisk
19982 ./heldmsg-wikifi
18612 ./heldmsg-wikiia
14318 ./heldmsg-wikilb
10470 ./heldmsg-wiktionarypt
10249 ./heldmsg-exyu
 8508 ./heldmsg-wikiskan
 5672 ./heldmsg-wikihe
 5018 ./heldmsg-wikimedianz
 4649 ./heldmsg-wikiquality
 4337 ./heldmsg-wikiversity
 2885 ./heldmsg-wikibooksde
 2381 ./heldmsg-wikimedia
 2238 ./heldmsg-juriwiki
 1918 ./heldmsg-mailman
 1821 ./heldmsg-wikimedia
 1705 ./heldmsg-wikipedia
 1554 ./heldmsg-wikimediafr
 1537 ./heldmsg-infobg
 1496 ./heldmsg-wikiar
 1353 ./heldmsg-wikimk
 1310 ./heldmsg-ruwikiconference
 1240 ./heldmsg-wikimediahk
 1168 ./heldmsg-wikimedia
 1167 ./heldmsg-wikiml
 1078 ./heldmsg-ca

cd /var/lib/mailman/data
find . > /tmp/data_dir_files
sort /tmp/data_dir_files > /tmp/heldmsgs.sorted
cut -d "-" -f1,2 /tmp/heldmsgs.sorted | uniq -c | sort -nr

66557 ./heldmsg-wikinews-l
64079 ./heldmsg-wikiru-l
43642 ./heldmsg-maps-l
40208 ./heldmsg-wikimedia-in
38696 ./heldmsg-wikifa-l
37172 ./heldmsg-wiktionary-l
26928 ./heldmsg-wikimediabe-l
25178 ./heldmsg-wikimedia-in
20654 ./heldmsg-wikisk-l
19876 ./heldmsg-wikifi-l
18612 ./heldmsg-wikiia-l
14318 ./heldmsg-wikilb-l
10470 ./heldmsg-wiktionarypt-l
10249 ./heldmsg-exyu-tech
 8508 ./heldmsg-wikiskan-l
 7650 ./heldmsg-wikiru-a
 5672 ./heldmsg-wikihe-l
 5018 ./heldmsg-wikimedianz-l
 4649 ./heldmsg-wikiquality-l
 4337 ./heldmsg-wikiversity-l
 2885 ./heldmsg-wikibooksde-l
 2238 ./heldmsg-juriwiki-l
 1690 ./heldmsg-wikimedia-california
 1554 ./heldmsg-wikimediafr-l
 1537 ./heldmsg-infobg-l
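
cutting at a fixed number of hyphens splits list names that contain hyphens themselves (wikimedia-in shows up more than once above). a possible alternative, sketched here, strips the trailing message id instead. it assumes held files are named heldmsg-<listname>-<id>.<ext>, e.g. with a .pck extension, which was not verified against the files on sodium, and it relies on GNU find for -printf.

```shell
# Count held messages per list by stripping the trailing "-<id>.<ext>" part.
# Assumption: file names look like heldmsg-<listname>-<id>.pck
count_held_per_list() {
    dir="$1"
    find "$dir" -maxdepth 1 -type f -name 'heldmsg-*' -printf '%f\n' \
        | sed -E 's/^heldmsg-(.*)-[0-9]+\.[a-z]+$/\1/' \
        | sort | uniq -c | sort -nr
}
```

e.g. `count_held_per_list /var/lib/mailman/data | head -n 25`. because identical list names are grouped before counting, each list appears only once in the output.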

started deleting held messages that are older than 1 year, beginning with wikiru-l. it's just super slow.
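
for reference, a minimal sketch of what a per-list cleanup like this could look like. the function name and its parameters are made up for illustration, and this is not necessarily the command that was actually run. it uses GNU find's -delete, which removes files without spawning one rm per file.

```shell
# Delete held messages of one list that were last modified more than N days ago.
# dir, list and days are illustrative parameters, not taken from the host setup.
purge_held() {
    dir="$1"; list="$2"; days="$3"
    find "$dir" -maxdepth 1 -type f -name "heldmsg-${list}-*" -mtime "+${days}" -delete
}
```

e.g. `purge_held /var/lib/mailman/data wikiru-l 365`. note that the name pattern would also match a list whose name merely starts with "wikiru-l-", so it is worth checking the match with -print before switching to -delete.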

Change 233617 had a related patch set uploaded (by Dzahn):
mailman: add cronjob to delete old held messages

https://gerrit.wikimedia.org/r/233617

Change 233617 merged by Dzahn:
mailman: add cronjob to delete old held messages

https://gerrit.wikimedia.org/r/233617

sent a mail to the listadmins list announcing a plan to automatically delete everything older than 90 days. 2 positive replies so far, from Toby and Greg.
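
before a cutoff like that goes live, it can help to preview how many held messages it would affect. a small sketch, assuming GNU find and the same heldmsg-* naming as above; the function name is hypothetical and this is not the content of the merged cron job.

```shell
# Preview a cutoff: count held messages last modified more than N days ago.
# Nothing is deleted; one dot is printed per match and the dots are counted.
count_older_than() {
    dir="$1"; days="$2"
    find "$dir" -maxdepth 1 -type f -name 'heldmsg-*' -mtime "+${days}" -printf '.' | wc -c
}
```

e.g. `count_older_than /var/lib/mailman/data 90` would print the number of held messages past the announced 90 day limit.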

Dzahn added a subscriber: fgiunchedi.

down to 103051 and closing this for now