Stop archiving the wikidata-bugs mailinglist in pipermail
Closed, Resolved · Public

Description

The wikidata-bugs list allows people to easily subscribe to all Wikidata related bugs.

The pipermail archives currently hold 202k emails for this list, or about 20% of all emails archived across Wikimedia's list archives. We don't (and didn't) archive wikibugs-l, and given that all of the emails are just copies of what's already in Phabricator, maintaining a separate archive seems unnecessary. We can also rely on external archivers like gmane, etc. if people want HTML email archives.

I propose that the list continues to operate as is, but we stop archiving new emails and consider deleting the old archives.

Event Timeline

At some point, the external archives of wikibugs-l were the only way to search or count certain events in our issue tracker, which were impossible to query in the web interface. Are we sure nobody is using, say, the mbox archive for such purposes?

Ouch. I didn't realize it's that much...
From my side it's fine not to archive this one. I don't think anyone should rely on the archive. Phabricator search exists.

Deleting the archive should also be fine.

Any list administrator can disable archiving new messages, at https://lists.wikimedia.org/mailman/admin/wikidata-bugs/archive -- just change "Archive messages?" to "no" and submit.
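For admins who prefer the command line over the web form, the same setting could in principle be changed with Mailman 2.1's `config_list` tool, which reads a Python file and assigns each module-level variable to the matching list attribute. The following is only a sketch: the file name `no-archive.py` is hypothetical, and it assumes shell access to the list server, which the web form does not require.

```python
# Input file for Mailman 2.1's config_list, saved for example as
# no-archive.py (hypothetical file name) and applied with:
#   config_list -i no-archive.py wikidata-bugs
# Each module-level variable is assigned to the list attribute of the same name.
archive = False  # corresponds to "Archive messages?" -> "no"
```

Existing archives are left untouched by this setting; only new messages stop being archived.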

If you'd also like to delete the existing archive, SRE can do that. For the record, we're not talking about very much data at all:

rzl@lists1001:~$ sudo du -hs /var/lib/mailman/archives/private/wikidata-bugs{,.mbox}
3.8G	/var/lib/mailman/archives/private/wikidata-bugs
1.3G	/var/lib/mailman/archives/private/wikidata-bugs.mbox

so from purely a storage perspective it's not worth worrying about -- but I don't know if there are any other considerations in terms of mailman's performance. CC @Dzahn to weigh in if he's so inclined.
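As a rough cross-check of figures like these without `du`, the file sizes under an archive directory can be summed in Python. This is a minimal sketch: the function name `tree_size_bytes` is made up for illustration, and it sums apparent file sizes, so the result can differ somewhat from what `du` reports (which counts allocated disk blocks).

```python
import os

def tree_size_bytes(root):
    """Sum the apparent size in bytes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling it on the HTML directory and on the `.mbox` directory separately gives the two numbers that `du -hs` prints above, up to the block-size difference.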

I can only agree with what RLazarus already said. This is just a config change at the list-admin level. The people receiving wikidata-bugs-owner@lists.wikimedia.org can just do this as they see fit. You don't need Operations/SRE for that. Deletion of archives is normally only done if there are legal reasons for it and the server is not about to run out of disk anytime soon.

What about pywikibugs-l? I assume it's also pretty big and people can just search in phabricator instead.

> I can only agree with what RLazarus already said. This is just a config change at the list-admin level. The people receiving wikidata-bugs-owner@lists.wikimedia.org can just do this as they see fit. You don't need Operations/SRE for that. Deletion of archives is normally only done if there are legal reasons for it and the server is not about to run out of disk anytime soon.

True but if we migrate to mailman3 (which I hope we do sometime soon), it's going to be on its own database (T256538) and it would probably have backup (which is a good thing, the current setup doesn't have backups AFAIK) so I think having the archives tidied up would help us a lot. Maybe we can also get an overview of the biggest archives to handle the low-hanging fruits first? Is it possible?

> What about pywikibugs-l? I assume it's also pretty big and people can just search in phabricator instead.

No such list pywikibugs-l
No such list pywikibugs

> if we migrate to mailman3

I can't say anything about that one way or another.

> going to be on its own database

I see the concerns of the DBAs about the size on that linked ticket. ACK. Since discussion already started there, let's keep possible efforts to globally reduce size on that ticket or a dedicated task, I'd say. Looks like Herron already answered the part about the total size it currently has.

If there is a migration and it turns out we actually have to reduce the size, I guess we could just skip copying/migrating some selected archives instead of actively deleting stuff on the old instance now.

> What about pywikibugs-l? I assume it's also pretty big and people can just search in phabricator instead.
>
> No such list pywikibugs-l
> No such list pywikibugs

Sorry, I should have double-checked the name of the mailing list (there are lots of pywikibot-related mailing lists, it gets really confusing, sorry). There are two: pywikibot-bugs and pywikipedia-bugs

> it would probably have backup (which is a good thing, the current setup doesn't have backups AFAIK)

We do have backups of mailman archives in the current setup and have had them for a long time. List servers include "profile::backup::host" and the lists role tells it to back up "backup::set { 'var-lib-mailman': }". Whether specific databases are backed up should be verified with the DBA team.

> Maybe we can also get an overview of the biggest archives to handle the low-hanging fruits first? Is it possible?

Yes, here are the ones that are larger than 1G, by size:

10G	helpdesk-l
6.1G	reading-web-team.mbox
6.1G	arbcom-l
5.5G	wmfall
4.8G	reading-web-team
3.8G	wikidata-bugs
3.1G	wmfall.mbox
3.0G	qa-alerts
2.9G	wmfsf
2.4G	arbcom-l.mbox
2.1G	wikimedia-in-exec
2.0G	ops
2.0G	blog-admin
1.9G	wikimedia-l
1.8G	wmfsf.mbox
1.7G	wikinews-pl
1.7G	pressemeldungen
1.6G	wikimediail-board
1.6G	affcom
1.5G	wikimedia-il
1.5G	wikien-l
1.4G	wmfcc-l
1.3G	wikiit-admins-l
1.3G	wikidata-bugs.mbox
1.3G	functionaries-en
1.2G	clerks-l
1.1G	wikitech-l
1.1G	wikinews-pl.mbox
1.1G	wikimediaua
1.1G	wikimedia-in-exec.mbox
1.1G	analytics-internal

If you wonder why some entries end in .mbox and some don't: each list has two things, an mbox file containing all messages in one place, and a directory with the HTML generated from the mbox files. So it's possible that all the HTML has been deleted while the mbox is still around, or the other way around. I don't know which one would matter for an import into a database.
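Because each list shows up twice in such a listing (HTML directory plus `.mbox`), a per-list total needs the two entries merged. Here is a small sketch of that arithmetic, assuming `du -h` style lines like the ones above; the helper names `parse_size` and `totals_per_list` are made up for illustration, and the sample lines are copied from this task.

```python
from collections import defaultdict

# Binary multipliers as used by `du -h`.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(text):
    """Convert a du -h style size such as '3.8G' or '787M' to bytes."""
    text = text.strip()
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)

def totals_per_list(du_lines):
    """Merge 'name' and 'name.mbox' entries and rank lists by total size."""
    totals = defaultdict(float)
    for line in du_lines:
        if not line.strip():
            continue
        size, name = line.split(None, 1)
        name = name.strip()
        if name.endswith(".mbox"):
            name = name[: -len(".mbox")]
        totals[name] += parse_size(size)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

sample = [
    "3.8G\twikidata-bugs",
    "1.3G\twikidata-bugs.mbox",
    "787M pywikibot-bugs",
    "260M pywikibot-bugs.mbox",
]
for name, size in totals_per_list(sample):
    print(f"{size / 1024**3:.1f}G\t{name}")
```

Since the inputs are already rounded by `du -h`, the totals are approximate.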

> There are two: pywikibot-bugs and pywikipedia-bugs

787M pywikibot-bugs
260M pywikibot-bugs.mbox

160M pywikipedia-bugs
56M pywikipedia-bugs.mbox

Those sizes seem negligible to me.

Assigning to Lydia as she is the list admin and can change it per T262773#6464825

I think that's all that is needed to resolve this ticket. The general discussion about archive sizes in relation to a possible migration of the list server can continue on the linked DBA task and shouldn't be limited to just one list.

It's not 5GB, it's 10GB (T278609#6970491), which is 6% of all of our archives. I'm inclined to delete the archives and will do so if there are no objections by the end of next week.

Mentioned in SAL (#wikimedia-operations) [2021-04-15T05:44:39Z] <Amir1> start deleting archive of wikidata-bugs T262773

Mentioned in SAL (#wikimedia-operations) [2021-04-15T05:54:27Z] <Amir1> end of cleaning archive of pywikibot-bugs and wikidata-bugs T262773