Monday, April 7, 2008

Loaded GNOME: 750,000 emails

Over the weekend we loaded the mailing list history for the GNOME project. GNOME is an immensely popular GNU project, a free software desktop environment and development framework. Its message traffic shows a vibrant and active community, with a history of 750,000 emails across more than 200 lists:

The peak in 2007? That's because in 2007 they started a new svn-commits-list (a list that captures emails about code check-ins) and it's been archived while the older cvs-commits-list wasn't. If we add -type:checkins to the query, we can graph the history without that list:

It took a fair amount of work to load the GNOME history because the archives contained more spam and virus-laden mail than we could reasonably remove by hand. We had to use procmail and SpamAssassin to filter out the junk.
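
For the curious, here is a rough sketch of what the last step of such a cleanup could look like. It is not our actual pipeline. It assumes each message has already been passed through SpamAssassin (for example with `spamassassin --local`, which restricts it to offline tests) so that junk carries the `X-Spam-Flag: YES` header SpamAssassin adds. The function names and file paths are made up for illustration.

```python
import mailbox
from email.message import Message
from typing import Tuple


def is_flagged_spam(msg: Message) -> bool:
    """Return True if the message carries SpamAssassin's spam flag."""
    flag = msg.get("X-Spam-Flag", "")
    return str(flag).strip().upper() == "YES"


def split_mbox(src_path: str, ham_path: str, spam_path: str) -> Tuple[int, int]:
    """Copy messages from src_path into a ham mbox and a spam mbox.

    Returns (kept, dropped). The source archive is left untouched.
    """
    src = mailbox.mbox(src_path, create=False)
    ham = mailbox.mbox(ham_path)    # created if it does not exist
    spam = mailbox.mbox(spam_path)  # created if it does not exist
    kept = dropped = 0
    try:
        for msg in src:
            if is_flagged_spam(msg):
                spam.add(msg)
                dropped += 1
            else:
                ham.add(msg)
                kept += 1
        ham.flush()
        spam.flush()
    finally:
        src.close()
        ham.close()
        spam.close()
    return kept, dropped
```

Calling `split_mbox("list.mbox", "list.ham.mbox", "list.spam.mbox")` (hypothetical file names) would leave the original archive alone and write the clean mail to a separate file, which makes it easy to spot-check the spam pile before throwing it away.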

One neat factoid: It's easier to remove spam from mail sent in 2004 than mail sent today. Spam blocking has always been a competitive arms race, but in this case we're fighting yesterday's war with today's technology! Even running in offline mode, SpamAssassin did a darn fine job.

I just wish it ran faster. If anyone out there is a SpamAssassin performance guru, please let us know.

Our thanks to Jeff Waugh for helping us get the histories.
