Tuesday, April 29, 2008

Loaded NANOG: North American Network Operators' Group

Today we loaded a new list, NANOG, a discussion forum for the North American Network Operators' Group. In its 100,000 messages it holds some fascinating discussions about internet operations. The chatter around 9/11, Katrina, and Y2K stands out especially.

The list extends back to April 1994, two months earlier than any list we previously loaded. It's always fun to break little records like that. It could be a while before we break this one again.

If you're intrigued by internet operations, the NANOG FAQ has lots of good factoids.

Tuesday, April 15, 2008

Loaded Python: A Cool Million Messages

Happy news: We've just finished loading the Python Software Foundation mailing lists. (Python is a popular programming language, overseen by the PSF.) With this load we're breaking a few records:

  • Weighing in at 1,022,479 total messages, Python is now the largest community we've loaded since our initial launch. (We went live back in November with roughly 4 million Apache messages.)
  • Half of those million mails are from a single list, python-list. That means python-list holds the new record for Crazy Huge What The Heck Can They Talk About So Much list. (And, would you believe, there are even more python-list histories from 1992-1995 still to load.)
  • This puts our total combined MarkMail message count above 10,000,000. There was much hooting and hollering (and page refreshing) around here as the numbers clicked over.
  • It's the biggest community ever loaded by our new hire Evan Paull. OK, it's the only community ever loaded by Evan. He started just a couple of weeks ago. We figure after he's wrangled together a million-message history, everything else will look easy.

Among the million mails are the archives for the Mailman project, something I'm especially happy about because much of our work here involves interfacing with Mailman, and this should help us understand it better.

As always, here's the traffic chart:

Monday, April 7, 2008

Loaded GNOME: 750,000 emails

Over the weekend we loaded the mailing list history for the GNOME project. GNOME is an immensely popular GNU project, a free software desktop environment and development framework. Its message traffic shows a vibrant and active community. The project boasts a history of 750,000 emails across more than 200 lists:

The peak in 2007? That's because in 2007 they started a new svn-commits-list (a list that captures emails about code check-ins) and it's been archived while the older cvs-commits-list wasn't. If we add -type:checkins to the query, we can graph the history without that list:

It took a fair amount of work to load the GNOME history because the archives had more spam and virus mails than could suitably be removed by hand. We had to use procmail and SpamAssassin to remove the junk.
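
For the curious, here's a rough sketch of what the final filtering step can look like. It is not our actual pipeline. It assumes the archive is an mbox file that has already been run through SpamAssassin, so that junk messages carry the X-Spam-Flag: YES header SpamAssassin adds to mail it judges to be spam. The function names and file paths are made up for illustration.

```python
import mailbox


def is_flagged_spam(message):
    """Return True if the message carries an X-Spam-Flag: YES header."""
    flag = message.get("X-Spam-Flag", "")
    return str(flag).strip().upper() == "YES"


def copy_without_spam(src_path, dst_path):
    """Copy messages from one mbox to another, skipping flagged spam.

    Assumes the source mbox was already annotated by SpamAssassin.
    Returns a (kept, dropped) tuple of message counts.
    """
    kept = dropped = 0
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        for message in src:
            if is_flagged_spam(message):
                dropped += 1
            else:
                dst.add(message)
                kept += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return kept, dropped
```

Calling copy_without_spam("gnome-list.mbox", "gnome-list.clean.mbox") (hypothetical file names) would write the clean copy and report how many messages were kept and dropped. Writing to a second file rather than deleting in place keeps the original archive around in case the filter turns out to be too aggressive.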

One neat factoid: It's easier to remove spam from mail sent in 2004 than mail sent today. Spam blocking has always been a competitive arms race, but in this case we're fighting yesterday's war with today's technology! Even running in offline mode, SpamAssassin did a darn fine job.

I just wish it ran faster. If anyone out there is a SpamAssassin performance guru, please let us know.

Our thanks to Jeff Waugh for helping us get the histories.