Monday, September 22, 2008

Ruby on Rails on MarkMail: 200,000 Emails

Interested in Ruby on Rails? If so, you'll be happy to learn we've loaded the full RoR mailing list archive. It holds about 200,000 emails and includes both the original Mailman lists from 2004-2006 and the Google Groups lists from 2006 onward.


Fun facts:

Don't forget, we have the regular Ruby lists too.

Thursday, September 11, 2008

FreeBSD, the Unknown Giant

My last entry about NetBeans and OpenOffice.org and their million messages reminded me that I never announced here that we loaded the FreeBSD archives, an even larger and older community. They have more than 2.5 million messages, stretching back to 1994.

FreeBSD doesn't get as much attention as Linux but is a great operating system. Here's a description from an IBM developerWorks article:

"The FreeBSD operating system is the unknown giant among free operating systems. Starting out from the 386BSD project, it is an extremely fast UNIX®-like operating system mostly for the Intel® chip and its clones. In many ways, FreeBSD has always been the operating system that GNU/Linux®-based operating systems should have been. It runs on out-of-date Intel machines and 64-bit AMD chips, and it serves terabytes of files a day on some of the largest file servers on earth."

Here's the historic traffic chart (excluding automated bug and check-in messages):


Looks like it's a giant in traffic as well. The freebsd-questions list alone gets a couple thousand emails a month, half a million in its history. Got a FreeBSD problem? I bet the answer's in there.

Announcing NetBeans and OpenOffice.org

Last week we finished adding the NetBeans and OpenOffice.org mailing lists to the MarkMail archive. Both communities host more than a million messages each. Here's the NetBeans activity graph (with automated bugs and check-in messages removed):


Looks like they've seen a resurgence, with activity climbing for the last four years. They have more list activity than Eclipse, too. (Eclipse directs user questions to web forums that aren't included in our stats.)

Here's the OpenOffice.org traffic (same automated message removals):


The folks at CollabNet worked with us to transfer the massive archives, and yesterday we issued a joint press release announcing the new list availability. We also boasted passing 27.5 million emails in total. That was yesterday. Today we're passing 28 million. Chugga, chugga!

Tuesday, September 2, 2008

A Tale of Two Search Engines, Revisited

As Jason announced previously, last Wednesday night I delivered A Tale of Two Search Engines, a presentation for the Software Architecture and Modeling SIG of SDForum, about building and running the Krugle and MarkMail vertical search engines for code and email, respectively.

Here are my tidied-up slides.

Note carefully that my presentation style is a very visual, story-telling approach for live, interactive audiences -- i.e., the slide deck is quite large and not geared towards a reading-at-home audience. Heck, I only broke down and used bullet points on 4 slides right at the end. :-)

That said, I'll start blogging some of the stories, go deeper on various technical details, and/or get into any of the "fun topics" that people are interested in. Feel free to leave comments here about any that you particularly want to hear about.

Special thanks to Ron Lichty for dragging me into giving this presentation and the wonderful SAMSIG audience for making it so much fun.

Enjoy,

John