Tuesday, June 23, 2009
MarkMail at the first MarkLogic User Group
Last week I spoke at the inaugural Mark Logic User Group meeting in Reston, VA (near where a lot of our government customers are based). The topic was MarkMail: where the idea came from, how we built it on the cheap, how Mark Logic began using it internally, and some lessons we learned as we scaled out the public high-traffic site. It's a similar talk to the one I gave at the Mark Logic User Conference in San Francisco last month.
For those interested, the slides are available as a downloadable MOV file. Click to advance.
The slides are fairly simple. Most of the fun of the talk (well, at least for me) is in the stories I tell, usually relating to the quotes in italics at the bottom of slides. I suppose you'll just have to use your imagination.
Posted by Jason Hunter at 2:09 PM
1 comment
Labels: markmail mlug
Tuesday, May 5, 2009
MarkMail at the Mark Logic User Conference
The Mark Logic User Conference is coming up next week. If you're coming to the show, I encourage you to attend the talk on MarkMail I'll be giving on Wednesday. I'll tell the story of MarkMail as it progressed from my first idea to a night project built with Ryan Grimm to the robust web site you see now at markmail.org (and even to the other web sites you don't see, because they're running behind people's firewalls). It's in the conference's technical track so there'll be a lot of focus on the core tech.
If you're not coming to the show, why the heck not? :) It's not too late to register.
Posted by Jason Hunter at 5:53 PM
2 comments
Labels: markmail conference
Tuesday, December 9, 2008
MarkMail at One Year: Looking Back
It's now been a little over a year since we launched MarkMail. We've sure come a long way!
We're now seeing well over a million unique visitors every month and more than 5 million page views.
The Googlebot crawler (whose activity isn't included in the above statistics) has also been active. It now crawls between 1.0 and 1.3 million pages every day to keep its index fresh. That's roughly 12 to 15 page hits every second -- or up to 15 Hertz, enough to make a nice low background rumble. It's really enjoyable to get so much Google attention because it wasn't that long ago that we were just trying to get Google to index more than a million of our pages, never mind crawl that many in a day.
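For anyone who wants to check that math, here's a quick Python sketch that turns the daily crawl range above into an average rate per second. The function name is just for illustration, and it assumes the crawl is spread evenly across the day, which real crawler traffic probably isn't.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def hits_per_second(pages_per_day: float) -> float:
    """Average request rate, assuming the crawl is spread evenly over the day."""
    return pages_per_day / SECONDS_PER_DAY


# The daily crawl range quoted above: 1.0 to 1.3 million pages.
low = hits_per_second(1_000_000)
high = hits_per_second(1_300_000)

print(f"{low:.1f} to {high:.1f} hits per second")
```

The low end works out to a bit under 12 hits per second and the high end to about 15.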
Our content size has grown also, from 4 million messages at launch, covering just the Apache Software Foundation archives, to 34 million messages today, spanning all sorts of communities. For us to grow so big so fast has been possible only because of the community support we've received. There's a long list of various community members who have worked with us to accumulate and load their list archives. We'd like to thank all those folks, as well as the people who placed a MarkMail search box or other MarkMail link on their site or helped spread the word in blogs and emails and tweets.
Looking forward, where do we go from here? We have some big plans. I'll get into details with a later post.
Posted by Jason Hunter at 4:27 PM
4 comments
Labels: markmail anniversary
Thursday, October 9, 2008
Google Code Adds Gadgets: MarkMail Helps
Google today announced new support for embeddable "gadgets" on Google Code project pages. Particularly exciting to us, they introduced MarkMail as the recommended gadget for viewing and searching Google Code project list archives.
For those who haven't encountered one in the wild, a Google Gadget is an embeddable web object that puts a bit of third-party dynamic content into the middle of a web page. Gadgets are the things you place on your iGoogle home page or your Google Desktop, but you can also add them to your own web page with one line of JavaScript, or anyone else's page if it supports the OpenSocial APIs.
We've coordinated with the Google Code team over the last several months to load about 500 GoogleGroups lists (3.8 million emails) and build a new MarkMail Gadget (launching today!) to let Google Code developers search and analyze their lists using MarkMail.
The new MarkMail gadget lets you view messages, threads, attachments, and senders, plus a traffic chart (wouldn't be MarkMail without it!), for any set of messages you want to follow. The messages you choose to track with the gadget can be those from a single list, a set of lists, or a particular person, those containing a term or phrase, or any combination. In fact, anything you can use in a search on MarkMail can be used as input to the gadget view. The new gadget offers two features not yet available on MarkMail.org: a daily traffic chart (MarkMail.org only does monthly traffic charts) and a view that coalesces threads.
So what does this mean for you? If you're a project leader (either on Google Code or somewhere else) it's now easier than ever to embed a MarkMail traffic chart and recent message list inside any of your project pages. If you're just a lurker, you can personalize your view on MarkMail traffic and embed that view into iGoogle or Google Desktop, or any other page.
To help you set up the right links, we created a Gadget Embedding Wizard that guides you through the process of embedding. You can also find our gadget in the Google Directory where they have additional embedding instructions.
Tim O'Reilly, in describing Web 2.0, says, "A platform beats an application every time." We agree. We think you should be able to access mailing list archives whenever and wherever you want, be it at MarkMail.org or on another page that's been MarkMail-enabled via a gadget. So have fun, and let us know how this works for you!
Posted by Jason Hunter at 12:14 AM
1 comment
Labels: gadget, google, googlegroups
Thursday, October 2, 2008
1.4% of Emails Mention Google
As Google celebrates its 10-year anniversary, we thought it'd be fun to use our archive of 30 million mailing list messages to see how Google's popularity has grown over time across the list-o-sphere. Boy, has it grown!
In 2008 (so far) the word "Google" appears in 1.4% of emails in our archive, up from 1.15% last year and 0.75% five years ago.
While shockingly high, that 1.4% number is actually calculated with some conservative restrictions. We're excluding all mentions that occur inside quote blocks (where someone replies to another who said the word). It'd be 2% if we didn't have that rule. We're also excluding from our calculations all the Google Groups lists we follow, where Google is often the topic of discussion. With those lists added in? It's 13%.
You can explore this yourself with our public interface. You'll want to query for "google", use the "opt:noquote" flag, and set "-list:googlegroups" to exclude those lists. Then you can add date constraints either by typing "date:2008" in the search or dragging on the chart. Note the result counts from your searches, do a little division, and you get your percentages.
You'll see that so far in 2008 there were 50,826 emails saying "google" across 3,607,973 emails total. That's 1.4%. For 2003 it's 21,165 emails out of 2,770,480 total, or 0.75%.
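If you'd rather script the division, here's a minimal Python sketch using the counts quoted above. The helper names are made up for illustration, and the query string simply joins the search terms described earlier into one line; treat it as a sketch of the idea, not a description of how MarkMail computes anything internally.

```python
def build_query(term: str, year: int) -> str:
    """Join the search terms described in the post into one query string."""
    return f"{term} opt:noquote -list:googlegroups date:{year}"


def share_percent(matching: int, total: int) -> float:
    """Percentage of emails that matched, out of all emails in the period."""
    return 100.0 * matching / total


# (matching emails, total emails) per year, copied from the figures above.
counts = {
    2008: (50_826, 3_607_973),
    2003: (21_165, 2_770_480),
}

for year, (matching, total) in counts.items():
    query = build_query("google", year)
    print(f"{query}: {share_percent(matching, total):.2f}%")
```

To two decimal places, 2008 comes out at 1.41% and 2003 at 0.76%, a hair above the rounded 0.75% figure.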
Posted by Jason Hunter at 6:07 PM
0 comments
Monday, September 22, 2008
Ruby on Rails on MarkMail: 200,000 Emails
Interested in Ruby on Rails? If so, you'll be happy to learn we've loaded the full RoR mailing list archive. It holds about 200,000 emails and includes both the original Mailman lists from 2004-2006 and the GoogleGroups lists from 2006 onward.
Fun facts:
- Frederick Cheung is the #1 most frequent poster
- DHH is #22
- The traffic never fully recovered after it transitioned from rubyonrails.org to GoogleGroups. You can compare the two charts (keep an eye on the y-axis).
- Maybe it's because DHH didn't make the move to GG?
Posted by Jason Hunter at 3:46 PM
0 comments
Labels: markmail rubyonrails ruby
Thursday, September 11, 2008
FreeBSD, the Unknown Giant
My last entry about NetBeans and OpenOffice.org and their million messages reminded me that I never announced here that we'd loaded the FreeBSD archives, an even larger and older community. They have more than 2.5 million messages, stretching back to 1994.
FreeBSD doesn't get as much attention as Linux but is a great operating system. Here's a description from an IBM developerWorks article:
"The FreeBSD operating system is the unknown giant among free operating systems. Starting out from the 386BSD project, it is an extremely fast UNIX®-like operating system mostly for the Intel® chip and its clones. In many ways, FreeBSD has always been the operating system that GNU/Linux®-based operating systems should have been. It runs on out-of-date Intel machines and 64-bit AMD chips, and it serves terabytes of files a day on some of the largest file servers on earth."
Here's the historic traffic chart (excluding automated bug and check-in messages):

Looks like it's a giant in traffic as well. The freebsd-questions list alone gets a couple thousand emails a month, half a million in its history. Got a FreeBSD problem? I bet the answer's in there.
Posted by Jason Hunter at 4:17 PM
0 comments
Labels: freebsd, list loading
