Monday, March 17, 2008

World Wide Web Consortium Lists: 400,000 emails

HTML 4.0, XML, PNG, CSS, DOM, and XQuery: These are but a few of the technologies to come out of the World Wide Web Consortium, commonly referred to as the W3C. We're proud to announce that MarkMail (which by the way uses all of those technologies!) has loaded the full W3C public mailing lists. They start in 1994 and cover 400,000 emails across 200 mailing lists.


With such a long and deep history it's fun to do a little archaeology: You can find the first mention of XML back in 1996. I tried to find the formal "XML 1.0" announcement and saw there wasn't one, but on launch day (February 10, 1998) you can find people complaining about rendering issues with the spec. Isn't that always the way with mailing lists? By the way, it's fun to use XML to search on the birth of XML.

Google first came up as a topic in August 1998, back when its domain ended in "stanford.edu". That beats any other list we've loaded by 5 months. The first mention of XQuery didn't come until January 2001, well after xml-dev and other lists were talking about it. I expect there's more chatter in the private W3C archives.

Finally, the first mention of MarkMail came in December 2007. And what a great post it was! :)

Thursday, March 13, 2008

Loaded Perl: 530,000 emails

Perl is the duct tape of the internet. Created by Larry Wall in 1987 and made famous with his Programming Perl "camel book" published by O'Reilly, it's the tool sysadmins use to keep things running.


We're proud to announce we've finished loading the Perl.org mailing list history into MarkMail. A total of 530,000 emails across 75 lists. The lists don't go back to 1987 (boy that'd be cool if they did). But that's all right; who really needs tech support against Perl 1.000?

What we have here is traffic starting with the migration to the Perl.org setup in 1999.


Enjoy! And if anyone has earlier archives, let us know.

Tuesday, March 11, 2008

New Search Feature: "opt:nostem"

In the science of Information Retrieval there's a constant tug of war between precision and recall. As Wikipedia defines the terms, precision is the fraction of the documents retrieved that are relevant to the user's information need, and recall is the fraction of the documents that are relevant to the query that are successfully retrieved. Or as I define the terms, precision is how much of what you got is what you wanted, and recall is how much of what you wanted you actually got.
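
To make the two fractions concrete, here's a minimal Python sketch. The function name and the message ids are made up for illustration; nothing here comes from MarkMail's actual code.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that were both returned and wanted
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids: the search returned four messages, three of them relevant,
# while five relevant messages exist in total.
retrieved = {"m1", "m2", "m3", "m4"}
relevant = {"m1", "m2", "m3", "m5", "m6"}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.75 0.6
```

Returning more messages can only keep recall the same or raise it, but it usually drags precision down, which is the tug of war.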

MarkMail increases recall by running stemmed searches. This loosens the query constraint so that searching for proxies will match proxy as well. Sometimes this is good, and sometimes we hear from users who don't like the behavior all that much! They want more precision.
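
Here's a rough Python sketch of the idea. The toy_stem function is a deliberately crude stand-in that only strips a couple of plural endings; it is an assumption for illustration and not the stemmer MarkMail actually runs.

```python
def toy_stem(word):
    """Crude plural stripping, for illustration only; real stemmers do much more."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"  # proxies -> proxy
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # lists -> list
    return word


def matches(query, text, stem=True):
    """True if any word in text equals the query term, compared stemmed or exact."""
    normalize = toy_stem if stem else str.lower
    target = normalize(query)
    return any(normalize(word) == target for word in text.split())


message = "Configure the proxy before starting"
print(matches("proxies", message))              # True
print(matches("proxies", message, stem=False))  # False
```

With stemming on, both the query and the message text collapse to the same root, so more messages match. With it off, only the literal word counts.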

So we're happy to announce a new feature, opt:nostem, that when added to the search string turns off stemming for that query. You can try it for yourself:

http://markmail.org/search/?q=proxies
http://markmail.org/search/?q=proxies+opt%3Anostem
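
If you build search links from a script, the option is just part of the q parameter. A small sketch in Python, where the search_url helper and its stem flag are hypothetical names of ours rather than any MarkMail API:

```python
from urllib.parse import urlencode

SEARCH_URL = "http://markmail.org/search/"


def search_url(terms, stem=True):
    """Build a search link, appending opt:nostem when stemming should be off."""
    query = terms if stem else f"{terms} opt:nostem"
    # urlencode turns the space into "+" and the colon into "%3A".
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


print(search_url("proxies"))
print(search_url("proxies", stem=False))
```

The two printed links should match the pair above.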

Friday, March 7, 2008

Average Load Time: 0.1 Seconds

There are many challenges in running a high-traffic web site. Performance is a challenge we particularly focus on at MarkMail because users get frustrated if they have to wait more than a second for a reply.

The challenge in maintaining performance increases as more of a site's content gets built dynamically -- meaning on the fly in response to user requests, rather than ahead of time so it can be served directly (like a McDonald's hamburger).

With MarkMail we build every page dynamically using XQuery. Even a page that at first blush seems as if it could be pre-built, like an individual email message, we actually build dynamically because we want to highlight the search terms from your query.
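
To show why highlighting forces the page to be built per request, here's a minimal sketch. We do this in XQuery; the Python below, including the highlight function and the choice of <b> tags, is only an assumed illustration of the idea and not our implementation.

```python
import html
import re


def highlight(body, terms):
    """Escape body for HTML and wrap each whole-word term match in <b> tags."""
    if not terms:
        return html.escape(body)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    pieces = []
    last = 0
    for match in pattern.finditer(body):
        # Escape the text between matches, then wrap the match itself.
        pieces.append(html.escape(body[last:match.start()]))
        pieces.append("<b>" + html.escape(match.group(0)) + "</b>")
        last = match.end()
    pieces.append(html.escape(body[last:]))
    return "".join(pieces)


print(highlight("Set the proxy, then restart the Proxy server.", ["proxy"]))
# Set the <b>proxy</b>, then restart the <b>Proxy</b> server.
```

The same message renders differently for every query, so there's no single pre-built copy to serve.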

All this is why I was so happy to notice that Alexa.com calls us a "Very Fast" site...

  • Markmail.org has a traffic rank of: 128,666 (UP 745,248)
  • Speed: Very Fast (99% of sites are slower), Avg Load Time: 0.1 Secs

Here's some background on how Alexa tracks performance.