Elliott C. Back: In Aere Aedificare

A scientific restructuring of my email classification scheme

Posted in Computers & Technology, Spam by Elliott Back on August 27th, 2005.

I’m one of those freaks who uses a Bayesian classifier to bucket my email into a variety of categories before consumption. Currently, I have filtered 26,797 emails into 17 buckets with 98.99% accuracy, counting from mid-March of this year. A mere 16% of my email is spam, and the rest requires good filtering so that I can deal with it in a reasonable period of time. So, what are the buckets I use for Bayesian filtering?

The buckets are amazon, app_dev, bank, blogging, cornell, cs490, elliottback-dot-com, family, friends, isso, lists, news, other, production_asst, spam, survey, and wordpress. So, first, what do you notice in common?

Things like {amazon, app_dev, cs490, production_asst, survey} belong to a class called “professional,” while things like {elliottback-dot-com, blogging, wordpress} should be in a class called “blogging.” {family, friends} are “personal,” while {isso, cornell} belong to a class called “education.” The bank bucket probably stands alone, but the rest {lists, news, other} should be considered spam.
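As a rough sketch of that regrouping (not my actual filter configuration), the mapping could be written as a simple lookup table in Python. The names `BUCKET_TO_CLASS`, `ALL_BUCKETS`, and `coarse_class` are made up for illustration, and the fallback to spam for any unlisted bucket is an assumption taken from "the rest should be considered spam."

```python
# Hypothetical lookup from the 17 original buckets to the coarser classes.
BUCKET_TO_CLASS = {
    "amazon": "professional",
    "app_dev": "professional",
    "cs490": "professional",
    "production_asst": "professional",
    "survey": "professional",
    "elliottback-dot-com": "blogging",
    "blogging": "blogging",
    "wordpress": "blogging",
    "family": "personal",
    "friends": "personal",
    "isso": "education",
    "cornell": "education",
    "bank": "bank",
}

# The full list of original buckets, as named in the post.
ALL_BUCKETS = [
    "amazon", "app_dev", "bank", "blogging", "cornell", "cs490",
    "elliottback-dot-com", "family", "friends", "isso", "lists", "news",
    "other", "production_asst", "spam", "survey", "wordpress",
]


def coarse_class(bucket: str) -> str:
    """Return the coarse class for a bucket; anything unlisted counts as spam."""
    return BUCKET_TO_CLASS.get(bucket, "spam")


# Collapse the 17 buckets into the distinct coarse classes.
classes = sorted({coarse_class(b) for b in ALL_BUCKETS})
print(len(ALL_BUCKETS), "buckets ->", len(classes), "classes:", classes)
```

Under these assumptions the 17 buckets collapse into six classes, with lists, news, and other landing in spam by default.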

The point of this is that I’ve finally discovered that simplicity is far better than a complicated set of filters…

This entry was posted on Saturday, August 27th, 2005 at 12:05 pm and is tagged with bayesian classifier, colored sections, amazon, classification scheme, mid march, buckets, family friends, cornell, freaks, blogging, period of time, consumption, restructuring, email, accuracy, education.
