Gender in Blogging: Counting the Men and the Women
Continuing my partnership with the great blog directory Blogwise, I analysed first-name data from 24,846 different blogs. My goal was to determine how many men versus how many women are blogging, and what the most popular blogger names are for each gender. To do this, I compared the raw data against the US Census first-name data, which lists 1000 male and female names, ranked by frequency. If a name appears on both lists, I assign it to the gender whose frequency is twice the other’s; otherwise I ignore it. This rarely happens. There are also plenty of first names that are harder to classify. Those go to the backup lists, which are simply long lists of male and female names I’ve trawled off the internet. Finally, if a name matches none of these, the classifier gives up. This brings me to the following “raw numbers” result:
male: 14548
female: 4390
neither: 5908
That’s right. Out of the 18,938 names I could identify, 76.8% of them are male and only 23.2% female:
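The classification procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code behind these numbers: the census and backup lists are passed in as plain dicts and sets (loading them is not shown), “twice as much” is read as “at least twice”, a name found on both backup lists is left unclassified, and ignored names land in the “neither” bucket. The last lines reproduce the percentages from the raw counts, which are shares of the 18,938 identified names rather than of all 24,846 blogs.

```python
from collections import Counter
from typing import Iterable, Optional

def classify(name: str,
             census_male: dict,
             census_female: dict,
             backup_male: set,
             backup_female: set) -> Optional[str]:
    """Return "male", "female", or None (the "neither" bucket)."""
    key = name.strip().upper()
    m = census_male.get(key)      # frequency on the male census list, if present
    f = census_female.get(key)    # frequency on the female census list, if present
    if m is not None and f is not None:
        # On both census lists: assume a 2x frequency margin is required.
        if m >= 2 * f:
            return "male"
        if f >= 2 * m:
            return "female"
        return None               # too close to call: ignored
    if m is not None:
        return "male"
    if f is not None:
        return "female"
    # Fall back to the longer, unranked backup lists.
    in_m, in_f = key in backup_male, key in backup_female
    if in_m != in_f:
        return "male" if in_m else "female"
    return None                   # no match (or on both backup lists): give up

def tally(names: Iterable[str], **lists) -> Counter:
    """Count male / female / neither over an iterable of first names."""
    return Counter(classify(n, **lists) or "neither" for n in names)

# Shares reported in the post, recomputed from the raw counts.
male, female, neither = 14548, 4390, 5908
identified = male + female
print(f"{100 * male / identified:.1f}% male, "
      f"{100 * female / identified:.1f}% female "
      f"of {identified:,} identified ({identified + neither:,} total)")
# prints: 76.8% male, 23.2% female of 18,938 identified (24,846 total)
```

Taking the census lists first and the backup lists second mirrors the order described in the post: the ranked census data is the stronger signal, so it gets to decide before the unranked lists are consulted.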

Both men’s and women’s names follow a typical power-law distribution, in which a handful of the most popular names account for a large share of all names. Note that these graphs use a logarithmic y-axis:


And finally, the 10 most popular men’s names:
JOHN
DAVID
MICHAEL
CHRIS
MARK
PAUL
MIKE
JAMES
JASON
RICHARD
And the 10 most popular women bloggers’ names:
JENNIFER
LAURA
SARAH
MICHELLE
LISA
ROBIN
MARY
AMY
HEATHER
LINDA
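Both the top-10 lists and the skew visible in the log-y graphs come from a simple tally of first names per gender. Here is a minimal Python sketch, assuming you already have a `Counter` of names for one gender (for example, built from classifier output, not shown). `most_common(10)` gives the ranking, `rank_frequency` gives the points for a rank-vs-count plot, and `top_k_share` measures how much of the total the most popular names cover. The helper names and the toy counts are made up purely to show the calls; they are not the Blogwise data.

```python
from collections import Counter

def rank_frequency(counts: Counter) -> list:
    """(rank, count) pairs, most common first; plot count on a log y-axis."""
    return [(rank, c) for rank, (_, c) in enumerate(counts.most_common(), start=1)]

def top_k_share(counts: Counter, k: int) -> float:
    """Fraction of all bloggers whose name is among the k most common names."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(k)) / total

# Made-up toy counts with placeholder names, only to show the calls.
toy = Counter({"A": 50, "B": 25, "C": 12, "D": 6, "E": 4, "F": 3})
print([name for name, _ in toy.most_common(3)])  # ['A', 'B', 'C']
print(rank_frequency(toy))  # [(1, 50), (2, 25), (3, 12), (4, 6), (5, 4), (6, 3)]
print(f"top 2 names cover {top_k_share(toy, 2):.0%} of this toy sample")  # 75%
```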
If this makes you feel lonely, women, it should. About 50% of America is female, so about 50% of bloggers should be female. I won’t even *start* to hand out blame or speculate why (I’ll leave that for the comments), but it looks like there’s work to be done. For more about blogging and naming, see “What do you call your blog”.
The Latest Bugs and Naming from our “friends” at Redmond…
CNET is carrying two Microsoft stories today. The first is about unpatched image vulnerabilities in IE SP2. According to the report, four proofs of concept have been released that crash the latest version of Internet Explorer. The advisory on SecurityFocus reads:
Microsoft Internet Explorer is prone to a buffer overflow vulnerability in the JPEG image rendering library used by the browser. This issue is due to a failure of the application to properly bounds check input data prior to copying it to a fixed size memory buffer.
This issue was identified by creating random input for the browser, and has not been researched further at this time. This BID will be updated as further information is disclosed.
Successful exploitation may result in execution of arbitrary code in the context of the user executing the affected browser.
So it may or may not be exploitable, but it is certainly a bug. The second story is advance speculation on the true name of Longhorn:
Rumor has it that Microsoft plans to use Vista as the official name for the next version of Windows, which has been known by its codename, Longhorn.
Personally, I think Windows Vista sounds a bit odd…
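The SecurityFocus advisory quoted above describes two things worth illustrating: a length taken from the image being trusted before data is copied into a fixed-size buffer, and the bug being found by feeding the browser random input (fuzzing). Below is a minimal, hypothetical Python sketch of both. It is not Microsoft’s code and makes no claim about how the IE JPEG library is structured. It assumes a JPEG-style segment (0xFF, a marker byte, then a 2-byte big-endian length that counts itself but not the marker) and a made-up 64-byte destination buffer. Python itself won’t overrun memory here; the point is where the checks belong, which in C is the difference between rejecting a bad file and writing past the end of the buffer.

```python
import random
import struct

BUF_SIZE = 64  # hypothetical fixed-size destination buffer

class MalformedImage(ValueError):
    """Raised when a segment fails a bounds check."""

def copy_segment(data: bytes, offset: int, buf: bytearray) -> int:
    """Copy one segment's payload into buf; return the offset of the next segment."""
    if offset + 4 > len(data) or data[offset] != 0xFF:
        raise MalformedImage("truncated input or missing marker")
    (length,) = struct.unpack_from(">H", data, offset + 2)
    payload_len = length - 2  # the length field counts its own two bytes
    # The checks a vulnerable parser skips: the declared length must be sane,
    # must fit in the input, and must fit in the fixed-size buffer.
    if payload_len < 0:
        raise MalformedImage("length field smaller than itself")
    start = offset + 4
    if start + payload_len > len(data):
        raise MalformedImage("segment runs past end of input")
    if payload_len > len(buf):
        raise MalformedImage("segment larger than destination buffer")
    buf[:payload_len] = data[start:start + payload_len]
    return start + payload_len

def fuzz(seed_input: bytes, iterations: int = 1000, seed: int = 0) -> int:
    """Feed randomly mutated copies of seed_input to copy_segment.

    Returns how many inputs were rejected cleanly. Any other exception
    propagates, which in a real fuzzer would flag a bug to investigate.
    """
    rng = random.Random(seed)
    rejected = 0
    for _ in range(iterations):
        data = bytearray(seed_input)
        for _ in range(rng.randint(1, 4)):  # overwrite a few random bytes
            data[rng.randrange(len(data))] = rng.randrange(256)
        try:
            copy_segment(bytes(data), 0, bytearray(BUF_SIZE))
        except MalformedImage:
            rejected += 1
    return rejected
```

For example, `fuzz(b"\xFF\xFE\x00\x05abc")` mutates a tiny valid comment-style segment a thousand times and counts how many mutants the checks reject.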
A few thoughts about Perl
I started using Perl the other day, and I noticed a few things about it:
1) Perl is really not type-safe, and doesn’t really have types in the usual sense: a scalar will happily hold a number, a string, or a reference. In fact, the reason you can’t use true/false is that, in theory, someone might mean true to be 47 and false to be 46. In reality, this is just a cover-up for the fact that Perl has no real type system.
2) Perl is a memory hog. Loading 120,000 small data values into hashes took up 200 MB of RAM, roughly 1.7 KB per value. I’m not the first to notice this.
3) Perl syntax is ugly. $ for scalars, @ for arrays, % for hashes, and then $#array for the last index of an array, while getting its length means evaluating @array in scalar context? It’s a big mess. Then there’s the regex syntax, where $var =~ s/regex/replacement/ magically replaces things.
4) Perl functions are hacks. sub x {} doesn’t even declare a parameter list; it just puts all the arguments into a magic array, @_, which you “shift” to get them out. What? Seriously! This is messy as anything, and it means the compiler can’t do certain kinds of optimization analysis.
5) Why are variables global by default? Didn’t the inventors of Perl hear about something called “default scope,” which in every other sane language is local?
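For readers coming from other languages, here are rough Python analogues of the Perl behaviour in points 1, 3, and 4. They are approximations, not exact emulations: Perl replacement strings use $1 where Python’s `re` uses `\1`, and Perl’s `s///` edits the variable in place instead of returning a new string. The function names are made up for this example.

```python
import re

def perl_truthy(value) -> bool:
    """Approximate Perl's truth test for a scalar (point 1)."""
    if value is None:                      # undef is false
        return False
    if isinstance(value, str):
        return value not in ("", "0")      # only "" and "0"; "0.0" and "00" are true
    if isinstance(value, (int, float)):
        return value != 0                  # numeric zero is false
    return bool(value)                     # anything else: Python's own rule

def perl_subst(text: str, pattern: str, replacement: str, g: bool = False) -> str:
    """Roughly $text =~ s/pattern/replacement/, with /g when g=True (point 3).

    Without /g, Perl replaces only the first match, hence count=1.
    """
    return re.sub(pattern, replacement, text, count=0 if g else 1)

def last_index(array: list) -> int:
    """Perl's $#array: index of the last element, -1 for an empty array (point 3)."""
    return len(array) - 1

def greet(*args) -> str:
    """Mimics: sub greet { my $name = shift; my $greeting = shift // "Hello"; ... }"""
    at_underscore = list(args)             # plays the role of @_ (point 4)
    name = at_underscore.pop(0)            # shift
    greeting = at_underscore.pop(0) if at_underscore else "Hello"
    return f"{greeting}, {name}!"

def greet_explicit(name: str, greeting: str = "Hello") -> str:
    """The same function with a declared parameter list, for contrast."""
    return f"{greeting}, {name}!"
```

The contrast between `greet` and `greet_explicit` is the complaint in point 4: with a declared parameter list, the names, count, and defaults of the arguments are visible to both the reader and the compiler.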
However, there are lots of cool “glue” applications you can build with it, including Bayesian chess: www.lbreyer.com/spam_chess.html. Also note that I don’t hate Perl; it’s my second day using it.