Elliott C. Back: In Aere Aedificare

Public Blogger: Kottke

Posted in How to Blog by Elliott Back on March 31st, 2005.

Now that Kottke.org has become a publicly funded enterprise, Jason Kottke has a responsibility to “edit/write/design/code the site for one year on a full-time basis.” Since he started blogging full time on 2/22/2005, his posts should be frequent, long, and of high quality. However, a graph of post length over time shows a disturbing fact:

Kottke.org Posts over Time since Going Public

The length and quantity of posts are incredibly low in the month since he went public! On average, every day he writes about 300 words, not counting the numerous links that go on the site. Is 300 words a full time job? Not really. Take a look at this telling histogram:

Histogram of Kottke.org Posts

More than 50% of Kottke.org’s new full-time schedule consists of posting virtually nothing! To put this in precise terms, we can construct a 95% confidence interval for the average length of Kottke’s posts. With n=37 samples, the sample mean is approximately normally distributed, which implies the following large-sample confidence interval:

Large Scale CI
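
For reference, a large-sample confidence interval for a mean is usually written as below. This is the standard textbook form, assumed here rather than recovered from the original figure; the symbols (sample mean, sample standard deviation, sample size) are the usual ones and do not come from the post itself.

```latex
% Large-sample confidence interval for a mean (standard textbook form).
% \bar{x}: sample mean, s: sample standard deviation, n: sample size.
\[
  \bar{x} \;\pm\; z_{\alpha/2}\,\frac{s}{\sqrt{n}},
  \qquad z_{\alpha/2} \approx 1.96 \ \text{for a 95\% interval}
\]
```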

Plugging in the data since he went public, you get the following interval:

95% CI: (69, 322)

In plain terms, we can be 95% confident that Kottke’s average post length falls in this interval. Or, in other words, it certainly doesn’t look like he’s working on www.kottke.org as a full time job.
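
The interval calculation can be sketched in a few lines of Python. This is a minimal illustration, not the script behind the numbers above: the function name `large_sample_ci` and the word counts in `daily_words` are made up for the example, and the real 37 data points are not reproduced here.

```python
import math
import statistics


def large_sample_ci(samples, z=1.96):
    """Return (low, high) for a large-sample confidence interval of the mean.

    z=1.96 corresponds to a 95% interval under the normal approximation.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical daily word counts, NOT the data used in the post.
daily_words = [0, 0, 120, 45, 0, 800, 30, 0, 210, 60]

low, high = large_sample_ci(daily_words)
print(f"95% CI: ({low:.0f}, {high:.0f})")
```

By construction the interval is symmetric around the sample mean, so its midpoint is the estimated average.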

Update:

It’s not that I don’t like www.kottke.org as a blog. I was just surprised to see little change about the site when he “went public” and received funding to work on it full time. This is a quantitative measure of exactly how much posting he’s done since then, which bears out my impressions. It’s a great blog, and I’m sure he’s a great guy, but, the posting is sparse!

This entry was posted on Thursday, March 31st, 2005 at 1:52 am and is tagged with full time job, standard normal distribution, confidence interval, quantitative measure, precise terms, histogram, time basis, blogger, impressions, probability, graph, high quality, blog.

26 Responses to 'Public Blogger: Kottke'

  1. Elliott Bäck said:

    on March 31st, 2005 at 3:06 am

    For example, in the last hour I’ve posted 426 words, exceeding Kottke’s full-time day. Are his posts 24 times more insightful than mine?

  2. theMike said:

    on March 31st, 2005 at 12:17 pm

    When I first heard that he would be quitting his job and blogging full time, I thought…what will he blog about? A good deal of his good posts before were about experiences that he had while at his other job. Oh well.

  3. Scam City version 8.0 said:

    on March 31st, 2005 at 2:43 pm

    The Decline of Jason Kottke
    Yes, I was a very vocal non-supporter of Jason Kottke’s “micropatron” pledge drive, so it gave me a great sense of satisfaction when Elliott Back showed how Jason’s output has been on the downslope in the month since he began blogging full-time……

  4. Michael said:

    on March 31st, 2005 at 2:43 pm

    I didn’t like Kottke’s blog “before” and didn’t come to like it “after”, but your arithmetic is too inhuman. Remember Blaise Pascal and his “I’m writing this long because I didn’t have time to make it short.”

  5. […] oof… In continuance of the Jason Kottke saga*, today we find Elliot Back’s analysis of just how lazy the dude has become. The “now-a-full-time blogger” is actuall […]

  6. Shaghaghi.net said:

    on March 31st, 2005 at 6:08 pm

    Analysis of Kottke
    When Kottke decided to blog full-time and asked for support to do so, expectations changed and rightfully so. I myself am a micropatron. I liked his content before and expected even more of the stuff that made Kottke.org great since Jason would be blo…

  7. Sean said:

    on March 31st, 2005 at 6:11 pm

    I had been thinking that he was slacking off. Great analysis. I wrote my thoughts on my blog. Thanks for the info.

  8. Rachel C said:

    on March 31st, 2005 at 8:53 pm

    As a statistician, I can’t help but wonder at your analysis - where’s the data comparing before he went pro? PS The centre of your 95% CI: (69, 322) is not approximately 300?

  9. Elliott Bäck said:

    on March 31st, 2005 at 11:12 pm

    @Rachel, the 95% CI is an indicator that our estimation of the average post length of about 300 words per day is accurate. Which, I think, for a full time job is less than convincing. If you worked for 8 hours and only produced a half page of text, would you consider that a day’s work?

  10. Rachel C said:

    on March 31st, 2005 at 11:58 pm

    Asking the question, “what constitutes the output we’d expect of a full time blogger” is a completely different issue :)

    Avoiding that for a moment, the statistical analysis here is inappropriate and invalid I’m afraid - you’re taking time series data and applying the central limit theorem to it.

    You wrote: “a graph of post length over time shows a disturbing trend”. What disturbing trend? So he wrote one long post on the day he went pro-blogging. If you count the number of words he wrote in the three weeks prior to going pro, it’s almost half of what he wrote in the three weeks after going pro. You can use statistics to prove anything :-)

    Back to the point of “how much is enough”. I don’t know Kottke, so I have no idea what he does with his time. However, it’s really hard to argue with word counts - magazine columnists have an entire week to write their piece which may only be 1500 words. Authors can take years to write books - and no one would fault them for writing only x-words per day.

    Just some food for thought :-)

  11. Elliott Bäck said:

    on April 1st, 2005 at 2:35 am

    No, Rachel. It’s not time series data that was used for the analysis of average post length. The first graph is the time series, of course. But then I drop the time value, and look at it as just a collection of post length measurements sampled since he went public. Applying the CLT is then valid, and the CI calculation holds (correct me if I’m wrong).

  12. Rachel C said:

    on April 1st, 2005 at 4:48 am

    The central limit theorem requires a random sample of data and the values need to be independent. Since you have time series data, the values aren’t independent…

  13. Elliott Bäck said:

    on April 1st, 2005 at 5:05 am

    I have a random sample of Kottke’s posting length after going public that represents his posting length since going public. How do you determine if the events of his posting are independent or not? Anyway, I don’t think I’m dealing with time series data here; at least, that’s not what I’m interested in. If his posting has some relationship with time, who knows. All I’m wondering is what his posting per diem is like.

    Can you offer a correct way to assess this?

  14. Rachel C said:

    on April 1st, 2005 at 5:24 am

    You do not have a random sample of data.

    The data is not independent - even though you may not be interested in the relationship over time, the fact of the matter is, the data was collected over time: tomorrow’s posting is not independent of yesterday’s posting and there are definite patterns to his posting.

    If your question of interest is, “how many words per day does Kottke post since going public”, then the answer is to calculate the median number of words (the median is more appropriate than the mean since you have right-skewed data). No need for a confidence interval, and of course, it’s inappropriate to calculate one using the central limit theorem as you’ve done.

    You still haven’t addressed the fact that in the three weeks since he went pro he’s posted almost double what he did in the three weeks before he went pro.

    Finally the data in your plot does not show any trend whatsoever - just looking at it will tell any trained eye that.

  15. Elliott Bäck said:

    on April 1st, 2005 at 6:45 am

    Personally, I’m not interested in his activity before he went pro, rather only his output afterwards. It’s not a relative measure I care about.

  16. Rachel C said:

    on April 1st, 2005 at 6:55 am

    Fair enough. Drop the use of the confidence interval and the talk about a downward trend and then I’ll be happier :)

  17. orangeguru said:

    on April 1st, 2005 at 3:54 pm

    If Mr Kottke wants to survive on blogging he has to write for his life. I am not a Kottke fan - but he had/has his audience … but he seems to lack more and more ‘beef’ on his site. It is VERY hard to produce high level content on a daily basis - this is why most of the more popular sites resort to recycling news or commenting only on recent events - that’s much easier.

  18. Marco said:

    on April 1st, 2005 at 4:26 pm

    Wow that’s the most elaborate and comprehensive way of saying “There isn’t that much to read on that blog!” I’ve ever seen! :D

  19. Michael said:

    on April 1st, 2005 at 7:48 pm

    Elliott, I’m not sure if you were answering me when asking about the insightfulness of Kottke’s blog, but possibly so. If so: no, I don’t think his posts are that much more insightful, if at all. I was just pointing out that you do not consider the quality of posts at all.

  20. Nora said:

    on April 13th, 2005 at 10:08 am

    That data is *way* too right-skewed to fit a normal plot to, no? Taking the log of one or both of the variables might fix that, but still… no z-test!

  21. Elliott Bäck said:

    on April 17th, 2005 at 11:45 pm

    To Rachel, further tests reveal that there is no relationship between posting day and the length of the post (autocorrelation, lag graph, 95%), so it appears that the data is independent after all.

    To Nora, the data is right skewed (because of non-posting days), but I have 37 points and can safely invoke the CLT–provided Rachel’s complaints that the data is not a random sample and not independent are rebutted (which I believe they are).

    I’m not a statistician, but it seems to me after more analysis that the CI is fine. Since there’s no trend, I’ve edited the post to remove that, since it is distracting from the point about “how much work gets done per day by Kottke.”

  22. Rachel C said:

    on April 18th, 2005 at 12:32 am

    Just got back from a statistics conference and I showed this to them.. try talking to a professor, you’re doing all the wrong things.

  23. Elliott Bäck said:

    on April 18th, 2005 at 12:46 am

    Sure, Rachel. I just emailed my stats professor–I’ll let you know.

  24. Elliott Bäck said:

    on April 18th, 2005 at 3:25 pm

    I was in ORIE 270 section today, and I asked the TA about this problem. He said,

    Yeah, that’s fine–statisticians sometimes make a lot of noise for no reason…

    The only point of contention here is that my CI is somehow inappropriate. I asked him if, with n=37 data points, invoking the CLT was valid. He said yes. I asked him if it was valid even though the data is heavily right-skewed. He said, sure–that’s the definition of the CLT. I asked him if it was appropriate given that the data is from a time series, with the caveat that time and post length are uncorrelated. He said yes.

    You have yet to convince me that constructing a large-scale conservative CI for a sample of 37 independent data points is somehow inappropriate, and your attack that “I’m doing all the wrong things” is completely non-constructive. At most all you have are some blind claims about my data. Perhaps your acute expert knowledge in statistics is waning over the last six years…

  25. Rachel C said:

    on April 18th, 2005 at 4:57 pm

    Using the CLT on time series data is inappropriate.

  26. praetorian said:

    on April 26th, 2005 at 1:23 pm

    This is a beautifully-illustrated example of how a welfare state breeds laziness. Vote Conservative.
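
Two of the checks debated in the comments above, the median versus the mean on right-skewed data and lag-1 autocorrelation as a rough test of independence, can be sketched in Python. This is only an illustration under stated assumptions: the helper `lag1_autocorrelation` and the word counts in `daily_words` are hypothetical, and the ±1.96/√n band is the usual rule-of-thumb threshold for uncorrelated data, not necessarily the exact test anyone in the thread ran.

```python
import statistics


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: how strongly each value tracks the previous one."""
    n = len(series)
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return num / denom


# Hypothetical daily word counts, NOT the data used in the post.
daily_words = [0, 0, 120, 45, 0, 800, 30, 0, 210, 60]

# On right-skewed data a few long posts pull the mean well above the median.
print("mean:  ", statistics.fmean(daily_words))
print("median:", statistics.median(daily_words))

# Rule of thumb: for uncorrelated data, roughly 95% of sample
# autocorrelations fall within +/- 1.96 / sqrt(n).
r1 = lag1_autocorrelation(daily_words)
bound = 1.96 / len(daily_words) ** 0.5
print(f"lag-1 autocorrelation: {r1:.3f} (rough 95% band: +/-{bound:.3f})")
```

A small lag-1 value only says that consecutive days are not linearly related; it does not by itself establish that the posts are a random sample.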
