Elliott C. Back: Internet & Technology

Is Blogspot really bad?

Posted in Blogging, Computers & Technology, Google, SEO, Science, Spam by Elliott Back on October 16th, 2005.

The Theory:

Blogspot is more spam than blogs. Or any one or more of the following:

However, no one ever posts numbers to back up their theories, so I’ll do a quick sample, just like from a statistics textbook on binomial distributions.

The Facts:

I browsed 50 Google Blogger blogs at Blogspot, looking for any that might be splogs. I used their random blog link, which I will assume to be truly random. I kept track of four variables: last update, front-page size (word count), splog or not, and AdSense or not. Here is the raw data in xls format:

Blogspot Splogs.xls

The rough descriptive statistics, where 1 means total spam and 0 means totally clean, are:

Variable   N  N*    Mean  SE Mean   StDev
splog     50   0  0.2800   0.0641  0.4536
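As a quick sanity check, the figures in that row can be reproduced from the counts alone. This is only a sketch: it assumes 14 of the 50 blogs were coded as splogs (0.28 × 50), and the variable names are mine, not columns from the spreadsheet.

```python
import math

# Assumption: 14 of the 50 sampled blogs were coded 1 (splog), the rest 0.
n = 50
splogs = 14
data = [1] * splogs + [0] * (n - splogs)

mean = sum(data) / n

# Sample standard deviation, using the n - 1 denominator.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
stdev = math.sqrt(variance)

# Standard error of the mean.
se_mean = stdev / math.sqrt(n)

print(f"Mean    {mean:.4f}")
print(f"SE Mean {se_mean:.4f}")
print(f"StDev   {stdev:.4f}")
```

With those counts, the three printed values round to the Mean, SE Mean and StDev shown in the table.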

This bar chart should give you a better idea:

[Bar chart: Splogs to blogs]

Yes, approximately 72% of the Blogspot blogs in this sample are real. The other 28% (14 of 50) are spam blogs, or splogs. We can also give a bound, by the central limit theorem, on the accuracy of this experiment: a 95% CI for a single-sample proportion test is (0.162311, 0.424905), with a p-value of 0.003.
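For anyone who wants to check the interval, here is a rough sketch using only the Python standard library. Two things are assumptions on my part, since the numbers above come without a stated method: that the interval is an exact (Clopper-Pearson) binomial interval, and that the p-value tests against a null proportion of 0.5. The function names are hypothetical helpers, not part of any package. A normal-approximation (Wald) interval is included for comparison.

```python
from math import comb, sqrt

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve(f, target, lo=0.0, hi=1.0, iters=60):
    """Find p in [lo, hi] with f(p) == target for a monotonic f, by bisection."""
    increasing = f(hi) > f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    lower = solve(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2) if x > 0 else 0.0
    upper = solve(lambda p: binom_cdf(x, n, p), alpha / 2) if x < n else 1.0
    return lower, upper

def wald(x, n, z=1.96):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

x, n = 14, 50  # assumption: 14 splogs out of 50 sampled blogs

lower, upper = clopper_pearson(x, n)
wald_lower, wald_upper = wald(x, n)

# Two-sided exact p-value against the assumed null p0 = 0.5.
# Doubling the lower tail is valid here because the null distribution is symmetric.
p_value = min(1.0, 2 * binom_cdf(x, n, 0.5))

print(f"Exact 95% CI: ({lower:.6f}, {upper:.6f})")
print(f"Wald 95% CI:  ({wald_lower:.6f}, {wald_upper:.6f})")
print(f"p-value vs 0.5: {p_value:.3f}")
```

The normal approximation gives a slightly narrower interval, roughly (0.156, 0.404). If the assumptions above hold, the exact interval should land close to the one quoted.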

This entry was posted on Sunday, October 16th, 2005 at 12:30 pm.

10 Responses to “Is Blogspot really bad?”

  1. Since no one at GOOGLE will admit or even tell us what is going on, I have had to talk to other bloggers to find out why I have not been able to post for two days.

    Supposedly, blogs with few posts – and I have 300 in one year – but with a lot of links have been shut down without any prior warning by Google. Even Microsoft at its worst has never screwed over its customers this badly.

    Now, as I write a blog about the media, most of my posts link to the articles I am writing about, hence I have a lot of links. But Google has now decided to censor my views and no longer allows me to post.

    All I get is this:

    006 Please contact Blogger Support.blog/46/41/4/lacowboy/index.html

    Except – there is no way to contact anyone at blogger support…

  2. Gary Potter says:

    This really struck a pain point for Mark Cuban. His latest post rails on how Google has dropped the ball. Read about it at http://www.blogmaverick.com/entry/1234000717063627/

  3. James Kew says:

    It’s a bad assumption. Blogger announced last month that they were taking steps to filter spam blogs out of Next Blog.

  4. EngLee says:

    My guess.

    splog = spam blog.

    That should define splog. :)

  5. Ignorant says:

    Can someone give me a definition of a splog? Is the site just useless (like mine) or is there a specific characteristic that’ll identify one?

  6. Elliott Back says:

    There’s an assumption here, which is a big one, and that is that clicking the “random blog” button actually returns a “reasonably random” blog for some definition of what properties you need the randomness to hold. If this function is random, then we do indeed have a great scientific sample. n=50 is more than enough to begin doing statistical tests–the guideline I always heard in stats class was more than 30. If the Next Blog feature is not sufficiently random, then this study is measuring something else…

  7. 90% of the blogspot.com pings that BlogsNow sees are junk.
    That’s all I can tell you. It got worse over time, and this weekend it tipped.

    It would have been easy for Blogger to stop this when it was starting a year ago. They did not.

  8. xgfriend says:

    As much as I like Blogger/Blogspot (because it plainly works), checking 50 blogs doesn’t strike me as a scientific sample that makes a serious point about whether splogs are less than 30% (which is kind of high, anyway). Depending on the time you’re hitting the “next blog” button, you’ll find yourself falling into splog after splog and I don’t see how that flag thing is helping.

    Blogspot is filled with splogs and blogs that have been dead for 4+ years. They should consider a serious clean-up.

  9. John says:

    A couple of thoughts…..

    1. The next / random blog button isn’t truly random – or at least wasn’t a while back… As I understand it, Blogger generates a small set of recently published blogs & then the next blog button navigates through that set randomly, which explains why some of the blogs show up more than once in quick succession.

    2. I think that the splogs are probably more visible on the weekends, because there aren’t as many “real” blogs getting published? I’d be interested to see a repeat experiment on peak-publishing Tuesday… Either way, 28% is high!!

    3. It appeared at one stage that this button had been fixed & that splogs were being excluded from these results.

    http://blogfresh.blogspot.com/2005/09/next-blog-no-splog.html

    Not so, apparently. What to do, what to do…?
