Is Blogspot really bad?
The Theory:
Blogspot is more spam than blogs. Or any one or more of the following:
- Blogspot has become nothing but a crapfarm
- 90% of all blogspot com subdomains are junk
- Has Google stopped helping in the fight against splogs?
- Blogspot is currently broken
- Maybe Google Inc is or was* merely naive when it comes to splogs. They allow one-click publishing…
However, no one ever posts numbers to back up these theories, so I’ll take a quick sample, just like in a statistics textbook chapter on binomial distributions.
The Facts:
I browsed 50 Google Blogger blogs at Blogspot, looking for any that might be splogs. I used their random blog link, which I will assume to be truly random. I kept track of four variables: last update, front-page size (word count), splog or not, and AdSense or not. Here is the raw data in xls format:
The rough descriptive statistics, where each blog is coded 1 if it is a splog and 0 if it is clean, are:
Variable   N  N*    Mean  SE Mean   StDev
splog     50   0  0.2800   0.0641  0.4536
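The summary row can be reproduced from the counts alone. This is a minimal sketch, assuming 14 splogs out of 50 (the count implied by the reported mean of 0.28; the raw spreadsheet is not reproduced here) and the usual sample standard deviation:

```python
import math
import statistics

n = 50
splogs = 14  # assumption: implied by the reported mean, 0.28 * 50

# One 0/1 indicator per sampled blog: 1 = splog, 0 = not a splog.
data = [1] * splogs + [0] * (n - splogs)

mean = statistics.mean(data)
stdev = statistics.stdev(data)   # sample standard deviation (divides by n - 1)
se_mean = stdev / math.sqrt(n)   # standard error of the mean

print(f"N={n}  Mean={mean:.4f}  SE Mean={se_mean:.4f}  StDev={stdev:.4f}")
```

Because the data are only zeros and ones, the mean is simply the sample proportion of splogs.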
This bar chart should give you a better idea:

Yes, approximately 72% of the sampled Blogspot blogs are real. The other 28% are spam blogs, or splogs. We can also use the central limit theorem to put a bound on the accuracy of this experiment: a 95% CI from a single-sample proportion test is (0.162311, 0.424905), with a p-value of 0.003.
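For anyone who wants to check the interval and p-value, here is a sketch using only the Python standard library. It rests on two assumptions the post does not state: that the sample contained 14 splogs out of 50, and that the p-value tests the null hypothesis p = 0.5. It computes an exact (Clopper-Pearson) binomial interval by bisection, which is one plausible way to arrive at numbers like those quoted.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def find_root(f, lo: float = 0.0, hi: float = 1.0, iters: int = 200) -> float:
    """Bisection for a continuous f whose sign differs at lo and hi."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

n, x = 50, 14   # assumption: 14 splogs, implied by the 28% figure
alpha = 0.05    # for a 95% interval
p0 = 0.5        # assumption: null hypothesis of the reported test

# Clopper-Pearson bounds: the values of p at which the observed count
# sits exactly in the alpha/2 tail.
lower = find_root(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
upper = find_root(lambda p: binom_cdf(x, n, p) - alpha / 2)

# Two-sided exact p-value by doubling the smaller tail (capped at 1).
tail_low = binom_cdf(x, n, p0)
tail_high = 1 - binom_cdf(x - 1, n, p0)
p_value = min(1.0, 2 * min(tail_low, tail_high))

print(f"95% CI: ({lower:.6f}, {upper:.6f}), p-value: {p_value:.3f}")
```

Under these assumptions the p-value says only that the splog share differs from one half; it says nothing about whether 28% is "high".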
This entry was posted on Sunday, October 16th, 2005 at 12:30 pm and is tagged with central limit theorem, statistics textbook, google inc, binary distributions, splog, google, statics, raw data, subdomains, splogs, proportions, variables, front page, accuracy, blogs.
10 Responses to “Is Blogspot really bad?”
Since no one at GOOGLE will admit or even tell us what is going on, I have had to talk to other bloggers to find out why I have not been able to post for two days.
Supposedly, blogs with few posts – and I have 300 in one year – but with a lot of links have been shut down without any prior warning by Google. Even Microsoft at its worst has never screwed over its customers this badly.
Now, as I write a blog about the media, most of my posts link to the articles I write about, hence I have a lot of links. But Google has now decided to censor my views and no longer allows me to post.
All I get is this:
006 Please contact Blogger Support.blog/46/41/4/lacowboy/index.html
Except – there is no way to contact anyone at blogger support…
This really struck a pain point for Mark Cuban. His latest post rails on how Google has dropped the ball. Read about it at http://www.blogmaverick.com/entry/1234000717063627/
It’s a bad assumption. Blogger announced last month that they were taking steps to filter spam blogs out of Next Blog.
My guess.
splog = spam blog.
That should define splog.
Can someone give me a definition of a splog? Is the site just useless (like mine) or is there a specific characteristic that’ll identify one?
There’s an assumption here, which is a big one, and that is that clicking the “random blog” button actually returns a “reasonably random” blog for some definition of what properties you need the randomness to hold. If this function is random, then we do indeed have a great scientific sample. n=50 is more than enough to begin doing statistical tests–the guideline I always heard in stats class was more than 30. If the next feature is not sufficiently random, then this study is measuring something else…
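The "more than 30" guideline is about when the normal approximation can be trusted. As a rough sketch, assuming 14 splogs out of 50 (implied by the post's 28%), this is how a CLT-based (Wald) interval for the same sample could be computed. The `n * p >= 10` check is a common textbook rule of thumb, not something taken from the post.

```python
import math
from statistics import NormalDist

n, splogs = 50, 14   # assumption: 14 splogs, implied by the 28% figure
p_hat = splogs / n

# Rule-of-thumb check: enough expected "successes" and "failures"
# for the normal approximation to a proportion to be reasonable.
approx_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10

# Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
z = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
wald_low, wald_high = p_hat - half_width, p_hat + half_width

print(f"normal approximation reasonable: {approx_ok}")
print(f"Wald 95% CI: ({wald_low:.4f}, {wald_high:.4f})")
```

None of this addresses the bigger worry above: if the button is not random, a narrower interval only measures the wrong thing more precisely.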
90% of the blogspot.com pings that BlogsNow sees are junk.
That’s all I can tell you. It got worse over time, and this weekend it tipped.
It would have been easy for Blogger to stop this when it was starting a year ago. They did not.
As much as I like Blogger/Blogspot (because it plainly works), checking 50 blogs doesn’t strike me as a scientific sample that can make a serious point about whether splogs are less than 30% (which is kind of high, anyway). Depending on the time you’re hitting the “next blog” button, you’ll find yourself falling into splog after splog and I don’t see how that flag thing is helping.
Blogspot is filled with splogs and blogs that have been dead for 4+ years. They should consider a serious clean-up.
A couple of thoughts…..
1. The next / random blog button isn’t truly random – or at least wasn’t a while back… As I understand it, Blogger generates a small set of recently published blogs & then the next blog button navigates through that set randomly, which explains why some of the blogs show up more than once in quick succession.
2. I think that the splogs are probably more visible on the weekends, because there aren’t as many “real” blogs getting published? I’d be interested to see a repeat experiment on peak-publishing Tuesday… Either way, 28% is high!!
3. It appeared at one stage that this button had been fixed & that splogs were being excluded from these results.
http://blogfresh.blogspot.com/2005/09/next-blog-no-splog.html
Not so, apparently. What to do, what to do…?
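To get a feel for point 1, here is a small sketch of how many distinct blogs 50 clicks would be expected to reach if the button drew uniformly, with replacement, from a fixed pool. The pool sizes are purely hypothetical, since the size of the set Blogger actually uses is not known here.

```python
def expected_distinct(pool_size: int, draws: int) -> float:
    """Expected number of distinct blogs seen when drawing uniformly,
    with replacement, from a pool of pool_size blogs."""
    return pool_size * (1 - (1 - 1 / pool_size) ** draws)

draws = 50
for pool_size in (100, 1_000, 1_000_000):   # hypothetical pool sizes
    distinct = expected_distinct(pool_size, draws)
    print(f"pool of {pool_size:>9,}: about {distinct:.1f} distinct blogs in {draws} clicks")
```

If repeats turn up often within 50 clicks, that would suggest a small pool, and the sample would then describe recently published blogs rather than Blogspot as a whole.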