In his Weblogsinc blog, Brad Hill writes:
There’s no question that Wikipedia is doing well in Google. But any magic formula used to explain its success must also be applied to Keio University and the Public Broadcasting Service. Like Einstein trying to crack the unified field theory, Shah has more work to do.
Basically, Socialtext looked at this query (*+*) and found that Wikipedia pages ranked highest out of 11 billion pages. Therefore, they must have the highest PageRank, right? Well, not so. The query **** returns 11.8 billion results, with PBS pages squarely at the top. So the question is still unanswered, and even worse, no one has proposed a method for actually testing rankings!
So, what someone with slightly more free time than I have ought to do is take a random sample of n-tuples of words from a dictionary, run each tuple as a query, and record where Wikipedia ranks and how densely it shows up in, say, the first hundred results. 100 random tuples should be enough, and the experiment can be repeated a few times and averaged. By the law of large numbers, the average should settle on a decent estimate of Wikipedia's mean ranking for common nouns. For things like celebrity names and other proper nouns, the results would of course be different, but is that a question we want to consider before we have any kind of framework?
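Here is a rough Python sketch of what that experiment could look like. Everything in it is my own assumption rather than anything Socialtext or Google provides: the function names are made up, "density" is taken to mean the share of the first hundred results that are Wikipedia pages, and `search` is a hypothetical function you would have to supply yourself that takes a query string and returns a ranked list of result URLs.

```python
import random
from statistics import mean
from urllib.parse import urlparse


def sample_tuples(words, n, count, rng):
    """Draw `count` random n-tuples of distinct words from the word list."""
    return [tuple(rng.sample(words, n)) for _ in range(count)]


def is_wikipedia(url):
    """True if the URL's host is wikipedia.org or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def wikipedia_stats(urls, top=100):
    """Return (best_rank, density) for Wikipedia in the first `top` results.

    best_rank is 1-based and None when Wikipedia does not appear.
    density is the fraction of the inspected results hosted on Wikipedia.
    """
    urls = urls[:top]
    ranks = [i for i, u in enumerate(urls, start=1) if is_wikipedia(u)]
    best_rank = ranks[0] if ranks else None
    density = len(ranks) / len(urls) if urls else 0.0
    return best_rank, density


def run_experiment(words, search, n=2, count=100, top=100, seed=0):
    """Average Wikipedia's rank and density over random n-tuple queries.

    `search` is a hypothetical callable: query string -> ranked list of URLs.
    """
    rng = random.Random(seed)
    stats = [
        wikipedia_stats(search(" ".join(t)), top)
        for t in sample_tuples(words, n, count, rng)
    ]
    ranks = [rank for rank, _ in stats if rank is not None]
    return {
        # Mean best rank, taken only over queries where Wikipedia appeared.
        "mean_rank": mean(ranks) if ranks else None,
        "mean_density": mean([density for _, density in stats]),
        # Fraction of queries with at least one Wikipedia result.
        "hit_rate": len(ranks) / len(stats),
    }
```

Repeating the run with a few different `seed` values and averaging the results gives the "repeat and average" step. Note that the mean rank skips queries where Wikipedia never shows up, so it should be read alongside the hit rate rather than on its own.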
This entry was posted on Tuesday, October 4th, 2005 at 1:27 pm.