Cuil Sucks At Search (Go Google!)
I love the idea behind Cuil, the latest search engine in a long list of failures (Mahalo, Ask, Powerset) to challenge Google. As Mashable explains, they are pulling out all the stops to hit Google from multiple directions across their core search competency:
Enter Cuil, a very serious competitor, packed with ex-Googlers (Tom Costello and Anna Patterson are the backbone of Cuil, and they’ve both worked at Google), and claiming to have the largest index of websites – 120 billion – in the world.
It doesn’t end there: Cuil pulls pretty much every trick in the book. Big claims about the biggest index, privacy concerns (IP addresses of users aren’t saved, making it impossible for a third party to request it from them), semi-semantic approach (Cuil’s engine recognizes the relations between certain words on a web site, which helps it rank pages better). Hell, they even pulled the energy-saving trick: the front page of Cuil is completely black, in contrast to Google’s eye-poking whiteness.
Check out the Slashdottie thread for more discussion. I’m not interested in going there; rather I’m more concerned with how relevant the results from Cuil are, compared to Google, in a stricter context of information retrieval. After all, a search engine is about finding information.
Let’s start with a query “how to rip a dvd” in Cuil and Google:

4 of the 9 total results are spam from Ebooksbay. An additional 4 are for converting MP3s. The final result (which is quite spammy) is for ripping DVDs to a variety of formats. Score: 11%.

Google gives you 7 DVD ripping guides and three spam sites for ripping software. Essentially, you have to give it a Score: 100%, since it’s pretty much the baseline in our test. Just based on what I’ve seen so far, this will be a comparison not of relative merits, but of how much less relevant the results from Cuil are compared to Google.

Wait, what is that in the rightmost result!!!? Yes, that winsome young woman is carefully inspecting a ConcurrentHashMap! Ahem, bad image / search result correlations aside, the search listings fail to list the authoritative Java documentation source (Sun’s website) and instead list 2 mirrors (Java 5 and 6), 4 bug reports, 3 mailing list discussions, and 2 random libraries with a similarly named class. Score: 50%.

Google nicely gives us the Sun Java page as the first result, 2 snippets of code using this class, 6 guides to using concurrent hash maps, a benchmark, one of the same random libraries as Cuil (Oswego), and a different random library (backport-util). I’d give them Score: 80% at this task.
Anyway, I’m getting tired of writing this. Cuil just doesn’t deliver fast, consistent, high-quality search results. The relevance is quite low, in spite of the interface improvements and searching / clustering / recommendation features.
Name clash: The method BLAH has the same erasure as type BLAH but does not override it
I was getting the following error in Eclipse IDE 3.1 and Java 1.5 (or 5.0 as some like to call it):
Name clash: The method removeEldestEntry(Map.Entry<K ,V>) of type LRUMap<K ,V> has the
same erasure as removeEldestEntry(Map.Entry<K , V>) of type LinkedHashMap<K , V> but does not
override it
The class in question looked like this:
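import java.util.LinkedHashMap;
import java.util.Map.Entry;

class LRUMap<K, V> extends LinkedHashMap { // raw supertype: this triggers the name clash
    public LRUMap() {
        super(10000, .75f, true);
    }

    protected boolean removeEldestEntry(Entry<K, V> eldest) {
        return this.size() > 262144;
    }
}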
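The problem is that I was extending LinkedHashMap without type parameters, not LinkedHashMap<K, V>. With a raw supertype, the inherited removeEldestEntry takes a raw Map.Entry, so my parameterized version has the same erasure but does not actually override it. Changing the code to: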
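import java.util.LinkedHashMap;
import java.util.Map.Entry;

class LRUMap<K, V> extends LinkedHashMap<K, V> {
    public LRUMap() {
        // initial capacity 10000, load factor 0.75, access order (true) for LRU behavior
        super(10000, .75f, true);
    }

    @Override
    protected boolean removeEldestEntry(Entry<K, V> eldest) {
        // evict the least recently used entry once the map grows past 262144 entries
        return this.size() > 262144;
    }
}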
completely fixed the problem! Type erasure is sure a pain, no? I should probably spend more time at home reading the Java Generics Tutorial.
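To see that the parameterized version really does override removeEldestEntry, here is a minimal sketch with a tiny size limit so the eviction is easy to watch. The names BoundedLruMap, maxEntries, and LruDemo are made up for this illustration and are not part of the class above; the limit of 2 is likewise just an assumption for the demo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical variant of the LRUMap idea with a configurable size limit.
class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // assumed name, not from the original class

    BoundedLruMap(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked by put() after each insertion; returning true evicts the eldest entry
        return size() > maxEntries;
    }
}

class LruDemo {
    public static void main(String[] args) {
        BoundedLruMap<String, Integer> cache = new BoundedLruMap<String, Integer>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a", so "b" becomes the eldest entry
        cache.put("c", 3); // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

If the supertype were left raw, the @Override annotation would not help: the compiler would still refuse the method, because it clashes with the inherited one without overriding it.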
How to hire the best
The infamous Mark Jen has posted his take on Joel’s hiring essay. Joel makes the argument that hiring the absolute best programmers is the best thing for a software company, because superb programmers are investments that more than pay for themselves. It’s basically an argument about averages: everyone can build software, but the companies that can build great software are few and noticeable. To give a concrete example:
When everyone is making ugly square mp3 players, a stylish mp3 player with rounded edges and careful design will be king.
A coworker and I were discussing this yesterday and today. Obviously, when hiring candidates for positions, we want good ones. However, we go beyond reciting the creed of hiring the best of the best: we actually do what we say here. A candidate whom you can’t respect as having equal or greater skill, who doesn’t appear to possess basic skills, or who is in any way lacking is simply not good enough. A company shouldn’t hire someone who limps over the corporate minimum bar just to fill a position.
Until you find someone who can leap over a bar twice as high with ease, you don’t want to fill that position. So, don’t make your interviews easy. If you’re doing an interview, make it moderately challenging for someone of your level. Include a “screener” technical question that you think anyone with similar skills and general knowledge should be able to answer easily. Some good interview question choices include:
- Tell me if there are two numbers in an array that sum to x
- How do hashmaps work? How would you hash a string?
- Generate permutations of x
- Reverse a c string
- Write a tree to linked-list function
- Write an efficient recursive function to garbage collect memory
- Describe how a compiler works.
- Give an overview of DNS, TCP, filesystems, process scheduling, pipelining, or some other high-level CS topic
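For a sense of what a passing answer looks like, here is one possible sketch of the first question in the list (two numbers in an array that sum to x), which also leans on the hashing question. It is an illustration, not a model answer; the class and method names are invented for the example, and I am assuming the two numbers must sit at different positions in the array.

```java
import java.util.HashSet;
import java.util.Set;

class PairSum {
    // Returns true if two elements at different positions in nums sum to x.
    // One pass with a hash set: O(n) expected time, O(n) extra space.
    static boolean hasPairWithSum(int[] nums, int x) {
        Set<Long> seen = new HashSet<Long>();
        for (int n : nums) {
            // long arithmetic avoids int overflow when computing the complement
            long complement = (long) x - n;
            if (seen.contains(complement)) {
                return true;
            }
            seen.add((long) n);
        }
        return false;
    }

    public static void main(String[] args) {
        int[] nums = {1, 4, 7, 10};
        System.out.println(hasPairWithSum(nums, 11)); // true (1 + 10 or 4 + 7)
        System.out.println(hasPairWithSum(nums, 9));  // false
    }
}
```

A candidate who starts with the nested-loop O(n^2) version and then talks their way to the hash set is doing fine; the point of the question is the conversation as much as the code.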
Once you’ve passed them through an easy coding question and another general question, you can start to interview them based on their resume, because you know that they’ve met a minimum requirement to do their job. If you’re impressed at the end, hire them. Otherwise, why bother? The cost of hiring someone who doesn’t impress you and your teammates is greater than the benefit of filling that vacant position.
Update:
I just noticed Shelley’s comment on this old hiring post. It reads:
That is the worst interview question I’ve heard of. It is guaranteed to discriminate in favor of a certain type of developer, and not necessarily a good one.
No wonder you people can’t find good engineers. You don’t know how to interview worth a damn. You’re looking for code monkeys, but interviewing engineers. I had a feeling this was what was happening when I talked with someone who interviewed at Microsoft and the same thing happened. Absolutely silly questions-and yes, very biased. Your HR department has done a poor job.
Asking somebody how to do code the strstr function. I’d hire the person who looked at you like you were daft and said, “I’d use the function built into the language. Now what _job_ is it you want me to do?”
I just have to add to the conversation, and point out that asking an interviewee to code a basic function like that is industry best practice. It’s the absolute lowest bar. Sure, if you actually can code, then these questions will seem ridiculous, but otherwise? You don’t hire a programmer who can’t write code, so you need to see if they can write code. Shelley would rather have interviews, I guess, that go like this:
Interviewer: So, you can code basic functions, do recursion, handle arrays, right?
Shelley: You bet I can! And more!
Interviewer: Fantastic–just had to check.
Shelley: Let’s move onto more interesting things…
Nope, it doesn’t work like that, because we can’t trust you to tell us the truth. Your abilities have to be assessed. Unfortunately, in another comment, Shelley goes on to say:
Any interview that resorts to having the interviewee code is a bad interview. Shows that your staff is too inexperienced to know how to interview.
She also makes a big hand-waving pseudoscientific argument about long term / short term memory with regard to coding. See, the thing is, the most basic part of this kind of job description is writing code. Sure, we create systems, do designs, model databases, and create relational object oriented structures, but then a software developer sits down and implements. Writes code. You wouldn’t believe how many people cannot write a function to reverse the elements of an array, in any language.
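For scale, the array-reversal question fits in a handful of lines. This is just one way to write it, sketched in Java with made-up names (ReverseArray, reverse), swapping elements in place from both ends toward the middle.

```java
import java.util.Arrays;

class ReverseArray {
    // Reverses the array in place by swapping from both ends toward the middle.
    static void reverse(int[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        reverse(a);
        System.out.println(Arrays.toString(a)); // prints [5, 4, 3, 2, 1]
    }
}
```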
Here’s your challenge:
O readers, show your might. I’m going on vacation this weekend, but when I come back, I want efficient implementations of strstr, is_anagram, atoi for any base, and edit_distance. Log the time it takes you to write each one, too. Remember–these are basic interview “crawl over the bar” questions…
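To get the ball rolling, here is a rough sketch of two of the four, is_anagram and edit_distance, in Java. Treat it as one possible answer under my own assumptions rather than the definitive one: the anagram check is case-sensitive and counts every character (spaces included), and edit distance is taken to mean plain Levenshtein distance with unit-cost insert, delete, and substitute. The class and method names are invented for the example.

```java
import java.util.Arrays;

class BarQuestions {
    // Two strings are anagrams if their sorted characters match.
    // Assumption: case-sensitive, and every character (including spaces) counts.
    static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }

    // Levenshtein distance with two rolling rows:
    // O(|s| * |t|) time, O(|t|) extra space.
    static int editDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of t takes j inserts
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // turning the first i chars of s into "" takes i deletes
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("listen", "silent"));     // true
        System.out.println(editDistance("kitten", "sitting")); // 3
    }
}
```

Sorting makes the anagram check O(n log n); counting characters in a table would bring it down to O(n) if you want to show off. That leaves strstr and atoi for any base as the remaining exercises.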