Bizarre AIM Conversation or, Why does this always happen to me?
I was IMed today by a 19-year-old girl, Beila, who decided, apparently just because she wanted to, to send me a non-stop stream of photographs of herself. At any moment I expected it to turn into an enticement for an adult website, or some other kind of spam, but she signed off and I have the impression that this girl was just bored. She wanted to see my photo, so I sent her lots of pictures of Wendy and me.
Here’s the text bit of the convo… so weird. Names have been anonymized:
(10:50:17 PM) Beila: hi r u mick
(10:50:30 PM) me: oh hai lololololol
(10:50:43 PM) Beila: r u mick
(10:50:52 PM) me: im not
(10:51:07 PM) Beila: ooops sorry then who r u
(10:51:13 PM) me: elliott
(10:51:15 PM) Beila: im beila
(10:51:16 PM) me: i don’t think u know me
(10:51:26 PM) Beila: got a pic elliott ?
(10:51:43 PM) me: yes… but i have a feeling i am too old for you
(10:52:32 PM) Beila: how old r u
(10:52:33 PM) me: You look like my little sister
(10:52:40 PM) Beila: u serious
(10:52:41 PM) me: I’m 23, Beila.
(10:52:42 PM) me: Yeah.
(10:52:47 PM) Beila: im 19
[random images come my way]
(10:53:26 PM) me: who took it?
(10:53:26 PM) Beila: how old is ur sis
(10:53:30 PM) Beila: my dad
(10:53:32 PM) me: it’s a very interesting composition
(10:53:37 PM) me: my sister is 17 now
(10:54:00 PM) Beila: can i see ur pics now
(10:54:07 PM) me: Sure, let me see.
(10:54:42 PM) Beila: whos the girl
(10:54:49 PM) me: that’s my girlfriend, Yunming
(10:57:23 PM) me: So what’s all the photos for?
(10:57:34 PM) Beila: i dunno
(10:57:42 PM) me: bored huh?
(10:57:44 PM) Beila: just like sending them
(10:57:53 PM) Beila: u got more
(10:59:25 PM) me: I am running of photos I want to send you.
(10:59:31 PM) me: But I’m finding this highly amusing.
(11:01:40 PM) Beila: those were all old pics
(11:01:52 PM) Beila: this is wat i look like now
(11:06:15 PM) Beila: i g2g im tired bye
(11:06:20 PM) me: cya
I’ve had friends send me much racier pics than this:

So I sent her things like this:

I suppose in the end it was just a young teenager trying to express herself. What’s odd is that the photos she sent were not all of her–some were disconnected social dialog. Try to make sense of these 20 or so photos, which were sent to me seconds apart:
WHAT IS THIS!??!?!?!?! I can only assume it’s a weird expression of independence, sexuality, deviance, and what I will unfortunately label “emo.” But why am I the victim of her stream of consciousness? I’ll never know.
Obnoxious AIM Spam Tonight
I don’t know why at all, but tonight I’m getting buckets of the weirdest kinds of AIM spam. The first comes from taytay20061994, who claims to be some kind of marketer wanting to give away DVD players for Sony:
(10:22:50 PM) taytay20061994: hey
(10:23:04 PM) liten fugl: Hello
(10:23:55 PM) taytay20061994: how are you?
(10:24:27 PM) liten fugl: I’m well.
(10:25:36 PM) liten fugl: Who are you?
(10:26:40 PM) taytay20061994: im rico i work for sony and i was wondering if you were interested in winning a free dvd player
(10:26:57 PM) liten fugl: I don’t need a DVD player, thanks.
(10:27:05 PM) liten fugl: I have … three right now.
(10:29:13 PM) taytay20061994: oh well what about winning a free one?
(10:30:06 PM) liten fugl: of the three that I have I only use two. Another would just take up space.
(10:33:18 PM) taytay20061994: well could i interest you in a cd player?
(10:34:42 PM) liten fugl: DVD players play CDs
(10:34:50 PM) liten fugl: I’m not sure why you want to target me for this
(10:35:05 PM) liten fugl: I’m not a great candidate for wanting something free.
(10:35:41 PM) taytay20061994: well do you know of anyone who has this service on aol instant messenger who would be interested?
(10:35:53 PM) liten fugl: No, most people I know don’t like spam.
(10:36:18 PM) taytay20061994: im sorry but i am not spam
(10:36:27 PM) taytay20061994: i am an honest seller
(10:36:47 PM) taytay20061994: and i will not take your information and use it against you
(10:37:41 PM) taytay20061994: are you still there?
The next one, from SaNgErlAng, offers to sell me female Mexican companionship:
(10:46:53 PM) SaNgErlAng: are you interested in mexicans?
(10:47:10 PM) liten fugl: As much as I am interested in the Finnish
(10:48:26 PM) SaNgErlAng: excuse me can you please say that again in different words please?
(10:50:16 PM) liten fugl: I like mexicans and americans equally, but, what an odd question!
(10:52:54 PM) SaNgErlAng: do you want a mexican or not?
(10:53:22 PM) liten fugl: One person cannot possess another person as mere chattel. The notion is obnoxious!
(10:54:07 PM) SaNgErlAng: your a lezbo
(10:54:21 PM) liten fugl: I’m a guy, I’m not sure that is possible.
(10:54:42 PM) SaNgErlAng: it is your book
(10:54:59 PM) liten fugl: What’s my book? I haven’t published any books.
(10:57:39 PM) SaNgErlAng: your stupid
(10:58:05 PM) SaNgErlAng: IM SORRY YOU DONT HAVE BALLS
(10:58:22 PM) liten fugl: I think you have me confused with someone else!
I don’t know where these people get my AIM address from (although it’s all over the web, I suppose), or why they think I want what they have to sell. I am positive that these two people are not the same, since their vocabulary is significantly different. The first is a spammer, the second is … well even I can’t make out what he/she is!
Top Search Terms for 2006
As 2006 comes to a close, a number of major search providers have released their top search queries. Even though the results may be heavily doctored, they’re still valuable insights into the PPC industry.
Yahoo: Britney Spears, WWE, Shakira, Jessica Simpson, Paris Hilton, American Idol, Beyonce Knowles, Chris Brown, Pamela Anderson, Lindsay Lohan
Google: bebo, myspace, world cup, metacafe, radioblog, wikipedia, video, rebelde, mininova, wiki
Lycos: Poker, MySpace, RuneScape, Pamela Anderson, Paris Hilton, Pokemon, WWE, Golf, Spyware, Britney Spears
MSN Live: Ronaldinho, Shakira, Paris Hilton, Britney Spears, Harry Potter, Eminem, Pamela Anderson, Hilary Duff, Rebelde, Angelina Jolie
AOL: Weather, Dictionary, Dogs, American Idol, Maps, Cars, Games, Tattoo, Horoscopes, Lyrics
You can view the Top 10 Searches of 2006 spreadsheet on Google Docs, if you’d like. The data came from the following sources: Yahoo, Google, Lycos, MSN, and AOL.
Initial observation shows that searches are dominated by celebrity terms, and that AOL’s searches are corrupted by their “AOL Keyword” search system. Google’s are likewise corrupted by what I suspect is manual filtering to produce tailored techie terms. Counting case-insensitive exact matches, Yahoo shares 60% of its top terms with the other engines’ lists and MSN Live and Lycos share 50% each, while Google and AOL come in last at 20% and 10% respectively, which I take as an indication of poor search quality.
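If you want to check the overlap figures yourself, here is a minimal Python sketch. The lists are copied from this post; the function names (`shared_fraction`, `overlap_report`) are my own, and the matching rule (case-insensitive, exact string match) is an assumption, so a looser match would give different numbers. On the lists as printed, it gives Yahoo 6 of 10 shared terms, Lycos and MSN Live 5 each, Google 2, and AOL 1.

```python
# Top-10 lists copied from the post; matching is case-insensitive and exact.
TOP_TERMS = {
    "Yahoo": ["Britney Spears", "WWE", "Shakira", "Jessica Simpson", "Paris Hilton",
              "American Idol", "Beyonce Knowles", "Chris Brown", "Pamela Anderson",
              "Lindsay Lohan"],
    "Google": ["bebo", "myspace", "world cup", "metacafe", "radioblog", "wikipedia",
               "video", "rebelde", "mininova", "wiki"],
    "Lycos": ["Poker", "MySpace", "RuneScape", "Pamela Anderson", "Paris Hilton",
              "Pokemon", "WWE", "Golf", "Spyware", "Britney Spears"],
    "MSN Live": ["Ronaldinho", "Shakira", "Paris Hilton", "Britney Spears",
                 "Harry Potter", "Eminem", "Pamela Anderson", "Hilary Duff",
                 "Rebelde", "Angelina Jolie"],
    "AOL": ["Weather", "Dictionary", "Dogs", "American Idol", "Maps", "Cars",
            "Games", "Tattoo", "Horoscopes", "Lyrics"],
}


def shared_fraction(engine, lists):
    """Fraction of one engine's terms that also appear in any other engine's list."""
    own = {term.lower() for term in lists[engine]}
    others = set()
    for name, terms in lists.items():
        if name != engine:
            others.update(term.lower() for term in terms)
    return len(own & others) / len(own)


def overlap_report(lists=TOP_TERMS):
    """Map each engine name to its shared fraction."""
    return {engine: shared_fraction(engine, lists) for engine in lists}


if __name__ == "__main__":
    for engine, fraction in overlap_report().items():
        print(f"{engine}: {fraction:.0%}")
```

A low share only says an engine's list is unusual compared with the others; whether that reflects filtering or actual user behavior is a separate question.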
AOL Search Data Tools List
If you don’t know about AOL Gate, you’ve been gone a long time. Well, the good news is that a number of searchable AOL Data databases have been released, each with its own set of unique features. This post attempts to categorize them all!
Databases:
AOL Psych is a collaborative project to tag the AOL user data. Sometimes slow, it’s the most elaborate AOL search device I’ve seen so far–and by far the prettiest:
AOL Search Logs is a basic but full-featured index to the AOL data, with extras like comments. It doesn’t allow easy browsing or searching, but it has enough features to make it (barely) usable.
Splunk’d mirrors of the AOL data exist. The interface has a steep learning curve for a non-Splunk user, but Splunk may be the most powerful option, as it allows wildcards and other advanced search features:
Other less useful tools:
SEO Tools:
The AOL Site Incoming Keyword Tool takes a URL and tells you how people got there–quite useful for research purposes:
AOL Gate: Search Query Data Scandal
TechCrunch notes that AOL has released a file containing 20,000,000 queries from “anonymized” users. However, this is a problem because anything those users typed into AOL search (social security numbers, names, drug deals, etc.) can be cross-correlated to expose their identities. Imagine a politician ego-searching, then browsing Asian pornography. The scandal would just be beginning.

AOL smartly took down the download link, but once released on the web, it will always be on the web. To that end, we’re hosting the data here on our bandwidth-limited downloads platform: AOL-data.tgz. If you get in, you should get a decently fast speed.
According to Adam D’Angelo, the reason AOL published the data was for recognition in the search-engine research arena:
This was not a leak - it was intentional. In their desperation to gain recognition from the research community, AOL decided they would compromise their integrity to provide a data set that might become often-cited in research papers: “Please reference the following publication when using this collection: G. Pass, A. Chowdhury, C. Torgeson, ‘A Picture of Search’ The First International Conference on Scalable Information Systems, Hong Kong, June, 2006.” is the message before the download.
Here’s a breakdown of the core facts:
- 20,000,000 queries from 650,000 users in 2GB uncompressed tab-delimited files
- Uncensored queries for three months of AOL search service, spring 2006
- Essentially public domain
- Contains dangerous private information
Update
The data is rife with all kinds of personally identifiable data. For example, a quick grep for credit-card patterns produces the following:
grep -i -e "[0-9]\{4\}-[0-9]\{4\}-[0-9]\{4\}-[0-9]\{4\}" *.txt
- 9006-0512-xxxx-xxx
- 1550-0905-xxxx-xxxx
Looking for Social Security Numbers (SSN) turns up this HUGE amount of data:
grep -i -e "\b[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}\b" *.txt
- kristy nicole vega hammond la. social secruity number 437-67-xxxx birth date 03 08 xx drivers license number la. 00765xxxx address 41178 rene dr. hammond la.
- pamela button 079-60-xxxx
- thomas j finney socsec 370-40-xxxx
- 419-94-xxxx thomas black
- 458-87-xxxx seguro social
- social security number 545-29-xxxx
- ssn 436-47-xxxx
I’ve censored the personal information, but there are about 200 entries of social security numbers in the test data. Searching for things that look like email addresses ([a-zA-Z0-9_\-]*@[a-zA-Z0-9_\-]*\.) turns up another 60 or so.
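The same three searches can be run in one pass with a short script. This is only a sketch in Python: the patterns are the grep expressions from this post translated to Python regex syntax, and everything else is my own assumption, including the function names (`classify`, `mask_digits`, `scan`), the choice to look only at the second tab-delimited column (the query text), and the blunt masking rule that replaces every digit with an x.

```python
import re

# The grep patterns from the post, rewritten in Python regex syntax.
PATTERNS = {
    "card": re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}"),
    "ssn": re.compile(r"\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b"),
    "email": re.compile(r"[a-zA-Z0-9_\-]*@[a-zA-Z0-9_\-]*\."),
}


def classify(query):
    """Return the sorted names of every pattern that matches the query text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(query))


def mask_digits(query):
    """Replace every digit with 'x' so a hit can be quoted without leaking it."""
    return re.sub(r"[0-9]", "x", query)


def scan(lines):
    """Yield (kinds, masked_query) for tab-delimited lines whose query column matches.

    Assumes the query text is the second column; shorter lines are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        kinds = classify(fields[1])
        if kinds:
            yield kinds, mask_digits(fields[1])


if __name__ == "__main__":
    # Made-up sample rows, not taken from the AOL files.
    sample = [
        "1\tssn 123-45-6789\t2006-03-01 10:00:00\t\t",
        "2\tcheap flights\t2006-03-01 10:05:00\t\t",
    ]
    for kinds, masked in scan(sample):
        print(kinds, masked)
```

Masking every digit is more aggressive than the hand censoring above, which keeps the leading digits, but it is harder to get wrong.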
Update 2:
If you want to get this data into a more usable form, say MySQL, try this (note that we’re not going to bother storing duplicate queries, but you might want to):
mysql> CREATE TABLE aoldata (anonid int unsigned not null, query varchar(255) not null, querytime datetime, itemrank int unsigned, clickurl varchar(255), PRIMARY KEY(anonid, query));
Then you just need to import it, as appropriate:
LOAD DATA LOCAL INFILE 'user-ct-test-collection-01.txt'
INTO TABLE aoldata
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 1 LINES -- skip the column-header row; drop this line if your copy has none
(anonid, query, querytime, itemrank, clickurl);
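If you don't have a MySQL server handy, roughly the same import can be sketched with Python's built-in sqlite3 module. To be clear, this swaps SQLite in for MySQL and is not the method above: the column names come from the CREATE TABLE statement in this post, while the function names (`parse_rows`, `load_rows`), the `has_header` flag, and the use of `INSERT OR IGNORE` to skip duplicate (anonid, query) pairs are my own assumptions.

```python
import csv
import io
import sqlite3

# Same columns as the MySQL table above, using SQLite types.
SCHEMA = """
CREATE TABLE IF NOT EXISTS aoldata (
    anonid    INTEGER NOT NULL,
    query     TEXT    NOT NULL,
    querytime TEXT,
    itemrank  INTEGER,
    clickurl  TEXT,
    PRIMARY KEY (anonid, query)
)
"""


def parse_rows(handle, has_header=True):
    """Yield 5-tuples from a tab-delimited file object, one per data line."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumed header row; pass has_header=False if absent
    for fields in reader:
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        # Pad so rows without itemrank/clickurl still unpack into five columns.
        anonid, query, querytime, itemrank, clickurl = (fields + ["", ""])[:5]
        yield (int(anonid), query, querytime,
               int(itemrank) if itemrank else None,
               clickurl or None)


def load_rows(conn, handle, has_header=True):
    """Insert rows, keeping the first one seen per (anonid, query); return row count."""
    conn.execute(SCHEMA)
    with conn:  # commit once at the end
        conn.executemany(
            "INSERT OR IGNORE INTO aoldata VALUES (?, ?, ?, ?, ?)",
            parse_rows(handle, has_header),
        )
    return conn.execute("SELECT COUNT(*) FROM aoldata").fetchone()[0]


if __name__ == "__main__":
    # Made-up sample data, not taken from the AOL files.
    sample = io.StringIO(
        "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
        "1\tfoo\t2006-03-01 00:00:00\t1\thttp://example.com\n"
        "1\tfoo\t2006-03-02 00:00:00\t\t\n"
    )
    print(load_rows(sqlite3.connect(":memory:"), sample))
```

For a real file, open it with `open(path, newline="")` and pass the handle in; the rows are streamed, so the whole file never has to sit in memory.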
Other Blogs
Paul notes that the AOL data is really Google data, since AOL search is rebranded Google. Zoli has the post that started it all.