Tag Cloud of AOL search data

You may have heard of AOL’s recent search data snafu, in which the company inadvertently released 35 million user search queries without properly anonymizing the data.

Although AOL has stopped sharing the data, it is all over the Net and people are already creating data mining tools for it.

For fun, here is the tag cloud for the top 500 search queries. Not the most interesting cloud I’ve seen — more than anything it betrays the ignorance of users who type website names and addresses into AOL’s search field instead of the browser location field.

Alternatively, you could read this as a signal of how useful an “I’m Feeling Lucky” button is to the average user, though I don’t actually know whether AOL’s search engine has one.
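
For the curious, here is a rough sketch of how a cloud like this could be built from a list of queries: count them, keep the most frequent, and map each count to a font size. This is not TagCrowd's actual code. The function names, the log scaling, and the pixel range are all assumptions made for illustration.

```python
from collections import Counter
import math


def top_tags(items, limit=500):
    """Return the `limit` most frequent items as (item, count) pairs."""
    return Counter(items).most_common(limit)


def font_sizes(tag_counts, min_px=10, max_px=40):
    """Map each count to a font size between min_px and max_px.

    A log scale is assumed here so that one hugely popular query
    does not shrink every other tag to the minimum size.
    """
    if not tag_counts:
        return {}
    counts = [count for _, count in tag_counts]
    low, high = math.log(min(counts)), math.log(max(counts))
    span = high - low
    sizes = {}
    for tag, count in tag_counts:
        # If every count is equal, give all tags the maximum size.
        weight = (math.log(count) - low) / span if span else 1.0
        sizes[tag] = round(min_px + weight * (max_px - min_px))
    return sizes


# Toy data standing in for real search queries.
queries = ["google"] * 8 + ["ebay"] * 4 + ["mapquest"] * 2 + ["yahoo"]
cloud = font_sizes(top_tags(queries, limit=3))
```

With the toy data, the most frequent query gets the largest size and the least frequent of the top three gets the smallest. A linear scale would work too, but it tends to flatten everything below the top few tags.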

Beautiful Information

I’ve been showing TagCrowd around to friends and colleagues lately (easy user testing). It’s fun to watch people get into playing with it, seeking out ever-more interesting texts, speeches or poetry to visualize and compare. It made me realize that TagCrowd needs a photo gallery of clouds, each linking to its source text.

eecummings

I find that newcomers to the tag cloud are enamored of its gestalt typographic aesthetic more than anything else.

It’s beautiful information.

More than a few people have said they want a tag cloud print to hang on their walls as cybermodern art. Some want t-shirts with visualizations of their resumes. Roy Pea and I spoke yesterday about printing tag clouds on name tags for a September gathering of researchers.

Imagine walking around with a tag cloud dangling from your neck, meeting people and glancing down at their name tags to see the vocabulary of their interests and expertise. In a sense, you can see in that glance how to speak their language. Know to call a shoe a shoe. And know to ask about their interest in dolphin language or C++ compilers or Japanese architecture.

Tag Crowd

Roy always reminds me to ask, What’s missing from the model? For instance, what word should be in my tag cloud that isn’t? After all, running my CV through the TagCrowd shredder does not adequately sum up my life, monotonous and academic as it may be. But it’s a start.

Cory Doctorow’s seven obstacles to meta-utopia guarantee we will never have perfect metadata. But we will have plenty of rough yet reliable approximations.

A tag cloud made from a CV may not be the most empirically rigorous way of assessing someone’s research interests, even a narrow band of them. But for something so quick and easy, it’s a great approximation. Think of a hand-drawn sketch instead of a photograph. The tag cloud is information impressionism: what it lacks in exactitude, it makes up for with good looks.

Fuzzy information can be useful too, as Fred Turner told me today. You can learn a lot from a sketch.

Tag Scrubber is here

By far the most frequently requested feature has been a way to prune irrelevant tags from tag clouds, i.e., to create a tag blacklist.

Of course, everyone has a different opinion about which tags are irrelevant and, moreover, those opinions change depending on the particular text being visualized.

What was needed was a way for users (you!) to create many different blacklists and be able to choose the one applicable to your current task.

Et voilà!

Introducing the Tag Scrubber. It’s still in its infancy, but currently allows you to create and edit any number of tag blacklists. As of now, all blacklists are shared among all users. If I ever implement user accounts (bureaucracy, eek!), you can have your own private collection of ’em. Until then, please refrain from divulging your deep dark secrets in alphabetic lists of irrelevant words.
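
To make the idea concrete, here is a minimal sketch of how named, shared blacklists might be applied before counting tags. It is an illustration only, not the Tag Scrubber's real implementation: the `blacklists` store, the list names, and the `scrub` function are all hypothetical.

```python
from collections import Counter
import re

# Hypothetical shared store: blacklist name -> lowercase words to drop.
# In this sketch every user sees and edits the same dictionary.
blacklists = {
    "english-filler": {"the", "and", "of", "a", "to"},
    "cv-noise": {"university", "department", "present"},
}


def scrub(text, blacklist_name, store=blacklists):
    """Count the words in `text`, skipping those on the chosen blacklist.

    An unknown blacklist name removes nothing.
    """
    words = re.findall(r"[a-z']+", text.lower())
    banned = store.get(blacklist_name, set())
    return Counter(word for word in words if word not in banned)


sample = "The dolphin and the dolphin language of compilers"
counts = scrub(sample, "english-filler")
```

Choosing a different list for the same text is just a matter of passing another name, which is the point of keeping many blacklists around instead of one global list.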

~ Daniel Steinbock